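Today's lesson describes three groups of built-in Perl functions: functions that manipulate processes and programs, functions that perform mathematical operations, and functions that manipulate character strings.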
Many of the functions described today use features of the UNIX operating system. If you are using Perl on a machine that is not running UNIX, some of these functions might not be defined or might behave differently. Check the documentation supplied with your version of Perl for details on which functions are supported or emulated on your machine.
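Perl provides a wide range of functions that manipulate both the program currently being executed and other programs (also called processes) running on your machine. These functions are divided into four groups: functions that start processes, functions that terminate programs or processes, functions that control the execution of processes, and miscellaneous control functions.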
The following sections describe these four groups of process- and program-manipulation functions.
Several built-in functions provide different ways of creating processes: eval, system, fork, pipe, exec, and syscall. These functions are described in the following subsections.
The eval function treats a character string as an executable Perl program.
The syntax for the eval function is
eval (string);
Here, string is the character string that is to become a Perl program.
For example, these two lines of code:
$print = "print (\"hello, world\\n\");";
eval ($print);
print the following message on your screen:
hello, world
The character string passed to eval can be a character-string constant or any expression that has a value which is a character string. In this example, the following string is assigned to $print, which is then passed to eval:
print ("hello, world\n");
The eval function uses the special system variable $@ to indicate whether the Perl program contained in the character string has executed properly. If no error has occurred, $@ contains the null string. If an error has been detected, $@ contains the text of the message.
The subprogram executed by eval affects the program that
called it; for example, any variables that are changed by the
subprogram remain changed in the main program. Listing 13.1 provides
a simple example of this.
Listing 13.1. A program that illustrates the behavior of eval.
1: #!/usr/local/bin/perl
2:
3: $myvar = 1;
4: eval ("print (\"hi!\\n\"); \$myvar = 2;");
5: print ("the value of \$myvar is $myvar\n");

$ program13_1
hi!
the value of $myvar is 2
$
The call to eval in line 4 first executes the statement
print ("hi!\n");
Then it executes the following assignment, which assigns 2 to $myvar:
$myvar = 2;
The value of $myvar remains 2 in the main program,
which means that line 5 prints the value 2. (The backslash
preceding the $ in $myvar ensures that the Perl
interpreter does not substitute the value of $myvar for
the name before passing it to eval.)
NOTE
If you like, you can leave off the final semicolon in the character string passed to eval, as follows:
eval ("print (\"hi!\\n\"); \$myvar = 2");
As before, this prints hi! and assigns 2 to $myvar.
The eval function has one very useful property: If the subprogram executed by eval encounters a fatal error, the main program does not halt. Instead, the subprogram terminates, copies the error message into the system variable $@, and returns to the main program.
This feature is very useful if you are moving a Perl program from
one machine to another and you are not sure whether the new machine
contains a built-in function you need. For example, Listing 13.2
tests whether the tell function is implemented.
Listing 13.2. A program that uses eval to test whether a function is implemented.
1: #!/usr/local/bin/perl
2:
3: open (MYFILE, "file1") || die ("Can't open file1");
4: eval ("\$start = tell(MYFILE);");
5: if ($@ eq "") {
6:     print ("The tell function is defined.\n");
7: } else {
8:     print ("The tell function is not defined!\n");
9: }

$ program13_2
The tell function is defined.
$
The call to eval in line 4 creates a subprogram that calls the function tell. If tell is defined, the subprogram assigns the location of the next line (which, in this case, is the first line) to read to the scalar variable $start. If tell is not defined, the subprogram places the error message in $@.
Line 5 checks whether $@ is the null string. If $@
is empty, the subprogram in line 4 executed without generating
an error, which means that the tell function is implemented.
(Because assignments performed in the subprogram remain in effect
in the main program, the main program can call seek using
the value in $start, if desired.) If $@ is not
empty, the program assumes that tell is not defined,
and it prints a message proclaiming that fact. (This program is
assuming that the only reason the subprogram could fail is because
tell is not defined. This is a reasonable assumption,
because you know that the file referenced by MYFILE has
been successfully opened.)
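The error-trapping behavior described above can be sketched with a string that fails on purpose. This is a minimal illustration, not part of the original listings: the helper name try_code and the division by zero are assumptions chosen for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: evaluates a string of Perl code and reports the outcome.
sub try_code {
    my ($code) = @_;
    my $result = eval($code);      # run the string as a small Perl program
    if ($@ ne "") {
        return "failed: $@";       # $@ holds the text of the error message
    }
    return "ok: $result";
}

my $good = try_code('2 + 3');
my $bad  = try_code('my $zero = 0; 1 / $zero');

print("$good\n");
print($bad);                  # the error text already ends with a newline
print("still running\n");     # the main program did not halt
```

The second call triggers a fatal division-by-zero error inside eval, yet the final print statement still runs, because only the subprogram terminates.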
Although eval is very useful, it is best to use it only for small programs. If you need to generate a larger program, it might be better to write the program to a file and call system to execute it. (The system function is described in the following section.) Because statements executed by eval affect the program that calls it, the behavior of complicated programs might become difficult to track if eval is used to excess.
You have seen examples of the system function in earlier lessons.
The syntax for the system function is
system (list);
This function is passed a list as follows: The first element of the list contains the name of a program to execute, and the other elements are arguments to be passed to the program.
When system is called, it starts a process that runs
the program and waits until the process terminates. When the process
terminates, the error code is shifted left eight bits, and the
resulting value becomes system's return value. Listing
13.3 is a simple example of a program that calls system.
Listing 13.3. A program that calls system.
1: #!/usr/local/bin/perl
2:
3: @proglist = ("echo", "hello, world!");
4: system(@proglist);

$ program13_3
hello, world!
$
In this program, the call to system
executes the UNIX program echo, which displays its arguments.
The argument passed to echo is hello, world!.
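Because the error code is shifted left eight bits in the return value of system, shifting the value right by eight bits recovers it. The following is a small sketch of that idea on a UNIX system. It runs a second copy of the Perl interpreter (found through the built-in variable $^X) purely so that the example does not depend on any other program, and the exit code 3 is an arbitrary choice.

```perl
use strict;
use warnings;

# Run a child Perl process that exits with code 3. $^X is the path of the
# running perl binary.
my $status = system($^X, "-e", "exit 3");

# The child's exit code sits in the upper eight bits of the return value.
my $exit_code = $status >> 8;

print("raw return value: $status\n");
print("exit code: $exit_code\n");
```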
TIP
When you start another program using system, output data might be mixed, out of sequence, or duplicated. To get around this problem, set the system variable $|, defined for each file, to 1. The following is an example:
select (STDERR); $| = 1;
select (STDOUT); $| = 1;
When $| is set to 1, no buffer is defined for that file, and output is written out right away. (STDOUT is selected last so that it remains the default output file for later calls to print.) This ensures that the output behaves properly when system is called. See "Redirecting One File to Another" on Day 12, "Working with the File System," for more information on select and $|.
The fork function creates two copies of your program: the parent process and the child process. These copies execute simultaneously.
The syntax for the fork function is
procid = fork();
fork returns zero to the child process and a nonzero value to the parent process. This nonzero value is the process ID of the child process. (A process ID is an integer that enables the system to distinguish this process from the other processes currently running on the machine.)
The return value from fork enables you to determine which process is the child process and which is the parent. For example:
$retval = fork();
if ($retval == 0) {
    # this is the child process
    exit;    # this terminates the child process
} else {
    # this is the parent process
}
If fork is unable to execute, the return value is a special undefined value for which you can test by using the defined function. (For more information on defined, see Day 14, "Scalar-Conversion and List-Manipulation Functions.")
To terminate a child process created by fork, use the
built-in function exit, which is described later in today's
lesson.
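The three possible outcomes of fork (failure, child, and parent) can be told apart as in the following sketch. It is only an outline under the assumption of a UNIX system; the call to waitpid, which is described later in today's lesson, is there so that the parent does not finish before the child.

```perl
use strict;
use warnings;

my $retval = fork();
if (!defined($retval)) {
    # fork could not create a new process
    die("Unable to fork: $!\n");
} elsif ($retval == 0) {
    # this is the child process
    exit(0);
}

# this is the parent process; $retval holds the process ID of the child
waitpid($retval, 0);
print("child $retval has finished\n");
```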
Be careful when you use the fork function. The following are a few examples of what can go wrong:
The pipe function is designed to be used in conjunction with the fork function. It provides a way for the child and parent processes to communicate.
The syntax for the pipe function is
pipe (infile, outfile);
pipe requires two arguments, each of which is a file variable that is not currently in use (in this case, infile and outfile). After pipe has been called, information sent via the outfile file variable can be read using the infile file variable. In effect, the output from outfile is piped to infile.
To use pipe with fork, do the following:
The process in which outfile is still open can now send data to the process in which infile is still open. (The child can send data to the parent, or vice versa, depending on which process closes input and which closes output.)
Listing 13.4 shows how pipe works. It uses fork
to create a parent and child process. The parent process reads
a line of input, which it passes to the child process. The child
process then prints it.
Listing 13.4. A program that uses fork and pipe.
 1: #!/usr/local/bin/perl
 2:
 3: pipe (INPUT, OUTPUT);
 4: $retval = fork();
 5: if ($retval != 0) {
 6:     # this is the parent process
 7:     close (INPUT);
 8:     print ("Enter a line of input:\n");
 9:     $line = <STDIN>;
10:     print OUTPUT ($line);
11: } else {
12:     # this is the child process
13:     close (OUTPUT);
14:     $line = <INPUT>;
15:     print ($line);
16:     exit (0);
17: }

$ program13_4
Enter a line of input:
Here is a test line
Here is a test line
$

Line 3 defines the file variables INPUT and OUTPUT. Data sent to OUTPUT can now be read from INPUT.
Line 4 splits the program into a parent process and a child process. Line 5 then determines which process is which.
The parent process executes lines 7-10. Because the parent process is sending data through OUTPUT, it has no need to access INPUT; therefore, line 7 closes INPUT.
Lines 8 and 9 obtain a line of data from the standard input file. Line 10 then sends this line of data to the child process via the file variable OUTPUT.
The child process executes lines 13-16. Because the child process is receiving data through INPUT, it does not need access to OUTPUT; therefore, line 13 closes OUTPUT.
Line 14 reads data from INPUT. Because data from OUTPUT is piped to INPUT, the program waits until the data is actually sent before continuing with line 15.
Line 16 uses exit to terminate the child process. This also automatically closes INPUT.
Note that the <INPUT> operator behaves like any
other operator that reads input (such as, for instance, <STDIN>).
If there is no more data to read, INPUT is assumed to
be at the "end of file," and <INPUT>
returns the undefined value, which behaves like the null string when it is tested.
Traffic through the file variables specified by pipe can flow in only one direction. You cannot have a process both send and receive on the same pipe. If you need to establish two-way communication, you can open two pipes, one in each direction.
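Two-way communication with one pipe per direction might look like the following sketch. Several things here are assumptions made for the illustration: the variable names, the use of scalar variables to hold the file handles (instead of the file variables used in Listing 13.4), and the autoflush method from the standard IO::Handle module, which turns off buffering in the same way as setting $| to 1.

```perl
use strict;
use warnings;
use IO::Handle;   # provides the autoflush method on file handles

# One pipe per direction.
pipe(my $from_parent, my $to_child)  || die("pipe failed: $!\n");
pipe(my $from_child,  my $to_parent) || die("pipe failed: $!\n");
$to_child->autoflush(1);
$to_parent->autoflush(1);

my $retval = fork();
die("fork failed: $!\n") unless defined($retval);

if ($retval == 0) {
    # child: reads a request, then sends back a reply
    close($to_child);
    close($from_child);
    my $request = <$from_parent>;
    chomp($request);
    print $to_parent "child received: $request\n";
    exit(0);
}

# parent: sends a request, then reads the reply
close($from_parent);
close($to_parent);
print $to_child "hello\n";
my $reply = <$from_child>;
waitpid($retval, 0);
print($reply);
```

Each process closes the two ends it does not use, just as lines 7 and 13 of Listing 13.4 do for a single pipe.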
The exec function is similar to the system function, except that it terminates the current program before starting the new one.
The syntax for the exec function is
exec (list);
This function is passed a list as follows: The first element of the list contains the name of a program to execute, and the other elements are arguments to be passed to the program.
For example, the following statement terminates the Perl program and starts the command mail dave:
exec ("mail dave");
Like system, exec accepts additional arguments that are assumed to be passed to the command being invoked. For example, the following statement executes the command vi file1:
exec ("vi", "file1");
You can specify the name that the system is to use as the program name. To do this, put the program to execute in braces in front of the list, and make the name the first element of the list, as follows:
exec {"mail"} "maildave", "dave";
Here, the command mail dave is invoked, but the program name is set to maildave. (This affects the value of the system variable $0, which contains the name of the running program. It also affects the value of argv[0] if the program to be invoked was originally written in C.)
exec often is used in conjunction with fork:
when fork splits into two processes, the child process
starts another program using exec.
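The combination of fork and exec can be sketched as follows, assuming a UNIX system on which the echo program is available. The system variable $?, which is not otherwise covered in this section, holds the status of the child after waitpid returns; its upper eight bits contain the exit code.

```perl
use strict;
use warnings;

my $pid = fork();
die("fork failed: $!\n") unless defined($pid);

if ($pid == 0) {
    # child: replace this process with the echo program
    exec("echo", "hello from the child") || die("exec failed: $!\n");
}

# parent: wait for the child, then report its exit code
waitpid($pid, 0);
my $exit_code = $? >> 8;
print("child exited with code $exit_code\n");
```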
exec has the same output-buffering problems as system. See the description of system, earlier in today's lesson, for a description of these problems and how to deal with them.
The syscall function calls a system function.
The syntax for the syscall function is
syscall (list);
syscall expects a list as its argument. The first element of the list is the name of the system call to invoke, and the remaining elements are arguments to be passed to the call.
If an argument in the list passed to syscall is a numeric
value, it is converted to a C integer (type int). Otherwise,
a pointer to the string value is passed. See the syscall
UNIX manual page or the Perl documentation for more details.
NOTE
The Perl header file syscall.ph must be included in order to use syscall:
require ("syscall.ph");
For more information on require, see Day 20, "Miscellaneous Features of Perl."
The following sections describe the functions that terminate either the currently executing program or a process running elsewhere on the system: die, warn, exit, and kill.
The die and warn functions provide a way for programs to pass urgent messages back to the user who is running them.
The die function terminates the program and prints an error message on the standard error file.
The syntax for the die function is
die (message);
message is the error message to be displayed.
For example, the call
die ("Cannot open input file\n");
prints the following message and then exits:
Cannot open input file
die can accept a list as its argument, in which case all elements of the list are printed.
@diemsg = ("I'm about ", "to die\n");
die (@diemsg);
This prints out the following message and then exits:
I'm about to die
If the last argument passed to die ends with a newline character, the error message is printed as is. If the last argument to die does not end with a newline character, the program filename and line number are printed, along with the line number of the input file (if applicable). For example, if line 6 of the file myprog is
die ("Cannot open input file");
the message it prints is
Cannot open input file at myprog line 6.
The warn function, like die, prints a message on the standard error file.
The syntax for the warn function is
warn (message);
As with die, message is the message to be displayed.
warn, unlike die, does not terminate the program. For example, the statement
warn ("Input file is empty");
sends the following message to the standard error file, and then continues executing:
Input file is empty at myprog line 76.
If the string passed to warn is terminated by a newline character, the warning message is printed as is. For example, the statement
warn("Danger! Danger!\n");
sends
Danger! Danger!
to the standard error file.
NOTE
If eval is used to invoke a program that calls die, the error message printed by die is not printed; instead, the error message is assigned to the system variable $@.
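The interaction between die and eval can be seen in the following short sketch, which reuses the message from the earlier die examples. The variable names are chosen only for the illustration.

```perl
use strict;
use warnings;

# die inside eval prints nothing and does not end the main program;
# the message is stored in $@ instead.
eval('die("Cannot open input file\n");');
my $with_newline = $@;

# Without a trailing newline, die appends the location of the call.
eval('die("Cannot open input file");');
my $without_newline = $@;

print("trapped: $with_newline");
print("trapped: $without_newline");
```

In the second case, the location is reported relative to the string passed to eval rather than to the file containing the main program.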
The exit function terminates a program.
If you like, you can specify a return code to be passed to the system by passing exit an argument using the following syntax:
exit (retcode);
retcode is the return code you want to pass.
For example, the following statement terminates the program with a return code of 2:
exit(2);
The kill function enables you to send a signal to a group of processes.
The syntax for invoking the kill function is
kill (signal, proclist);
In this case, signal is the numeric signal to send. (For example, a signal of 9 kills the listed processes.) proclist is a list of process IDs (such as the child process ID returned by fork).
signal also can be a signal name enclosed in quotes, as in "INT".
For more details on the signals you can send, refer to the kill UNIX manual page.
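As a rough sketch of kill in use, the following program starts a child process with fork and then sends it the TERM signal by name. It assumes a UNIX system. The waitpid function and the system variable $? (whose low seven bits hold the number of the signal that ended the child) are used only to confirm the result.

```perl
use strict;
use warnings;

my $pid = fork();
die("fork failed: $!\n") unless defined($pid);

if ($pid == 0) {
    # child: do nothing until a signal arrives
    sleep(60);
    exit(0);
}

# parent: send the TERM signal to the child, then collect it
my $signaled = kill("TERM", $pid);   # number of processes signaled
waitpid($pid, 0);
my $signal_number = $? & 127;        # signal that terminated the child

print("processes signaled: $signaled\n");
print("child ended by signal $signal_number\n");
```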
The sleep, wait, and waitpid functions delay the execution of a particular program or process.
The sleep function suspends the program for a specified number of seconds.
The syntax for the sleep function is
sleep (time);
time is the number of seconds to suspend program execution.
The function returns the number of seconds that the program was actually stopped.
For example, the following statement puts the program to sleep for five seconds:
sleep (5);
The wait function suspends execution and waits for a child process to terminate (such as a process created by fork).
The wait function requires no arguments:
procid = wait();
When a child process terminates, wait returns the process ID, procid, of the process that has terminated. If no child processes exist, wait returns -1.
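The return values of wait can be used to collect several child processes in turn, as in the following sketch. The number of children and the hash %children are assumptions made for the example.

```perl
use strict;
use warnings;

my %children;
foreach my $n (1 .. 3) {
    my $pid = fork();
    die("fork failed: $!\n") unless defined($pid);
    if ($pid == 0) {
        exit($n);              # child: finish immediately
    }
    $children{$pid} = $n;      # parent: remember which child this is
}

# wait returns the ID of each child as it terminates, and -1 when none are left
my $reaped = 0;
while ((my $pid = wait()) != -1) {
    $reaped++;
    print("child $children{$pid} (process $pid) has terminated\n");
}
print("children reaped: $reaped\n");
```

The order in which the three children are reported is not guaranteed, because the processes execute simultaneously.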
The waitpid function waits for a particular child process.
The syntax for the waitpid function is
waitpid (procid, waitflag);
procid is the process ID of the process to wait for, and waitflag is a special wait flag (as defined by the waitpid or wait4 manual page). By default, waitflag is 0 (a normal wait). waitpid returns the process ID of the child process if the process is found and has terminated, and it returns -1 if the child process does not exist.
Listing 13.5 shows how waitpid can be used to control
process execution.
Listing 13.5. A program that uses waitpid.
 1: #!/usr/local/bin/perl
 2:
 3: $procid = fork();
 4: if ($procid == 0) {
 5:     # this is the child process
 6:     print ("this line is printed first\n");
 7:     exit(0);
 8: } else {
 9:     # this is the parent process
10:     waitpid ($procid, 0);
11:     print ("this line is printed last\n");
12: }

$ program13_5
this line is printed first
this line is printed last
$
Line 3 splits the program into a parent process and a child process. The parent process is returned the process ID of the child process, which is stored in $procid.
Lines 6 and 7 are executed by the child process. Line 6 prints the following line:
this line is printed first
Line 7 then calls exit, which terminates the child process.
Lines 10 and 11 are executed by the parent process. Line 10 calls waitpid and passes it the ID of the child process; therefore, the parent process waits until the child process terminates before continuing. This means that line 11, which prints the second line, is guaranteed to be executed after the first line is printed.
As you can see, waitpid can be used to force the order of
execution of processes.
NOTE
For more information on the possible values that can be passed as waitflag, examine the file wait.ph, which is available from the same place you retrieved your copy of Perl. (It might already be on your system.) You can also find out more by investigating the waitpid and wait4 manual pages.
The caller, chroot, local, and times functions perform various process and program-related actions.
The caller function returns the name and the line number of the program that called the currently executing subroutine.
The syntax for the caller function is
subinfo = caller();
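caller returns a three-element list, subinfo, consisting of the following: the name of the package from which the subroutine was called, the name of the file from which the subroutine was called, and the line number of the subroutine call.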
This routine is used by the Perl debugger, which you'll learn about on Day 21, "The Perl Debugger." For more information on packages, refer to Day 20, "Miscellaneous Features of Perl."
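A minimal sketch of caller follows. The list it returns holds the package name, the file name, and the line number of the call, in that order; the subroutine name report_caller is made up for the example.

```perl
use strict;
use warnings;

my @subinfo;

sub report_caller {
    @subinfo = caller();   # (package name, file name, line number)
}

report_caller();
my ($package, $file, $line) = @subinfo;
print("called from package $package, file $file, line $line\n");
```

Because the call is made from the main program, the package name reported is main.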
The chroot function duplicates the functionality of the UNIX chroot system call.
The syntax for the chroot function is
chroot (dir);
dir is the new root directory.
In the following example, the specified directory becomes the root directory for the program:
chroot ("/u/jqpublic");
For more information, refer to the chroot manual page.
The local function was introduced on Day 9, "Using Subroutines." It declares that a copy of a named variable is to be defined for a subroutine. (Refer to that day for examples that use local inside a subroutine.)
local can be used also to define a copy of a variable for use inside a statement block (a collection of statements enclosed in brace brackets), as follows:
if ($var == 14) {
    local ($localvar);
    # stuff goes here
}
This defines a local copy of the variable $localvar for
use inside the statement block. Any other copies of $localvar
that exist are not affected by the changes to this local copy.
DON'T use local inside a loop, as in this example:
while ($var <= 14) {
    local ($myvar);
    # stuff goes here
}
Here, a new copy of $myvar is defined each time the loop iterates. This is probably not what you want.
The times function returns the amount of job time consumed by this program and any child processes of this program.
The syntax for the times function is
timelist = times
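As you can see, times accepts no arguments. It returns timelist, a list consisting of the following four floating-point numbers: the user time and the system time consumed by this program, followed by the user time and the system time consumed by its child processes. All four values are in seconds.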
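The following sketch shows one way to unpack the list returned by times. The four values are, in order, the user and system time of the program itself and the user and system time of its child processes. The loop exists only to consume some processor time, and the variable names are assumptions.

```perl
use strict;
use warnings;

# Use a little processor time so that there is something to measure.
my $sum = 0;
$sum += sqrt($_) foreach (1 .. 200_000);

my ($user, $system, $child_user, $child_system) = times;
printf("user %.2f, system %.2f, child user %.2f, child system %.2f\n",
       $user, $system, $child_user, $child_system);
```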
Perl provides functions that perform the standard trigonometric operations, plus some other useful mathematical operations. The following sections describe these functions: sin, cos, atan2, sqrt, exp, log, abs, rand, and srand.
The sin and cos functions are passed a scalar value and return the sine and cosine, respectively, of the value.
The syntax of the sin and cos functions is
retval = sin (value);
retval = cos (value);
value is a placeholder here. It can be the value stored in a scalar variable or the result of an expression; it is assumed to be in radians. See the following section, "The atan2 Function," to find out how to convert from degrees to radians.
The atan2 function calculates and returns the arctangent of one value divided by another, in the range -π to π.
The syntax of the atan2 function is
retval = atan2 (value1, value2);
If value1 and value2 are equal and positive, retval is the value of π divided by 4.
Listing 13.6 shows how you can use this to convert from degrees
to radians.
Listing 13.6. A program that contains a subroutine that converts from degrees to radians.
 1: #!/usr/local/bin/perl
 2:
 3: $rad90 = &degrees_to_radians(90);
 4: $sin90 = sin($rad90);
 5: $cos90 = cos($rad90);
 6: print ("90 degrees:\nsine is $sin90\ncosine is $cos90\n");
 7:
 8: sub degrees_to_radians {
 9:     local ($degrees) = @_;
10:     local ($radians);
11:
12:     $radians = atan2(1,1) * $degrees / 45;
13: }

$ program13_6
90 degrees:
sine is 1
cosine is 6.1230317691118962911e-17
$

The subroutine degrees_to_radians converts from degrees to radians by multiplying by π divided by 180. Because atan2(1,1) returns π divided by 4, all the subroutine needs to do after that is divide by 45 to obtain the number of radians.
In the main body of the program, line 3 converts 90 degrees to
the equivalent value in radians (π divided by 2). Line 4 then
passes this value to sin, and line 5 passes it to cos.
NOTE
The trigonometric operations provided here are sufficient to enable you to perform the other important trigonometric operations. For example, to obtain the tangent of a value, obtain the sine and cosine of the value by calling sin and cos, and then divide the sine by the cosine.
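The tangent calculation just described might be written as the following subroutine. The name tangent is an assumption, and the sketch makes no attempt to handle angles at which the cosine is zero.

```perl
use strict;
use warnings;

# tangent = sine / cosine (not defined where the cosine is zero)
sub tangent {
    my ($radians) = @_;
    return sin($radians) / cos($radians);
}

my $pi    = atan2(1, 1) * 4;
my $tan45 = tangent($pi / 4);     # 45 degrees expressed in radians
printf("tangent of 45 degrees: %.4f\n", $tan45);
```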
The sqrt function returns the square root of the value it is passed.
The syntax for the sqrt function is
retval = sqrt (value);
value can be any number that is not negative.
The exp function returns the number e ** value, where e is the standard mathematical constant (the base for the natural logarithm) and value is the argument passed to exp.
The syntax for the exp function is
retval = exp (value);
To retrieve e itself, pass exp the value 1.
The log function takes a value and returns the natural (base e) logarithm of the value.
The syntax for the log function is
retval = log (value);
The log function undoes exp; the expression
$var = log (exp ($var));
always leaves $var with the value it started with (if you factor in round-off error).
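The relationship between exp and log can be checked with a few lines of code. The subroutine log_base is a hypothetical helper, not a built-in function; it relies on the fact that a logarithm in any base can be obtained by dividing two natural logarithms.

```perl
use strict;
use warnings;

my $e          = exp(1);           # the constant e itself
my $round_trip = log(exp(2.5));    # log undoes exp, apart from round-off error

# Hypothetical helper: logarithm of $value in an arbitrary base,
# built from the natural logarithm.
sub log_base {
    my ($value, $base) = @_;
    return log($value) / log($base);
}

printf("e is approximately %.6f\n", $e);
printf("log(exp(2.5)) is %.6f\n", $round_trip);
printf("log base 10 of 1000 is %.6f\n", log_base(1000, 10));
```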
The abs function returns the absolute value of a number. This is defined as follows: if a value is less than zero, abs negates it and returns the result.
$result = abs(-3.5); # returns 3.5
Otherwise, the result is identical to the value:
$result = abs(3.5); # returns 3.5
$result = abs(0); # returns 0
The syntax for the abs function is
retval = abs (value);
value can be any number.
NOTE
abs is not defined in Perl 4.
The rand and srand functions enable Perl programs to generate random numbers.
The rand function is passed an integer value and generates a random floating-point number between 0 and the value.
The syntax for the rand function is
retval = rand (num);
num is the integer value passed to rand, and retval is a random floating-point number that is greater than or equal to 0 and less than num.
For example, the following statement generates a number between 0 and 10 and returns it in $retval:
$retval = rand (10);
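srand initializes the random-number generator used by rand. This ensures that the random numbers generated are, in fact, random. (In Perl 4 and early versions of Perl 5, if you do not use srand, you'll get the same set of random numbers each time. Perl 5.004 and later call srand automatically the first time rand is used.)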
The syntax for the srand function is
srand (value);
srand accepts an integer value as an argument; if no argument is supplied, srand calls the time function and uses its return value as the random-number seed.
For an example that uses rand and srand, see
the section titled "Returning a Value from a Subroutine"
on Day 9.
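The effect of the seed can be seen by calling srand twice with the same value, as in the following sketch. The seed 42 is arbitrary, and the built-in int function is used to truncate each random number to an integer from 0 to 9.

```perl
use strict;
use warnings;

srand(42);                                   # fixed seed chosen for this sketch
my @first  = map { int(rand(10)) } 1 .. 5;

srand(42);                                   # the same seed again
my @second = map { int(rand(10)) } 1 .. 5;

print("first run:  @first\n");
print("second run: @second\n");
```

Both lines of output contain the same five numbers, which is why a seed that changes from run to run (such as the value returned by time) is needed when different sequences are wanted.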
NOTE
The following values and functions return numbers that can make useful random-number seeds: For best results, combine two or more of these using the | (bitwise OR) operator.
This section describes the built-in Perl functions that manipulate character strings. These functions enable you to do the following:
The index function provides a way of indicating the location of a substring in a string.
The syntax for the index function is
position = index (string, substring);
string is the character string to search in, and substring is the character string being searched for. position returns the number of characters skipped before substring is located; if substring is not found, position is set to -1.
Listing 13.7 is a program that uses index to locate a
substring in a string.
Listing 13.7. A program that uses the index function.
1: #!/usr/local/bin/perl
2:
3: $input = <STDIN>;
4: $position = index($input, "the");
5: if ($position >= 0) {
6:     print ("pattern found at position $position\n");
7: } else {
8:     print ("pattern not found\n");
9: }

$ program13_7
Here is the input line I have typed.
pattern found at position 8
$
This program searches for the first occurrence of the word the. If it is found, the program prints the location of the pattern; if it is not found, the program prints pattern not found.
You can use the index function to find more than one copy of a substring in a string. To do this, pass a third argument to index, which tells it how many characters to skip before starting to search. For example:
$position = index($line, "foo", 5);
This call to index skips five characters before starting to search for foo in the string stored in $line. As before, if index finds the substring, it returns the total number of characters skipped (including the number specified by the third argument to index). If index does not find the substring in the portion of the string that it searches, it returns -1.
This feature of index enables you to find all occurrences
of a substring in a string. Listing 13.8 is a modified version
of Listing 13.7 that searches for all occurrences of the
in an input line.
Listing 13.8. A program that uses index to search a line repeatedly.
 1: #!/usr/local/bin/perl
 2:
 3: $input = <STDIN>;
 4: $position = $found = 0;
 5: while (1) {
 6:     $position = index($input, "the", $position);
 7:     last if ($position == -1);
 8:     if ($found == 0) {
 9:         $found = 1;
10:         print ("pattern found - characters skipped:");
11:     }
12:     print (" $position");
13:     $position++;
14: }
15: if ($found == 0) {
16:     print ("pattern not found\n");
17: } else {
18:     print ("\n");
19: }

$ program13_8
Here is the test line containing the words.
pattern found - characters skipped: 8 33
$

Line 6 of this program calls index. Because the initial value of $position is 0, the first call to index starts searching from the beginning of the string. Eight characters are skipped before the first occurrence of the is found; this means that $position is assigned 8.
Line 7 tests whether a match has been found by comparing $position with -1, which is the value index returns when it does not find the string for which it is looking. Because a match has been found, the loop continues to execute.
When the loop iterates again, line 6 calls index again. This time, index skips nine characters before beginning the search again, which ensures that the previously found occurrence of the is skipped. A total of 33 characters are skipped before the is found again. Once again, the loop continues, because the conditional expression in line 7 is false.
On the final iteration of the loop, line 6 calls index
and skips 34 characters before starting the search. This time,
the is not found, index returns -1,
and the conditional expression in line 7 is true. At this point,
the loop terminates.
NOTE
To extract a substring found by index, use the substr function, which is described later in today's lesson.
The rindex function is similar to the index function. The only difference is that rindex starts searching from the right end of the string, not the left.
The syntax for the rindex function is
position = rindex (string, substring);
This syntax is identical to the syntax for index. string is the character string to search in, and substring is the character string being searched for. position returns the number of characters skipped before substring is located; if substring is not found, position is set to -1.
The following is an example:
$string = "Here is the test line containing the words.";
$position = rindex($string, "the");
In this example, rindex finds the second occurrence of the. As with index, rindex returns the number of characters between the left end of the string and the location of the found substring. In this case, 33 characters are skipped, and $position is assigned 33.
You can specify a third argument to rindex, indicating the maximum number of characters that can be skipped. For example, if you want rindex to find the first occurrence of the in the preceding example, you can call it as follows:
$string = "Here is the test line containing the words.";
$position = rindex($string, "the", 32);
Here, the second occurrence of the cannot be matched, because it is to the right of the specified limit of 32 skipped characters. rindex, therefore, finds the first occurrence of the. Because there are eight characters between the beginning of the string and the occurrence, $position is assigned 8.
Like index, rindex returns -1 if it cannot find the string it is looking for.
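One common use of rindex is to split a string at the last occurrence of a separator. The following sketch separates a path into a directory and a file name. The path is invented for the example, and substr (described later in today's lesson) is used to extract the two pieces.

```perl
use strict;
use warnings;

my $path  = "/u/jqpublic/file1";
my $slash = rindex($path, "/");      # characters skipped before the last slash

my ($dir, $file);
if ($slash == -1) {
    ($dir, $file) = (".", $path);    # no slash at all: only a file name
} else {
    $dir  = substr($path, 0, $slash);
    $file = substr($path, $slash + 1);
}
print("directory: $dir\nfile: $file\n");
```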
The length function returns the number of characters contained in a character string.
The syntax for the length function is
num = length (string);
string is the character string for which you want to determine the length, and num is the returned length.
Here is an example using length:
$string = "Here is a string";
$strlen = length($string);
In this example, length determines that the string in $string is 16 characters long, and it assigns 16 to $strlen.
Listing 13.9 is a program that calculates the average word length
used in an input file. (This is sometimes used to determine the
"complexity" of the text.) Numbers are skipped.
Listing 13.9. A program that demonstrates the use of length.
 1: #!/usr/local/bin/perl
 2:
 3: $wordcount = $charcount = 0;
 4: while ($line = <STDIN>) {
 5:     @words = split(/\s+/, $line);
 6:     foreach $word (@words) {
 7:         next if ($word =~ /^\d+\.?\d*$/);
 8:         $word =~ s/[,.;:]$//;
 9:         $wordcount += 1;
10:         $charcount += length($word);
11:     }
12: }
13: print ("Average word length: ", $charcount / $wordcount, "\n");

$ program13_9
Here is the test input.
Here is the last line.
^D
Average word length: 3.5
$
This program reads a line of input at a time from the standard input file, breaking the input line into words. Line 7 tests whether the word is a number, and skips it if it is. Line 8 strips any trailing punctuation character from the word, which ensures that the punctuation is not counted as part of the word length.
Line 10 calls length to retrieve the number of characters in the word. This number is added to $charcount, which contains the total number of characters in all of the words that have been read so far. To determine the average word length of the file, line 13 takes this value and divides it by the number of words in the file, which is stored in $wordcount.
The tr function provides another way of determining the length of a character string, in conjunction with the built-in system variable $_.
The syntax for the tr function is
tr/sourcelist/replacelist/
sourcelist is the list of characters to replace, and replacelist is the list of characters to replace with. (For details, see the following listing and the explanation provided with it.)
Listing 13.10 shows how tr works.
Listing 13.10. A program that uses tr to retrieve the length of a string.
1:  #!/usr/local/bin/perl
2:
3:  $string = "here is a string";
4:  $_ = $string;
5:  $length = tr/a-zA-Z /a-zA-Z /;
6:  print ("the string is $length characters long\n");

$ program13_10
the string is 16 characters long
$
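Line 3 of this program assigns the string here is a string to the scalar variable $string. Line 4 copies this string into a built-in scalar variable, $_.
Line 5 exploits two features of the tr operator that have not yet been discussed: if the value to be translated is not specified by means of the =~ operator, tr operates on the value stored in $_; and tr returns the number of characters translated.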
In line 5, both the search pattern (the set of characters to look for) and the replacement pattern (the characters to replace them with) are the same. This pattern, /a-zA-Z /, tells tr to search for all lowercase letters, uppercase letters, and blank spaces, and then replace them with themselves. This pattern matches every character in the string, which means that every character is being translated.
Because every character is being translated, the number of characters translated is equivalent to the length of the string. This string length is assigned to the scalar variable $length.
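Note that this technique yields the length only when every character in the string appears in the search list. The following sketch, which uses an invented string containing digits and a comma, shows the count returned by tr falling short of the value returned by length. It also uses the =~ operator to apply tr to a named variable rather than to $_.

```perl
use strict;
use warnings;

my $string   = "Room 101, second floor";
my $original = $string;

# tr returns the number of characters it matched. Because the search
# list and the replacement list are identical, the string is unchanged.
my $matched = ($string =~ tr/a-zA-Z /a-zA-Z /);
my $length  = length($string);

print("tr counted $matched characters, length reports $length\n");
```

The digits and the comma are not in the search list, so tr does not count them. When you need the true length of a string, length is the more dependable choice.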
tr can also be used to count the number of occurrences
of a specific character, as shown in Listing 13.11.
Listing 13.11. A program that uses tr to count the occurrences of specific characters.
1:  #!/usr/local/bin/perl
2:
3:  $punctuation = $blanks = $total = 0;
4:  while ($input = <STDIN>) {
5:          chop ($input);
6:          $total += length($input);
7:          $_ = $input;
8:          $punctuation += tr/,:;.-/,:;.-/;
9:          $blanks += tr/ / /;
10: }
11: print ("In this file, there are:\n");
12: print ("\t$punctuation punctuation characters,\n");
13: print ("\t$blanks blank characters,\n");
14: print ("\t", $total - $punctuation - $blanks);
15: print (" other characters.\n");

$ program13_11
Here is a line of input.
This line, another line, contains punctuation.
^D
In this file, there are:
        4 punctuation characters,
        10 blank characters,
        56 other characters.
$
This program uses the scalar variable $total and the built-in function length to count the total number of characters in the input file (excluding the trailing newline characters, which are removed by the call to chop in line 5).
Lines 8 and 9 use tr to count the number of occurrences of particular characters. Line 8 replaces all punctuation characters with themselves; the number of replacements performed, and hence the number of punctuation characters found, is added to the total stored in $punctuation. Similarly, line 9 replaces all blanks with themselves and adds the number of blanks found to the total stored in $blanks. In both cases, tr operates on the contents of the scalar variable $_, because the =~ operator has not been used to specify another value to translate.
Line 14 uses $total, $punctuation, and $blanks
to calculate the total number of characters that are not blank
and not punctuation.
NOTE
Many other functions and operators accept $_ as the default variable on which to work. For example, lines 4-7 of this program can also be written as follows:
while (<STDIN>) {
        chop();
        $total += length();
For more information on $_, refer to Day 17, "System Variables."
The pos function, defined only in Perl 5, returns the location of the last pattern match in a string. It is ideal for use when repeated pattern matches are specified using the g (global) pattern-matching operator.
The syntax for the pos function is
offset = pos(string);
string is the string whose pattern is being matched. offset is the number of characters already matched or skipped.
Listing 13.12 illustrates the use of pos.
Listing 13.12. A program that uses pos to display pattern match positions.
1:  #!/usr/local/bin/perl
2:
3:  $string = "Mississippi";
4:  while ($string =~ /i/g) {
5:          $position = pos($string);
6:          print("matched at position $position\n");
7:  }

$ program13_12
matched at position 2
matched at position 5
matched at position 8
matched at position 11
$
This program loops every time an i
in Mississippi is matched. The number displayed by line
6 is the number of characters to skip to reach the point at which
pattern matching resumes. For example, the first i is
the second character in the string, so the second pattern search
starts at position 2.
NOTE
You can also use pos to change the position at which pattern matching is to resume. To do this, put the call to pos on the left side of an assignment:
pos($string) = 5;
This tells the Perl interpreter to start the next pattern search with the sixth character in the string. (To restart searching from the beginning, use 0.)
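The following sketch shows the effect of assigning to pos before a global match begins. It reuses the Mississippi string from Listing 13.12; the array @positions is introduced here only to collect the results.

```perl
use strict;
use warnings;

my $string = "Mississippi";
my @positions;

# Skip the first five characters, then let the global match
# resume from that point.
pos($string) = 5;
while ($string =~ /i/g) {
    push(@positions, pos($string));
}

print("matches ended at positions: @positions\n");
```

Because the search starts with the sixth character, the two occurrences of i that appear earlier in the string are never examined.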
The substr function lets you assign a part of a character string to a scalar variable (or to a component of an array variable).
The syntax for calls to the substr function is
substr (expr, skipchars, length)
expr is the character string from which a substring is to be copied; this character string can be the value stored in a variable or the value resulting from the evaluation of an expression. skipchars is the number of characters to skip before starting copying. length is the number of characters to copy; length can be omitted, in which case the rest of the string is copied.
Listing 13.13 provides a simple example of substr.
Listing 13.13. A program that demonstrates the use of substr.
1:  #!/usr/local/bin/perl
2:
3:  $string = "This is a sample character string";
4:  $sub1 = substr ($string, 10, 6);
5:  $sub2 = substr ($string, 17);
6:  print ("\$sub1 is \"$sub1\"\n\$sub2 is \"$sub2\"\n");

$ program13_13
$sub1 is "sample"
$sub2 is "character string"
$
Line 4 calls substr, which copies a portion of the string stored in $string. This call specifies that ten characters are to be skipped before copying starts, and that a total of six characters are to be copied. This means that the substring sample is copied and stored in $sub1.
Line 5 is another call to substr. Here, 17 characters are skipped. Because the length field is omitted, substr copies the remaining characters in the string. This means that the substring character string is copied and stored in $sub2.
Note that lines 4 and 5 do not change the contents of $string.
In Listing 13.13, which you've just seen, calls to substr appear to the right of the assignment operator =. This means that the return value from substr (the extracted substring) is assigned to the variable appearing to the left of the =.
Calls to substr can also appear on the left of the assignment operator =. In this case, the portion of the string specified by substr is replaced by the value appearing to the right of the assignment operator.
The syntax for these calls to substr is basically the same as before:
substr (expr, skipchars, length) = newval;
Here, expr must be something that can be assigned to, such as a scalar variable or an element of an array variable. skipchars is the number of characters to skip before the overwriting operation begins; it cannot be greater than the length of the string. length is the number of characters to be replaced by the overwriting operation. If length is not specified, the remainder of the string is replaced.
newval is the string that replaces the substring specified
by skipchars and length. If newval
is larger than length, the character string automatically
grows to hold it, and the rest of the string is pushed aside (but
not overwritten). If newval is smaller than length,
the character string automatically shrinks. Basically, everything
appears where it is supposed to without you having to worry about
it.
NOTE
By the way, things that can be assigned to are sometimes known as lvalues, because they appear to the left of assignment statements (the l in lvalue stands for "left"). Things that appear to the right of assignment statements are, similarly, called rvalues. This book does not use the terms lvalue and rvalue, but you might find that knowing them will prove useful when you read other books on programming languages.
Listing 13.14 is an example of a program that uses substr
to replace portions of a string.
Listing 13.14. A program that replaces parts of a string using substr.
1:  #!/usr/local/bin/perl
2:
3:  $string = "Here is a sample character string";
4:  substr($string, 0, 4) = "This";
5:  substr($string, 8, 1) = "the";
6:  substr($string, 19) = "string";
7:  substr($string, -1, 1) = "g.";
8:  substr($string, 0, 0) = "Behold! ";
9:  print ("$string\n");

$ program13_14
Behold! This is the sample string.
$
This program illustrates the many ways you can use substr to replace portions of a string.
The call to substr in line 4 specifies that no characters are to be skipped before overwriting, and that four characters in the original string are to be overwritten. This means that the substring Here is replaced by This, and that the following is the new value of the string stored in $string:
This is a sample character string
Similarly, the call to substr in line 5 specifies that eight characters are to be skipped and one character is to be replaced. This means that the word a is replaced by the. Now, $string contains the following:
This is the sample character string
Note that the character string is now larger than the original, because the new substring, the, is larger than the substring it replaced.
Line 6 is an example of a call to substr that shrinks the string. Here, 19 characters are skipped, and the rest of the string is replaced by the substring string (because no length field has been specified). Now, the following is the value stored in $string:
This is the sample string
In line 7, the call to substr is passed -1 in the skipchars field and is passed 1 in the length field. This tells substr to replace the last character of the string with the substring g. (g followed by a period). $string now contains
This is the sample string.
NOTE
If substr is passed a skipchars value of -n, where n is a positive integer, substr skips to n characters from the right end of the string. For example, the following call replaces the last two characters in $string with the string hello:
substr($string, -2, 2) = "hello";
Finally, line 8 specifies that no characters are to be skipped and no characters are to be replaced. This means that the substring "Behold! " (including a trailing space) is added to the front of the existing string and that $string now contains the following:
Behold! This is the sample string.
Line 9 prints this final value of $string.
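Negative skipchars values work for extraction as well as replacement. The short sketch below, using an invented string and variable names, first copies the last six characters of a string and then replaces them with a shorter substring.

```perl
use strict;
use warnings;

my $string = "This is the sample string";

# A skipchars value of -6 starts six characters from the right end.
my $tail = substr($string, -6);

# Replace those six characters with a four-character string;
# the string shrinks to fit.
substr($string, -6, 6) = "text";

print("\$tail is \"$tail\"\n\$string is \"$string\"\n");
```

Counting from the right end is convenient when you know how a string ends but not how long it is.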
TIP
If you are a C programmer and are used to manipulating strings using pointers, note that substr with a length field of 1 can be used to simulate pointer-like behavior in Perl. For example, you can simulate the C statement
char = *str++;
as follows in Perl:
$char = substr($str, $offset++, 1);
You'll need to define a counter variable (such as $offset) to keep track of where you are in the string. However, this is no more of a chore than remembering to initialize your C pointer variable.
You can simulate the following C statement:
*str++ = char;
by assigning values using substr in the same way:
substr($str, $offset++, 1) = $char;
You shouldn't use substr in this way unless you really have to. Perl supplies more powerful and useful tools, such as pattern matching and substitution, to get the job done more efficiently.
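To make the pointer-style idiom concrete, here is a minimal sketch that walks through a string one character at a time. The string, the counter $offset, and the array @chars are all chosen for illustration.

```perl
use strict;
use warnings;

my $str    = "Perl";
my $offset = 0;
my @chars;

# Copy one character per pass, advancing the counter each time,
# much as *str++ would advance a C pointer.
while ($offset < length($str)) {
    push(@chars, substr($str, $offset++, 1));
}

print(join("-", @chars), "\n");
```

The loop stops when $offset reaches the length of the string, which plays the role of the terminating null character in C.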
The study function is a special function that tells the Perl interpreter that the specified scalar variable is about to be searched many times.
The syntax for the study function is
study (scalar);
scalar is the scalar variable to be "studied." The Perl interpreter takes the value stored in the specified scalar variable and represents it in an internal format that allows faster access.
For example:
study ($myvar);
Here, the value stored in the scalar variable $myvar is about to be repeatedly searched.
You can call study for only one scalar variable at a
time. Previous calls to study are superseded if study
is called again.
TIP
To check whether study actually makes your program more efficient, use the function times, which returns the user and system CPU times consumed by the program; comparing the values before and after a program fragment shows how much time that fragment used. (times is discussed earlier today.)
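Perl 5 provides functions that perform case conversion on strings. These are lc and uc, which convert an entire string to lowercase or uppercase, and lcfirst and ucfirst, which convert only the first character.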
The syntax for the lc and uc functions is
retval = lc(string);
retval = uc(string);
string is the string to be converted. retval is a copy of the string, converted to either lowercase or uppercase:
$lower = lc("aBcDe");    # $lower is assigned "abcde"
$upper = uc("aBcDe");    # $upper is assigned "ABCDE"
The syntax for the lcfirst and ucfirst functions is
retval = lcfirst(string);
retval = ucfirst(string);
string is the string whose first character is to be converted. retval is a copy of the string, with the first character converted to either lowercase or uppercase:
$lower = lcfirst("HELLO");    # $lower is assigned "hELLO"
$upper = ucfirst("hello");    # $upper is assigned "Hello"
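These functions can be combined. As one possible use, the sketch below normalizes a list of inconsistently capitalized names by converting each to lowercase and then capitalizing its first letter. The names and the array @fixed are invented for this example.

```perl
use strict;
use warnings;

my @names = ("aLICE", "BOB", "carol");
my @fixed;

foreach my $name (@names) {
    # lc lowercases the whole string; ucfirst then raises the first letter.
    push(@fixed, ucfirst(lc($name)));
}

print("@fixed\n");
```

Because each function returns a converted copy, the original elements of @names are left unchanged.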
The quotemeta function, defined only in Perl 5, places a backslash character in front of any non-word character in a string. The following statements are equivalent:
$string = quotemeta($string);
$string =~ s/(\W)/\\$1/g;
The syntax for quotemeta is
newstring = quotemeta(oldstring);
oldstring is the string to be converted. newstring is the string with backslashes added.
quotemeta is useful when a string is to be used in a subsequent pattern-matching operation. It ensures that there are no characters in the string which are to be treated as special pattern-matching characters.
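The following sketch illustrates the difference that quotemeta makes. The strings are invented; the point is that an unquoted period in a pattern matches any character, whereas the quoted form matches only a literal period.

```perl
use strict;
use warnings;

my $literal = "a.c";
my $quoted  = quotemeta($literal);    # a backslash now precedes the period

# Unquoted, the period acts as a wildcard, so "abc" matches.
my $raw_hit = ("abc" =~ /$literal/) ? 1 : 0;

# Quoted, only the exact characters a.c match.
my $quoted_hit_abc = ("abc" =~ /$quoted/) ? 1 : 0;
my $quoted_hit_lit = ("a.c" =~ /$quoted/) ? 1 : 0;

print("quoted pattern is $quoted\n");
print("unquoted matches abc: $raw_hit\n");
print("quoted matches abc: $quoted_hit_abc, quoted matches a.c: $quoted_hit_lit\n");
```

This matters most when the string to be matched comes from user input or a file, where you cannot know in advance which special characters it contains.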
The join function has been used many times in this book. It takes the elements of a list and converts them into a single character string.
The syntax for the join function is
join (joinstr, list);
joinstr is the character string that is to be used to glue the elements of list together.
For example:
@list = ("Here", "is", "a", "list");
$newstr = join ("::", @list);
After join is called, the value stored in $newstr becomes the following string:
Here::is::a::list
The join string, :: in this case, appears between each pair of joined elements. The most common join string is a single blank space; however, you can use any value as the join string, including the value resulting from an expression.
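As a small sketch of both cases just mentioned, the fragment below joins the same list once with a single blank space and once with a join string produced by an expression. The variable names are chosen for illustration.

```perl
use strict;
use warnings;

my @list = ("Here", "is", "a", "list");

# The most common case: a single blank space between elements.
my $sentence = join(" ", @list);

# The join string can be the result of an expression;
# here the x operator repeats "-" twice.
my $dashed = join("-" x 2, @list);

print("$sentence\n$dashed\n");
```

In both calls the join string appears only between elements, never before the first element or after the last.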
The sprintf function behaves like the printf function defined on Day 11, "Formatting Your Output," except that the formatted string is returned by the function instead of being written to a file. This enables you to assign the string to another variable.
The syntax for the sprintf function is
sprintf (string, fields);
string is the character string to print, and fields is a list of values to substitute into the string.
Listing 13.15 is an example that uses sprintf to build
a string.
Listing 13.15. A program that uses sprintf.
1:  #!/usr/local/bin/perl
2:
3:  $num = 26;
4:  $outstr = sprintf("%d = %x hexadecimal or %o octal\n",
5:                    $num, $num, $num);
6:  print ($outstr);

$ program13_15
26 = 1a hexadecimal or 32 octal
$
Lines 4 and 5 take three copies of the value stored in $num and include them as part of a string. The field specifiers %d, %x, and %o indicate how the values are to be formatted.
%d Indicates an integer displayed in the usual decimal (base-10) format
%x Indicates an integer displayed in hexadecimal (base-16) format
%o Indicates an integer displayed in octal (base-8) format
The created string is returned by sprintf. Once it has
been created, it behaves just like any other Perl character string;
in particular, it can be assigned to a scalar variable, as in
this example. Here, the string containing the three copies of
$num is assigned to the scalar variable $outstr.
Line 6 then prints this string.
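Because sprintf returns an ordinary string, the result can be stored anywhere a string can, not just in a scalar variable. The sketch below, which uses invented values and an array named @rows, collects one formatted string per number and prints them all at the end.

```perl
use strict;
use warnings;

my @rows;

foreach my $num (8, 26, 255) {
    # Each call returns a new string, which is saved rather than printed.
    push(@rows, sprintf("%d = %x hexadecimal or %o octal", $num, $num, $num));
}

print(join("\n", @rows), "\n");
```

Building the strings first and printing them later is useful when the output must be sorted, filtered, or written to more than one place.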
NOTE
For more information on field specifiers or on how printf works, refer to Day 11, which lists the field specifiers defined and provides a description of the syntax of printf.
Today, you learned about three types of built-in Perl functions: functions that handle process and program control, functions that perform mathematical operations, and functions that manipulate strings.
With the process- and program-control functions, you can start new processes, stop the current program or other processes, or temporarily halt the current program. You also can create a pipe that sends data from one of your created processes to another.
With the functions that perform mathematical operations, you can obtain the sine, cosine, and arctangent of a value. You also can calculate the natural logarithm and square root of a value, or use the value as an exponent of base e.
You also can generate random numbers and define the seed to use when generating the numbers.
Functions that search character strings include index, which searches for a substring starting from the left of a string, and rindex, which searches for a substring starting from the right of a string. You can retrieve the length of a character string using length. By using the translate operator tr in conjunction with the system variable $_, you can count the number of occurrences of a particular character or set of characters in a string. The pos function enables you to determine or set the current pattern-matching location in a string.
The function substr enables you to extract a substring from a string and use it in an expression or assignment statement. substr also can be used to replace a portion of a string or append to the front or back end of the string.
The lc and uc functions convert strings to lowercase or uppercase. To convert the first letter of a string to lowercase or uppercase, use lcfirst or ucfirst.
quotemeta places a backslash in front of every non-word character in a string.
You can create new character strings using join and sprintf. join creates a string by joining elements of a list, and sprintf builds a string using field specifiers that specify the string format.
Q: How does Perl generate random numbers?
A: Basically, by performing arithmetic operations using very large numbers. If the numbers for these arithmetic operations are carefully chosen, a sequence of "pseudo-random" numbers can be generated by repeating the set of arithmetic operations and returning their results. The random-number seed provided by srand supplies the initial value for one of the numbers used in the set of arithmetic operations. This ensures that the sequence of pseudo-random numbers starts with a different result each time.
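To illustrate the general idea, here is a minimal sketch of one well-known scheme of this kind, a linear congruential generator. The constants, the helper make_generator, and the small modulus are assumptions chosen to keep the example readable; they are not the values or the method that Perl's own rand and srand use.

```perl
use strict;
use warnings;

# Illustrative constants for a small linear congruential generator.
# They are assumptions for this sketch, not the values Perl itself uses.
my ($multiplier, $increment, $modulus) = (75, 74, 65537);

# make_generator is a hypothetical helper: it returns a subroutine that
# produces the next pseudo-random number each time it is called.
sub make_generator {
    my ($seed) = @_;
    my $state = $seed % $modulus;
    return sub {
        # Repeat the same arithmetic on the previous result.
        $state = ($multiplier * $state + $increment) % $modulus;
        return $state / $modulus;    # at least 0 and less than 1
    };
}

my $first  = make_generator(42);
my $second = make_generator(42);
my $other  = make_generator(7);

my @from_first  = ($first->(),  $first->(),  $first->());
my @from_second = ($second->(), $second->(), $second->());
my @from_other  = ($other->(),  $other->(),  $other->());

print("seed 42: @from_first\n");
print("seed 42: @from_second\n");
print("seed 7:  @from_other\n");
```

Two generators started from the same seed produce the same sequence, while a different seed produces a different one. This is why supplying a varying seed makes each run of a program start with different numbers.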
Q: What programs can be called using system?
A: Any program that you can run from your terminal can be run using system.
Q: How many processes can a program create using fork?
A: Perl provides no limit on how many processes can be created at a time. However, the performance of your system will be adversely affected if you generate too many processes at once. In particular, programs that call fork and wind up in an infinite loop are sometimes called fork bombs, because they generate thousands of processes and grind your machine to an effective halt. (Your system administrator will not be pleased with you if you do this!)
Q: How can I send signals to a process without killing it?
A: The kill function actually can send any signal supported by your machine to any running process (that you can access). Refer to the UNIX system documentation for details on the signals you can send and what their names are.
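As a rough sketch of sending a signal that does not end the receiving process, the fragment below has a program signal itself. It assumes a UNIX-like system on which the USR1 signal is available; the handler and the variable names are invented for illustration.

```perl
use strict;
use warnings;

my $received = 0;

# Install a handler so that USR1 is caught instead of ending the program.
$SIG{USR1} = sub { $received++; };

# $$ holds the process ID of the current program, so the program
# signals itself. kill returns the number of processes signalled.
my $signalled = kill('USR1', $$);

print("signalled $signalled process(es), handler ran $received time(s)\n");
```

The program keeps running after the signal arrives because it has told Perl what to do when USR1 is received. Without a handler, the default action for many signals is to terminate the process.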
Q: What is the difference between the %d and %ld format specifiers in sprintf?
A: %ld defines a "long integer." It refers to the largest number of bits that your local machine can use to store an integer. (This is often 32 bits.) %d, on the other hand, is equivalent to your machine's standard integer format. On some machines, %ld and %d are equivalent. If you are not sure how many bits your machine uses to store integers, or you know you are going to be dealing with large numbers, it's safer to use %ld. (The same holds true for all other integer formats, such as %lx and %lo.)
Q: What is the difference between the %c and %s format specifiers in sprintf?
A: %c undoes the effect of the ord function. It converts a scalar value into the equivalent ASCII character. (Its behavior is similar to that of the chr function in Pascal.) %s treats a scalar value as a character string and inserts it into the string at the place specified.
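The difference can be seen in a short sketch that passes the same value to both specifiers. The variable names are chosen for illustration, and the example assumes an ASCII character set.

```perl
use strict;
use warnings;

my $code = ord("A");                     # the numeric code for A

my $as_char   = sprintf("%c", $code);    # converts the code back to a character
my $as_string = sprintf("%s", $code);    # inserts the value as a string of digits

print("%c gives \"$as_char\", %s gives \"$as_string\"\n");
```

The same scalar value produces a single character under %c and its decimal digits under %s.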
The Workshop provides quiz questions to help you solidify your understanding of the material covered and exercises to give you experience in using what you've learned. Try to understand the quiz and exercise answers before you go on to tomorrow's lesson.
#!/usr/local/bin/perl

$mystring = <STDIN>;
$lastfound = length ($mystring);
while ($lastfound != -1) {
        $lastfound = index($mystring, "xyz", $lastfound);
}