Today's lesson shows you how to use subroutines to divide your program into smaller, more manageable modules. Today, you learn about the following:
In Perl, a subroutine is a separate body of code designed to perform a particular task. A Perl program executes this body of code by calling or invoking the subroutine; the act of invoking a subroutine is called a subroutine invocation.
Subroutines serve two useful purposes:
Listing 9.1 shows how a subroutine works. This program calls a subroutine that reads a line from the standard input file and breaks it into numbers. The program then adds the numbers together.
Listing 9.1. A program that uses a subroutine.
1: #!/usr/local/bin/perl
2:
3: $total = 0;
4: &getnumbers;
5: foreach $number (@numbers) {
6:     $total += $number;
7: }
8: print ("the total is $total\n");
9:
10: sub getnumbers {
11:     $line = <STDIN>;
12:     $line =~ s/^\s+|\s*\n$//g;
13:     @numbers = split(/\s+/, $line);
14: }
$ program9_1
11 8 16 4
the total is 39
$
Lines 10-14 are an example of a subroutine. The keyword sub tells the Perl interpreter that this is a subroutine definition. The getnumbers immediately following sub is the name of the subroutine; the Perl program uses this name when invoking the subroutine.
The program starts execution in the normal way, beginning with line 3. Line 4 invokes the subroutine getnumbers; the & character tells the Perl interpreter that the following name is the name of a subroutine. (This ensures that the Perl interpreter does not confuse subroutine names with the names of scalar or array variables.)
The Perl interpreter executes line 4 by jumping to the first executable statement inside the subroutine, which is line 11. The interpreter then executes lines 11-13.
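Lines 11-13 create the array @numbers as follows: line 11 reads a line of input from the standard input file and stores it in $line; line 12 removes any leading white space and the trailing newline (together with any white space just before it); and line 13 splits the remaining text at each run of white space and assigns the resulting list to @numbers.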
After line 13 is finished, the Perl interpreter jumps back to the main program and executes the line immediately following the subroutine call, which is line 5.
Lines 5-7 add the numbers together by using the foreach statement to loop through the list stored in @numbers. (Note that this program does not check whether a particular element of @numbers actually consists of digits. Because character strings that are not digits are converted to 0 in expressions, this isn't a significant problem.)
The syntax for a subroutine definition is
sub subname { statement_block }
subname is a placeholder for the name of the subroutine. Like all Perl names, subname consists of a letter or an underscore followed by zero or more letters, digits, or underscores.
statement_block is the body of the subroutine and consists
of one or more Perl statements. Any statement that can appear
in the main part of a Perl program can appear in a subroutine.
NOTE
The Perl interpreter never confuses a subroutine name with a scalar variable name or any other name, because it can always tell from the context which name you are referring to. This means that you can have a subroutine and a scalar variable with the same name. For example:
$word = 0;
&word;
Here, when the Perl interpreter sees the & character in the second statement, it realizes that the second statement is calling the subroutine named word.
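To see the two names coexist in a complete program, here is a minimal sketch. It assumes Perl 5 with use strict and use warnings, so the scalar is declared with my (covered later today). The subroutine body and the string it returns are illustrative only.

```perl
use strict;
use warnings;

my $word = 0;            # a scalar variable named word

sub word {               # a subroutine with the same name
    return ("called the subroutine word");
}

my $result = &word;      # the & marks this as a subroutine call
print ("$word; $result\n");
```

The scalar keeps its value because calling the subroutine word never touches $word.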
When you are defining names for your subroutines, it's best not to use a name belonging to a built-in Perl function that you plan to use. For example, you could, if you want, define a subroutine named split. The Perl interpreter can always distinguish an invocation of the subroutine split from an invocation of the library function split, because the name of the subroutine is preceded by an & when it is invoked, as follows:
@words = &split(1, 2);           # subroutine
@words = split(/\s+/, $line);    # library function
However, it's easy to leave off the & by mistake (especially if you are used to programming in C, where subroutine calls do not start with an &). To avoid such problems, use subroutine names that don't correspond to the names of library functions.
Perl subroutines can appear anywhere in a program, even in the middle of a conditional statement. For example, Listing 9.2 is a perfectly legal Perl program.
Listing 9.2. A program containing a subroutine in the middle of the main program.
1: #!/usr/local/bin/perl
2:
3: while (1) {
4:     &readaline;
5:     last if ($line eq "");
6:     sub readaline {
7:         $line = <STDIN>;
8:     }
9:     print ($line);
10: }
11: print ("done\n");
$ program9_2
Here is a line of input.
Here is a line of input.
^D
done
$
This program just reads lines of input from the standard input file and writes them straight back out to the standard output file.
Line 4 calls the subroutine readaline. When you examine this subroutine, which is contained in lines 6-8, you can see that it reads a line of input and assigns it to the scalar variable $line.
When readaline is finished, program execution continues with line 5. If line 5 does not end the loop, the program skips over the subroutine definition and continues with line 9. The code inside the subroutine is never directly executed, even if it appears in the middle of a program; lines 6-8 can be executed only by a subroutine invocation, such as the one in line 4.
TIP
Although subroutines can appear anywhere in a program, it usually is best to put all your subroutines at either the beginning of the program or the end. Following this practice makes your programs easier to read.
As you have seen, the Perl interpreter uses the & character to indicate that a subroutine is being specified in a statement. In Perl 5, you do not need to supply an & character when calling a subroutine if you have already defined the subroutine.
sub readaline {
    $line = <STDIN>;
}
...
readaline;
Because the Perl interpreter already knows that readaline is a subroutine, you don't need to specify the & when calling it.
If you prefer to list all your subroutines at the end of your program, you can still omit the & character provided you supply a forward reference for your subroutine, as shown in the following:
sub readaline;    # forward reference
...
readaline;
...
sub readaline {
    $line = <STDIN>;
}
The forward reference tells the Perl interpreter that readaline
is the name of a subroutine. This means that you no longer need
to supply the & when you call readaline.
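Here is a runnable sketch of the forward-reference pattern. It assumes Perl 5 with use strict. To avoid waiting for keyboard input, it uses a hypothetical subroutine, current_status, in place of readaline. Under strict, the bare call would be rejected if the forward reference were missing.

```perl
use strict;
use warnings;

sub current_status;              # forward reference: current_status is a subroutine

my $status = current_status;     # no & needed, thanks to the forward reference
print ("status: $status\n");

sub current_status {             # the full definition comes at the end
    return ("ready");
}
```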
Occasionally, calling a subroutine without specifying the & character might not behave the way you expect. If your program is behaving strangely, or you are not sure whether or not to use the & character, supply the & character with your call.
Take another look at the getnumbers subroutine from Listing 9.1.
sub getnumbers {
    $line = <STDIN>;
    $line =~ s/^\s+|\s*\n$//g;
    @numbers = split(/\s+/, $line);
}
Although this subroutine is useful, it suffers from one serious limitation: it overwrites any existing list stored in the array variable @numbers (as well as any value stored in $line). This overwriting can lead to problems. For example, consider the following:
@numbers = ("the", "a", "an");
&getnumbers;
print ("The value of \@numbers is: @numbers\n");
When the subroutine getnumbers is invoked, the value of @numbers is overwritten. If you just examine this portion of the program, it is not obvious that this is what is happening.
To get around this problem, you can employ a useful property of subroutines in Perl: The value of the last expression evaluated by the subroutine is automatically considered to be the subroutine's return value.
For example, in the subroutine getnumbers from Listing 9.1, the last expression evaluated is
@numbers = split(/\s+/, $line);
The value of this expression is the list of numbers obtained by splitting the line of input. This means that this list of numbers is the return value for the subroutine.
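The following minimal sketch shows a list becoming a return value simply because it is the last expression evaluated. It assumes Perl 5 with use strict and use warnings. The getnumbers here is a hypothetical variant that splits a fixed string instead of reading STDIN, so it runs without input.

```perl
use strict;
use warnings;

# Hypothetical variant of getnumbers: splits a fixed string instead of
# reading from STDIN.
sub getnumbers {
    my ($line) = "  11 8 16 4\n";
    $line =~ s/^\s+|\s*\n$//g;       # strip leading white space and the newline
    split(/\s+/, $line);             # last expression evaluated: the return value
}

my @numbers = getnumbers();          # list context: receives (11, 8, 16, 4)
my $total = 0;
foreach my $number (@numbers) {
    $total += $number;
}
print ("the total is $total\n");
```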
To see how to use a subroutine return value, look at Listing 9.3, which modifies the number-adding program from Listing 9.1 to use the return value from the subroutine getnumbers.
Listing 9.3. A program that uses a subroutine return value.
1: #!/usr/local/bin/perl
2:
3: $total = 0;
4: @numbers = &getnumbers;
5: foreach $number (@numbers) {
6:     $total += $number;
7: }
8: print ("the total is $total\n");
9:
10: sub getnumbers {
11:     $line = <STDIN>;
12:     $line =~ s/^\s+|\s*\n$//g;
13:     split(/\s+/, $line);    # this is the return value
14: }
$ program9_3
11 8 16 4
the total is 39
$
Line 4, once again, calls the subroutine getnumbers. As before, the array variable @numbers is assigned the list of numbers read from the standard input file; however, in this program, the assignment is in the main body of the program, not in the subroutine. This makes the program easier to read.
The only other difference between this program and Listing 9.1
is that the call to split in line 13 no longer assigns
anything to @numbers. In fact, it doesn't assign the
list returned by split to any variable at all, because
it does not need to. Line 13 is the last expression evaluated
in getnumbers, so it automatically becomes the return
value from getnumbers. Therefore, when line 4 calls getnumbers,
the list returned by split is assigned to the array variable
@numbers.
NOTE
If the idea of evaluating an expression without assigning it confuses you, there's nothing wrong with creating a variable inside the subroutine just for the purpose of containing the return value. For example:
sub getnumbers {
    $line = <STDIN>;
    $line =~ s/^\s+|\s*\n$//g;
    @retval = split(/\s+/, $line);    # the return value
}
Here, it is obvious that the return value is the contents of @retval. The only drawback to doing this is that assigning the list returned by split to @retval is slightly less efficient. In larger programs, such efficiency costs are worth it, because subroutines become much more comprehensible. Using a special return variable also eliminates an entire class of errors, which you will see in "Return Values and Conditional Expressions," later today.
You can use a return value of a subroutine any place an expression is expected. For example:
foreach $number (&getnumbers) {
    print ("$number\n");
}
This foreach statement iterates on the list of numbers returned by getnumbers. Each element of the list is assigned to $number in turn, which means that this loop prints all the numbers in the list, each on its own line.
Listing 9.4 shows another example that uses the return value of a subroutine in an expression. This time, the return value is used as an array subscript.
Listing 9.4. A program that uses a return value as an array subscript.
1: #!/usr/local/bin/perl
2:
3: srand();
4: print ("Random number tester.\n");
5: for ($count = 1; $count <= 100; $count++) {
6:     $randnum[&intrand] += 1;
7: }
8: print ("Totals for the digits 0 through 9:\n");
9: print ("@randnum\n");
10:
11: sub intrand {
12:     $num = int(rand(10));
13: }
$ program9_4
Random number tester.
Totals for the digits 0 through 9:
10 9 11 10 8 8 12 11 9 12
$
This program uses the following three built-in
functions:
srand | Initializes the built-in random-number generator |
rand | Generates a random (non-integral) number greater than or equal to zero and less than the value passed to it |
int | Gets rid of the non-integer portion of a number |
The subroutine intrand first calls rand to get a random number that is at least 0 and less than 10. The return value from rand is passed to int to remove the fractional portion of the number; for example, 4.77135 becomes 4. This integer is the value of the last expression evaluated, so it becomes the return value of intrand.
Line 6 calls intrand. The return value from intrand, an integer between 0 and 9, serves as the subscript into the array variable randnum. If the return value from intrand is 7, $randnum[7] has its value increased by one.
As a consequence, at any given time, the element of @randnum with subscript n contains the number of times n has occurred as a random number.
Line 9 prints out the number of occurrences of each of the 10 numbers. Each number should occur approximately the same number of times (although not necessarily exactly the same number of times).
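Here is a sketch of the same tally, assuming Perl 5 with use strict and use warnings. Unlike Listing 9.4, it seeds the generator with a fixed value (an arbitrary choice, not from the listing) and starts every count at 0. As a result, a digit that never comes up prints as 0 instead of as an empty field. The exact counts depend on your Perl's random-number generator, so none are shown here.

```perl
use strict;
use warnings;

srand(42);                      # fixed seed (an assumption) so reruns repeat
my @randnum = (0) x 10;         # start every count at 0 so no field prints empty

sub intrand {
    int(rand(10));              # an integer from 0 to 9; the return value
}

for (my $count = 1; $count <= 100; $count++) {
    $randnum[&intrand] += 1;    # the return value is used as a subscript
}
print ("Totals for the digits 0 through 9:\n@randnum\n");
```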
Because the return value of a subroutine is always the last expression evaluated, the return value might not always be what you expect.
Consider the simple program in Listing 9.5. This program, like the one in Listing 9.3, reads an input line, breaks it into numbers, and adds the numbers. This program, however, attempts to do all the work inside the subroutine get_total.
Listing 9.5. A program illustrating a potential problem with return values from subroutines.
1: #!/usr/local/bin/perl
2:
3: $total = &get_total;
4: print("The total is $total\n");
5:
6: sub get_total {
7:     $value = 0;
8:     $inputline = <STDIN>;
9:     $inputline =~ s/^\s+|\s*\n$//g;
10:     @subwords = split(/\s+/, $inputline);
11:     $index = 0;
12:     while ($subwords[$index] ne "") {
13:         $value += $subwords[$index++];
14:     }
15: }
$ program9_5
11 8 16 4
The total is
$
Clearly, this program is supposed to assign the contents of the scalar variable $value to the scalar variable $total. However, when line 4 tries to print the total, you see that the value of $total is actually the empty string. What has happened?
The problem is in the subroutine get_total. In get_total, as in all other subroutines, the return value is the value of the last expression evaluated. However, in get_total, the last expression evaluated is not the last expression that appears in the subroutine.
The last expression to be evaluated in get_total is the conditional expression in line 12, which is
$subwords[$index] ne ""
The loop in lines 12-14 iterates until the value of this expression is false. At that point the loop terminates, and so does the subroutine. The false result of this comparison is therefore the last expression evaluated, and it becomes the return value of the subroutine. (Current Perl documentation is stricter still: if the last statement of a subroutine is a loop such as while, the return value is unspecified, so you should never rely on it.) A false comparison yields a value that behaves as the null string when printed (and as 0 in a numeric context), so line 4 prints the following, which isn't what the program is supposed to do:
The total is
Listing 9.6 shows how you can get around this problem.
Listing 9.6. A program that corrects the problem that occurs in Listing 9.5.
1: #!/usr/local/bin/perl
2:
3: $total = &get_total;
4: print("The total is $total.\n");
5:
6: sub get_total {
7:     $value = 0;
8:     $inputline = <STDIN>;
9:     $inputline =~ s/^\s+|\s*\n$//g;
10:     @subwords = split(/\s+/, $inputline);
11:     $index = 0;
12:     while ($subwords[$index] ne "") {
13:         $value += $subwords[$index++];
14:     }
15:     $retval = $value;
16: }
$ program9_6
11 8 16 4
The total is 39.
$
This program is identical to Listing 9.5 except for one substantive difference: line 15 has been added, which assigns the total stored in $value to the scalar variable $retval. (Line 4 also now ends its message with a period.)
Line 15 ensures that the value of the last expression evaluated in the subroutine get_total is, in fact, the total, which is supposed to become the return value. This means that line 3 now assigns the correct total to $total, which in turn means that line 4 now prints the correct result.
Note that you don't really need to assign to $retval. The subroutine get_total can just as easily be the following:
sub get_total {
    $value = 0;
    $inputline = <STDIN>;
    $inputline =~ s/^\s+|\s*\n$//g;
    @subwords = split(/\s+/, $inputline);
    $index = 0;
    while ($subwords[$index] ne "") {
        $value += $subwords[$index++];
    }
    $value;
}
Here, the final expression evaluated by the subroutine is simply
$value. The value of this expression is the current value
stored in $value, which is the sum of the numbers in
the line.
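Here is a self-contained sketch of this version of get_total, assuming Perl 5 with use strict and use warnings. It adds a fixed, hypothetical input line instead of reading STDIN. It also changes the loop test to $index < @subwords: the original test, $subwords[$index] ne "", reads past the end of the array and triggers "uninitialized value" warnings when use warnings is on.

```perl
use strict;
use warnings;

# Hypothetical version of get_total that adds a fixed line of numbers
# instead of reading STDIN.
sub get_total {
    my ($inputline) = "11 8 16 4\n";
    my ($value, $index) = (0, 0);
    $inputline =~ s/^\s+|\s*\n$//g;
    my (@subwords) = split(/\s+/, $inputline);
    while ($index < @subwords) {     # compare with the number of elements
        $value += $subwords[$index++];
    }
    $value;                          # last expression evaluated: the total
}

my $total = get_total();
print ("The total is $total.\n");
```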
TIP
Subroutines, such as get_total in Listing 9.6, which assign their return value at the very end are known as single-exit modules. Single-exit modules avoid problems like those you saw in Listing 9.5, and they usually are much easier to read. For these reasons, it is a good idea to assign to the return value at the very end of the subroutine, unless there are overwhelming reasons not to do so.
Another way to ensure that the return value from a subroutine is the value you want is to use the return statement.
The syntax for the return statement is
return (retval);
retval is the value you want your subroutine to return. It can be either a scalar value (including the result of an expression) or a list.
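The minimal sketch below assumes Perl 5 with use strict and use warnings. It uses two hypothetical subroutines: one leaves early with a string, and the other returns a list of two values. Both take arguments through @_, which is described later today.

```perl
use strict;
use warnings;

# Hypothetical subroutine: returns "error" at once when it receives no
# values; otherwise returns their sum.
sub total_or_error {
    my (@values) = @_;           # arguments arrive in @_
    my ($sum) = 0;
    if (@values == 0) {
        return ("error");        # leaves the subroutine immediately
    }
    foreach my $value (@values) {
        $sum += $value;
    }
    return ($sum);               # a scalar return value
}

# Hypothetical subroutine returning a list of two values.
sub first_and_last {
    my (@values) = @_;
    return ($values[0], $values[-1]);
}

my $good = total_or_error(11, 8, 16, 4);
my $bad  = total_or_error();
my ($first, $last) = first_and_last(11, 8, 16, 4);
print ("$good $bad $first $last\n");
```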
Listing 9.7 provides an example of the use of the return statement.
Listing 9.7. A program that uses the return statement.
1: #!/usr/local/bin/perl
2:
3: $total = &get_total;
4: if ($total eq "error") {
5:     print ("No input supplied.\n");
6: } else {
7:     print("The total is $total.\n");
8: }
9:
10: sub get_total {
11:     $value = 0;
12:     $inputline = <STDIN>;
13:     $inputline =~ s/^\s+|\s*\n$//g;
14:     if ($inputline eq "") {
15:         return ("error");
16:     }
17:     @subwords = split(/\s+/, $inputline);
18:     $index = 0;
19:     while ($subwords[$index] ne "") {
20:         $value += $subwords[$index++];
21:     }
22:     $retval = $value;
23: }
$ program9_7
^D
No input supplied.
$
This program is similar to the one in Listing 9.6. The only difference is that this program checks whether an input line exists.
If the input line does not exist, the conditional expression in line 14 becomes true, and line 15 is executed. Line 15 exits the subroutine with the return value error; this means that error is assigned to $total in line 3.
This program shows why allowing scalar variables to store either numbers or character strings is useful. When the subroutine get_total detects the error, it can return a value that is not a number, and line 3 assigns this value to $total. This makes it easy for the main program to determine that something has gone wrong. Other programming languages, which only enable you to assign either a number or a character string to a particular variable, do not offer this flexibility.
The subroutine get_total in Listing 9.7 defines several variables that are used only inside the subroutine: the array variable @subwords, and the four scalar variables $inputline, $value, $index, and $retval.
If you know for certain that these variables are going to be used only inside the subroutine, you can tell Perl to define these variables as local variables.
In Perl 5, there are two statements used to define local variables: my and local.
In Perl 4, the my statement is not defined, so you must use local to define a variable that is not known to the main program.
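my and local are not interchangeable in every case. local gives a global variable a temporary value, and any subroutine called while that value is in effect sees it too. A my variable, by contrast, is visible only inside the block that declares it. The sketch below shows the difference using three hypothetical subroutines: show_level, with_local, and with_my. It assumes Perl 5.6 or later, because it uses our to declare the global under use strict.

```perl
use strict;
use warnings;

our $level = "global";          # a package (global) variable

sub show_level {
    return ($level);            # reads whatever $level currently is
}

sub with_local {
    local ($level) = "local";   # temporarily replaces the global value
    return (show_level());      # a called subroutine sees the local copy
}

sub with_my {
    my ($level) = "my";         # a new variable known only inside with_my
    return (show_level());      # show_level still sees the global copy
}

my @seen = (with_local(), with_my(), show_level());
print ("@seen\n");
```

After with_local finishes, the global $level reverts to "global", which is why the final call to show_level reports it again.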
Listing 9.8 shows how you can use my to define a variable
that exists only inside a subroutine.
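NOTE
If you are using Perl 4, replace my with local in all the remaining examples in this chapter. For example, in Listing 9.8, replace my with local in lines 13 and 14, which produces
local ($total, $inputline, @subwords);
local ($index, $retval);
my and local use the same syntax. The difference between them is scope. A variable created using my is known only inside the block (here, the subroutine) that defines it. local, by contrast, temporarily replaces the value of a global variable, so any subroutine called while the local copy is in effect also sees that copy. The original value is restored when the block containing local finishes.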
Listing 9.8. A program that uses local variables.
1: #!/usr/local/bin/perl
2:
3: $total = 0;
4: while (1) {
5:     $linetotal = &get_total;
6:     last if ($linetotal eq "done");
7:     print ("Total for this line: $linetotal\n");
8:     $total += $linetotal;
9: }
10: print ("Total for all lines: $total\n");
11:
12: sub get_total {
13:     my ($total, $inputline, @subwords);
14:     my ($index, $retval);
15:     $total = 0;
16:     $inputline = <STDIN>;
17:     if ($inputline eq "") {
18:         return ("done");
19:     }
20:     $inputline =~ s/^\s+|\s*\n$//g;
21:     @subwords = split(/\s+/, $inputline);
22:     $index = 0;
23:     while ($subwords[$index] ne "") {
24:         $total += $subwords[$index++];
25:     }
26:     $retval = $total;
27: }
$ program9_8
11 8 16 4
Total for this line: 39
7 20 6 1
Total for this line: 34
^D
Total for all lines: 73
$
This program uses two copies of the scalar variable $total. One copy of $total is defined in the main program and keeps a running total of all of the numbers in all of the lines.
The scalar variable $total is also defined in the subroutine
get_total; in this subroutine, $total refers
to the total for a particular line, and line 13 defines it as
a local variable. Because this copy of $total is only
defined inside the subroutine, the copy of $total defined
in the main program is not affected by line 15 (which assigns
0 to $total).
Because a local variable is not known outside the subroutine, the local variable is destroyed when the subroutine is completed. If the subroutine is called again, a new copy of the local variable is defined. This means that the following code does not work:
sub subroutine_count {
    my($number_of_calls);
    $number_of_calls += 1;
}
This subroutine does not return the number of times subroutine_count has been called. Because a new copy of $number_of_calls is defined every time the subroutine is called, $number_of_calls is always assigned the value 1.
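If you do want a count that survives between calls, one common approach (an assumption beyond this chapter, not the book's own technique) is to declare the variable with my outside the subroutine, inside a bare block that encloses only that subroutine. The sketch below, assuming Perl 5 with use strict, contrasts this with the failing version described above. That version has been renamed broken_count here so that both can appear in one program.

```perl
use strict;
use warnings;

# The failing version: a new $number_of_calls is created on every call.
sub broken_count {
    my ($number_of_calls);
    $number_of_calls += 1;      # always 1, because the variable starts undefined
}

# The variable is declared outside the subroutine, in a block that
# encloses only this subroutine, so it survives between calls.
{
    my ($number_of_calls) = 0;
    sub subroutine_count {
        $number_of_calls += 1;
        return ($number_of_calls);
    }
}

broken_count() for 1 .. 2;
subroutine_count() for 1 .. 2;
my $broken = broken_count();
my $count  = subroutine_count();
print ("broken_count: $broken, subroutine_count: $count\n");
```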
Local variables can appear anywhere in a program, provided they are defined before they are used. It is good programming practice to put all your local definitions at the beginning of your subroutine.
If you want, you can assign a value to a local variable when you declare it. For example:
sub my_sub {
    my($scalar) = 43;
    my(@array) = ("here's", "a", "list");
    # code goes here
}
Here, the local scalar variable $scalar is given an initial value of 43, and the local array variable @array is initialized to contain the list ("here's", "a", "list").
You can make your subroutines more flexible by allowing them to accept values passed from the main program; these values passed from the main program are known as arguments.
Listing 9.9 provides a very simple example of a subroutine that accepts three arguments.
Listing 9.9. A program that uses a subroutine to print three numbers and their total.
1: #!/usr/local/bin/perl
2:
3: print ("Enter three numbers, one at a time:\n");
4: $number1 = <STDIN>;
5: chop ($number1);
6: $number2 = <STDIN>;
7: chop ($number2);
8: $number3 = <STDIN>;
9: chop ($number3);
10: &printnum ($number1, $number2, $number3);
11:
12: sub printnum {
13:     my($number1, $number2, $number3) = @_;
14:     my($total);
15:     print ("The numbers you entered: ");
16:     print ("$number1 $number2 $number3\n");
17:     $total = $number1 + $number2 + $number3;
18:     print ("The total: $total\n");
19: }
$ program9_9
Enter three numbers, one at a time:
5
11
4
The numbers you entered: 5 11 4
The total: 20
$
Line 10 calls the subroutine printnum. Three arguments are passed to printnum: the value stored in $number1, the value stored in $number2, and the value stored in $number3. Note that arguments are passed to subroutines in the same way they are passed to built-in library functions.
Line 13 defines local copies of the scalar variables $number1, $number2, and $number3. It then assigns the contents of the system variable @_ to these scalar variables. @_ is created whenever a subroutine is called with arguments; it contains a list consisting of the arguments in the order in which they are passed. In this case, printnum is called with arguments 5, 11, and 4, which means that @_ contains the list (5, 11, 4).
The assignment in line 13 assigns the list to the local scalar
variables that have just been defined. This assignment works just
like any other assignment of a list to a set of scalar variables.
The first element of the list, 5, is assigned to the
first variable, $number1; the second element of the list,
11, is assigned to $number2; and the final element,
4, is assigned to $number3.
NOTE
After the array variable @_ has been created, it can be used anywhere any other array variable can be used. This means that you do not need to assign its contents to local variables. The following subroutine is equivalent to the subroutine in lines 12-19 of Listing 9.9:
sub printnum {
    my($total);
    print ("The numbers you entered: ");
    print ("$_[0] $_[1] $_[2]\n");
    $total = $_[0] + $_[1] + $_[2];
    print ("The total: $total\n");
}
Here, $_[0] refers to the first element of the array variable @_, $_[1] refers to the second element, and $_[2] refers to the third element. This subroutine is a little more efficient, but it is harder to read.
TIP
It usually is better to define local variables and assign @_ to them, because then your subroutines will be easier to understand.
Listing 9.10 is another example of a program that passes arguments to a subroutine. This program uses the same subroutine to count the number of words and the number of characters in a file.
Listing 9.10. Another example of a subroutine with arguments passed to it.
1: #!/usr/local/bin/perl
2:
3: $wordcount = $charcount = 0;
4: $charpattern = "";
5: $wordpattern = "\\s+";
6: while ($line = <STDIN>) {
7:     $charcount += &count($line, $charpattern);
8:     $line =~ s/^\s+|\s+$//g;
9:     $wordcount += &count($line, $wordpattern);
10: }
11: print ("Totals: $wordcount words, $charcount characters\n");
12:
13: sub count {
14:     my ($line, $pattern) = @_;
15:     my ($count);
16:     if ($pattern eq "") {
17:         @items = split (//, $line);
18:     } else {
19:         @items = split (/$pattern/, $line);
20:     }
21:     $count = @items;
22: }
$ program9_10
This is a line of input.
Here is another line.
^D
Totals: 10 words, 47 characters
$
This program reads lines from the standard input file until the file is exhausted. Each line has its characters counted and its words counted.
Line 7 determines the number of characters in a line by calling the subroutine count. This subroutine is passed the line of input and the string stored in $charpattern, which is the empty string. Inside the subroutine count, the local variable $pattern receives the pattern passed to it by the call in line 7. This means that the value stored in $pattern is also the empty string.
Lines 16-20 split the input line. The pattern specified in the call to split has the value stored in $pattern substituted into it. Because $pattern currently contains the empty string, the pattern used to split the line is //, which splits the input line into individual characters. As a result, each element of the resulting list stored in @items is a character in the input line.
The total number of elements in the list (in other words, the total number of characters in the input line) is assigned to $count by line 21. Because this is the last expression evaluated in the subroutine, the resulting total number of characters is returned by the subroutine. Line 7 adds this total to the scalar variable $charcount.
Line 8 then removes the leading and trailing white space. This white space is included in the total number of characters, because spaces, tabs, and the trailing newline character count as characters, but it is not included when the line is broken into words.
Line 9 calls the subroutine count again, this time with the pattern stored in $wordpattern, which is \s+. (Recall that you need to use two backslashes in a string to represent a single backslash, because the \ character is the escape character in strings.) This value, representing one or more whitespace characters, is assigned to $pattern inside the subroutine, and the pattern passed to split therefore becomes /\s+/.
When split is called with this pattern, @items is assigned a list of words. The total number of words in the list is assigned to $count and is returned; line 9 adds this returned value to $wordcount, the total number of words.
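Here is a self-contained check of the counting logic. It applies the count subroutine from Listing 9.10 to the two lines typed in the sample session instead of reading STDIN. It assumes Perl 5 with use strict, so @items is declared with my, and the count is returned with return rather than through the final assignment.

```perl
use strict;
use warnings;

sub count {
    my ($line, $pattern) = @_;
    my (@items);
    if ($pattern eq "") {
        @items = split (//, $line);          # one element per character
    } else {
        @items = split (/$pattern/, $line);  # one element per word
    }
    return (scalar (@items));
}

my ($wordcount, $charcount) = (0, 0);
foreach my $line ("This is a line of input.\n", "Here is another line.\n") {
    $charcount += count($line, "");
    (my $trimmed = $line) =~ s/^\s+|\s+$//g;   # trim a copy; $line is a constant here
    $wordcount += count($trimmed, "\\s+");
}
print ("Totals: $wordcount words, $charcount characters\n");
```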
If you want, you can pass a list to a subroutine. For example, the following subroutine adds the elements of a list together and prints the result:
sub addlist {
    my (@list) = @_;
    $total = 0;
    foreach $item (@list) {
        $total += $item;
    }
    print ("The total is $total\n");
}
To invoke this subroutine, pass it an array variable, a list,
or any combination of lists and
scalar values.
&addlist (@mylist);
&addlist ("14", "6", "11");
&addlist ($value1, @sublist, $value2);
In each case, the values and lists supplied in the call to addlist are merged into a single list and then passed to the subroutine.
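The merging is easiest to see with a version of addlist that returns its total instead of printing it. The sketch below assumes Perl 5 with use strict. The contents of @mylist and @sublist are hypothetical sample values.

```perl
use strict;
use warnings;

# A variant of addlist that returns the total instead of printing it.
sub addlist {
    my (@list) = @_;            # all arguments, merged into one list
    my ($total) = 0;
    foreach my $item (@list) {
        $total += $item;
    }
    return ($total);
}

my @mylist  = (1, 2, 3);
my @sublist = (10, 20);
my $from_array = addlist(@mylist);            # receives (1, 2, 3)
my $from_list  = addlist("14", "6", "11");    # receives ("14", "6", "11")
my $from_mix   = addlist(5, @sublist, 100);   # receives (5, 10, 20, 100)
print ("$from_array $from_list $from_mix\n");
```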
Because values are merged into a single list when a list is passed to a subroutine, you can only define one list as an argument for a subroutine. The subroutine
sub twolists { my (@list1, @list2) = @_; }
isn't useful, because @list1 absorbs all of the contents of @_, which means that @list2 is always assigned the empty list.
This means that if you want to have both scalar variables and a list as arguments to a subroutine, the list must appear last, as follows:
sub twoargs { my ($scalar, @list) = @_; }
If you call this subroutine using
&twoargs(47, @mylist);
the value 47 is assigned to $scalar, and @mylist is assigned to @list.
If you want, you can call twoargs with a single list, as follows:
&twoargs(@mylist);
Here, the first element of @mylist is assigned to $scalar,
and the rest of @mylist is assigned to @list.
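The following sketch confirms both behaviors by having each subroutine report what it received. It assumes Perl 5 with use strict. The return values are reporting devices added for illustration and are not part of the subroutines in the text.

```perl
use strict;
use warnings;

sub twolists {
    my (@list1, @list2) = @_;   # @list1 absorbs every argument
    return (scalar (@list1), scalar (@list2));
}

sub twoargs {
    my ($scalar, @list) = @_;   # first argument, then everything else
    return ($scalar, scalar (@list));
}

my @mylist = (47, 5, 6, 7);
my @sizes  = twolists(1, 2, 3, 4);
my @parts  = twoargs(@mylist);
print ("@sizes | @parts\n");
```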
NOTE
If you find this confusing, it might help to realize that passing arguments to a subroutine follows the same rules as assignment does. For example, the following assignment works as you would expect:
($scalar, @list1) = @list2;
Here, $scalar is assigned the first element of @list2, and @list1 is assigned the rest. However, the following assignment is not useful:
(@list1, $scalar) = @list2;
because all of @list2 is assigned to @list1, and $scalar is left undefined (it behaves like the null string).
In Perl, you can call subroutines from other subroutines. To call a subroutine from another subroutine, use the same subroutine-invocation syntax you've been using all along. Subroutines that are called by other subroutines are known as nested subroutines (because one call is "nested" inside the other).
Listing 9.11 is an example of a program that contains a nested subroutine. It is a fairly simple modification of Listing 9.10 and counts the number of words and characters in three lines of standard input. It also demonstrates how to return multiple values from a subroutine.
Listing 9.11. An example of a nested subroutine.
1: #!/usr/local/bin/perl
2:
3: ($wordcount, $charcount) = &getcounts(3);
4: print ("Totals for three lines: ");
5: print ("$wordcount words, $charcount characters\n");
6:
7: sub getcounts {
8:     my ($numlines) = @_;
9:     my ($charpattern, $wordpattern);
10:     my ($charcount, $wordcount);
11:     my ($line, $linecount);
12:     my (@retval);
13:     $charpattern = "";
14:     $wordpattern = "\\s+";
15:     $linecount = $charcount = $wordcount = 0;
16:     while (1) {
17:         $line = <STDIN>;
18:         last if ($line eq "");
19:         $linecount++;
20:         $charcount += &count($line, $charpattern);
21:         $line =~ s/^\s+|\s+$//g;
22:         $wordcount += &count($line, $wordpattern);
23:         last if ($linecount == $numlines);
24:     };
25:     @retval = ($wordcount, $charcount);
26: }
27:
28: sub count {
29:     my ($line, $pattern) = @_;
30:     my ($count);
31:     if ($pattern eq "") {
32:         @items = split (//, $line);
33:     } else {
34:         @items = split (/$pattern/, $line);
35:     }
36:     $count = @items;
37: }
$ program9_11
This is a line of input.
Here is another line.
Here is the last line.
Totals for three lines: 15 words, 70 characters
$
The main body of this program now consists of only five lines of code, including the special header comment and a blank line. This is because most of the actual work is being done inside the subroutines. (This is common in large programs. Most of these programs call a few main subroutines, which in turn call other subroutines. This approach makes programs easier to read, because each subroutine is compact and concise.)
Line 3 calls the subroutine getcounts, which retrieves the word and character counts for the three lines from the standard input file. Because getcounts returns a list containing two elements, a standard "list to scalar variable" assignment can be used to assign the returned list directly to $wordcount and $charcount.
The subroutine getcounts is similar to the main body of the program in Listing 9.10. The only difference is that the while loop has been modified to loop only the number of times specified by the argument passed to getcounts, which is stored in the local variable $numlines.
The subroutine getcounts actually does the word and character counting by calling a nested subroutine, count. This subroutine is identical to the subroutine of the same name in Listing 9.10.
NOTE
The @_ variable is a local variable that is defined inside the subroutine. When a subroutine calls a nested subroutine, a new copy of @_ is created for the nested subroutine. For example, in Listing 9.11, when getcounts calls count, a new copy of @_ is created for count, and the @_ variable in getcounts is not changed.
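To watch @_ being replaced for a nested call and then left intact for the caller, consider this minimal sketch. It assumes Perl 5 with use strict, and it uses two hypothetical subroutines, outer and inner.

```perl
use strict;
use warnings;

sub inner {
    return (scalar (@_));             # counts only inner's own arguments
}

sub outer {
    my ($inner_count) = inner("x");   # inner gets a new @_ holding ("x")
    return (scalar (@_), $inner_count, $_[0]);   # outer's @_ is unchanged
}

my @result = outer("a", "b", "c");
print ("@result\n");
```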
In Perl, not only can subroutines call other subroutines, but subroutines actually can call themselves. A subroutine that calls itself is known as a recursive subroutine.
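You can use a subroutine as a recursive subroutine if the following two conditions are true: the variables the subroutine uses are local (apart from any it deliberately shares, such as a list it only reads), and the subroutine eventually reaches a case in which it stops calling itself.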
When all the variables that a subroutine uses are local, the subroutine creates a new copy of the variables each time it calls itself. This ensures that there is no confusion or overlap.
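The minimal sketch below assumes Perl 5 with use strict and defines a hypothetical recursive subroutine, sum_list. It adds up a list by adding the first element to the sum of the rest. Each call gets its own copy of @list, and an empty list is the stopping case.

```perl
use strict;
use warnings;

# Hypothetical recursive subroutine: adds a list of numbers by adding the
# first element to the sum of the rest.
sub sum_list {
    my (@list) = @_;            # each call gets its own copy of @list
    my ($first);
    if (@list == 0) {
        return (0);             # stopping case: nothing left to add
    }
    $first = shift (@list);
    return ($first + sum_list(@list));   # recursive call on a shorter list
}

my $sum = sum_list(11, 8, 16, 4);
print ("$sum\n");
```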
Listing 9.12 is an example of a program that contains a recursive subroutine. This program accepts a list of numbers and operators that is to be evaluated from right to left, as if the list were a stack whose top is the left end of the list. For example, if the input is
- 955 * 26 + 11 8
this program adds 11 and 8, multiplies the result by 26, and subtracts that result from 955. This is equivalent to the following Perl expression:
955 - 26 * (11 + 8)
Listing 9.12. A program that uses a recursive subroutine to perform arithmetic.
1: #!/usr/local/bin/perl
2:
3: $inputline = <STDIN>;
4: $inputline =~ s/^\s+|\s+$//g;
5: @list = split (/\s+/, $inputline);
6: $result = &rightcalc (0);
7: print ("The result is $result.\n");
8:
9: sub rightcalc {
10:     my ($index) = @_;
11:     my ($result, $operand1, $operand2);
12:
13:     if ($index+3 == @list) {
14:         $operand2 = $list[$index+2];
15:     } else {
16:         $operand2 = &rightcalc ($index+2);
17:     }
18:     $operand1 = $list[$index+1];
19:     if ($list[$index] eq "+") {
20:         $result = $operand1 + $operand2;
21:     } elsif ($list[$index] eq "*") {
22:         $result = $operand1 * $operand2;
23:     } elsif ($list[$index] eq "-") {
24:         $result = $operand1 - $operand2;
25:     } else {
26:         $result = $operand1 / $operand2;
27:     }
28: }
$ program9_12
- 98 * 4 + 12 11
The result is 6.
$
This program starts off by reading a line of input from the standard input file and breaking it into its components, which are stored as a list in the array variable @list.
When given the input
- 98 * 4 + 12 11
lines 3-5 produce the following list, which is assigned to @list:
("-", "98", "*", "4", "+", "12", "11")
Line 6 calls the subroutine rightcalc for the first time. rightcalc requires one argument, an index value that tells the subroutine what part of the list to work on. Because the argument passed here is 0, rightcalc starts with the first element in the list.
Line 10 assigns the argument passed to rightcalc to the local variable $index. When rightcalc is called for the first time, $index is 0.
Lines 13-17 are the heart of this subroutine, because they control whether to call rightcalc recursively. The basic logic is that a list such as
("-", "98", "*", "4", "+", "12", "11")
can be broken into three parts: the first operator, -; the first operand, 98; and a sublist (the rest of the list). Note that the sublist
("*", "4", "+", "12", "11")
is itself a complete set of operators and operands; because this program is required to perform its arithmetic starting from the right, this sublist must be calculated first.
Line 13 checks whether there is a sublist that needs to be evaluated first. To do this, it checks whether the part of the list starting at subscript $index contains more than three elements. If it contains exactly three elements, it consists of only one operator and two operands, and the arithmetic can be performed right away. If it contains more than three elements, a sublist exists.
To evaluate the sublist when it exists, line 16 calls rightcalc recursively. The index value passed to this second copy of rightcalc is 2; this ensures that the first element of the list examined by the second copy of rightcalc is the element with subscript 2, which is *.
At this point, the following is the chain of subroutine invocations, their arguments, and the part of the list on which they are working:
Level 1  Main program
Level 2  rightcalc(0), list ("-", "98", "*", "4", "+", "12", "11")
Level 3  rightcalc(2), list ("*", "4", "+", "12", "11")
When this copy of rightcalc reaches line 13, it checks whether the sublist being worked on has just three elements. Because this sublist has five elements, line 16 calls yet another copy of rightcalc, this time setting the value of $index to 4. The following is the chain of subroutine invocations after this third call:
Level 1  Main program
Level 2  rightcalc(0), list ("-", "98", "*", "4", "+", "12", "11")
Level 3  rightcalc(2), list ("*", "4", "+", "12", "11")
Level 4  rightcalc(4), list ("+", "12", "11")
When the third copy of this subroutine reaches line 13, it checks whether this portion of the list contains only three elements. Because it does, the conditional expression in line 13 is true. At this point, line 14 is executed for the first time (by any copy of rightcalc). It takes the value stored in $index (in this case, 4), adds 2 to it, and uses the result as the subscript into @list. This assigns 11, the seventh element of @list, to $operand2.
Lines 18-27 perform an arithmetic operation. Line 18 adds one to the value in $index to retrieve the location of the first operand; this operand is assigned to $operand1. In this copy of rightcalc, the subscript is 5 (4+1), and the sixth element of @list, 12, is assigned to $operand1.
Line 19 uses $index as the subscript into the list to access the arithmetic operator for this operation. In this case, the fifth element of @list (subscript 4) is +, so the expression in line 19 is true. Line 20 then adds $operand1 to $operand2, yielding $result, which is 23. This value is returned by this copy of rightcalc.
When the third copy of rightcalc returns, execution continues with the second copy of rightcalc, because the second copy called the third copy. Line 16 of the second copy assigns the return value of the third copy, 23, to $operand2. The following is the state of the program after line 16 has finished executing:
Level 1  Main program
Level 2  rightcalc(0), list ("-", "98", "*", "4", "+", "12", "11")
Level 3  rightcalc(2), list ("*", "4", "+", "12", "11"); $operand2 is 23
The Perl interpreter now executes lines 18-27. Because $index is 2 in this copy of rightcalc, line 18 assigns the fourth element of @list, 4, to $operand1. Line 21 is true in this case because the operator is *; this means that line 22 multiplies $operand1 (4) by $operand2 (23), yielding 92, which is assigned to $result.
At this point, the second copy of rightcalc is finished, and program execution returns to line 16 of the first copy. This assigns the return value from the second copy, 92, to $operand2.
The following is the state of the program after the second copy of rightcalc is finished:
Level 1  Main program
Level 2  rightcalc(0), list ("-", "98", "*", "4", "+", "12", "11"); $operand2 is 92
Now you're almost finished; the program is executing only one copy of rightcalc. Because $index is 0 in this copy of rightcalc, line 18 assigns 98 to $operand1. Line 23 is true in this case because the operator here is -; line 24 then takes 98 and subtracts 92 from it, yielding a final result of 6.
This final result of 6 is passed to the main program and is assigned to $result. (Note that there is no conflict between $result in the main program and the various copies of $result in rightcalc, because $result is defined as a local variable in rightcalc.) Line 7, finally, prints this result.
NOTE: Recursive subroutines are useful when handling complicated data structures such as trees. You will see examples of such complicated data structures on Day 10, "Associative Arrays."
As you have seen, Perl enables you to pass an array as an argument to a subroutine.
&my_sub(@array);
When the subroutine my_sub is called, the list stored in the array variable @array is copied to the variable @_ defined in the subroutine.
sub my_sub {
    my (@subarray) = @_;
    $arraylength = @subarray;
}
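The following sketch confirms that my (@subarray) = @_ really works on a copy. It uses the same names as above, plus a hypothetical @array. It changes an element of @subarray and then shows that the caller's array is unaffected.

```perl
use strict;
use warnings;

my @array = (1, 2, 3);

sub my_sub {
    my (@subarray) = @_;        # copies the list into a new array
    $subarray[0] = 99;          # changes the copy only
    return scalar(@subarray);   # the array length, as $arraylength computes
}

my $arraylength = my_sub(@array);
print "length $arraylength, caller still has @array\n";   # length 3, caller still has 1 2 3
```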
If the array being passed is large, it might take some time (and considerable space) to create a copy of the array. If your application is operating under time or space limitations, or you just want to make it more efficient, you can specify that the array is to be passed by name.
The following is an example of a similar subroutine that refers to an array by name:
sub my_sub {
    local (*subarray) = @_;
    $arraylength = @subarray;
}
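The *subarray definition tells the Perl interpreter to operate on the actual list passed to my_sub instead of making a copy. (Note that local, not my, must be used here; Perl does not allow my to be applied to a name beginning with *.)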
To call this subroutine, specify * instead of @ with the array variable name, as in the following:
@myarray = (1, 2, 3, 4, 5);
&my_sub(*myarray);
Specifying *myarray instead of @myarray indicates that the actual contents of @myarray are to be used (and modified if desired) in my_sub. In fact, while the subroutine is being executed, the name @subarray becomes identical to the name @myarray. This process of creating another name to refer to the same variable is known as aliasing. @subarray is now an alias of @myarray.
When my_sub terminates, @subarray stops being an alias of @myarray. When my_sub is called again with a different argument, as in
&my_sub(*anotherarray);
the variable @subarray in my_sub becomes an alias for @anotherarray, which means that you can use the array variable @subarray to access the storage in @anotherarray.
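The following sketch shows the aliasing behavior described above. It is written for Perl 5 with use strict in effect. The our declarations are needed only to satisfy use strict. The alias itself is created by local (*subarray) = @_, as in the subroutine above.

```perl
use strict;
use warnings;

our @myarray = (1, 2, 3, 4, 5);
our @subarray;                  # the package name that will serve as the alias

sub my_sub {
    local (*subarray) = @_;     # @subarray now names the caller's array
    $subarray[0] = 99;          # so this changes @myarray itself
    return scalar(@subarray);
}

my $arraylength = my_sub(*myarray);
print "length $arraylength, \@myarray is now @myarray\n";  # length 5, @myarray is now 99 2 3 4 5
# Once my_sub returns, local restores *subarray, so @subarray is empty again.
```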
Aliasing arrays in this manner has one distinct advantage and one distinct drawback. The advantage is that your program becomes more efficient. You don't need to copy the entire list from your main program to the subroutine. The disadvantage is that your program becomes more difficult to follow. You have to remember, for example, that changing the contents of @subarray in the subroutine my_sub also changes the contents of @myarray and @anotherarray. It is easy to lose track of which name refers to which variable.
There is also another problem with aliasing: aliasing affects all variables with the same name, not just array variables.
For example, consider Listing 9.13, which defines a scalar variable named $foo and an array named @foo, and then aliases @foo. As you'll see, the program aliases $foo as well.
Listing 9.13. A program that demonstrates aliasing.
1: #!/usr/local/bin/perl
2:
3: $foo = 26;
4: @foo = ("here's", "a", "list");
5: &testsub (*foo);
6: print ("The value of \$foo is now $foo\n");
7:
8: sub testsub {
9:     local (*printarray) = @_;
10:     foreach $element (@printarray) {
11:         print ("$element\n");
12:     }
13:     $printarray = 61;
14: }
$ program9_13
here's
a
list
The value of $foo is now 61
$
Line 5 calls the subroutine testsub. The argument, *foo, indicates that the array @foo is to be passed to testsub and aliased.
The local variable definition in line 9 indicates that the array variable @printarray is to become an alias of the array variable @foo. This means that the name printarray is defined to be equivalent to the name foo.
As a consequence, the scalar variable $printarray becomes an alias of the scalar variable $foo. Because of this, line 13, which seems to assign 61 to $printarray, actually assigns 61 to $foo. This modified value is printed by line 6 of the main program.
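NOTE: Aliasing enables you to pass more than one list to a subroutine. For example:
@array1 = (1, 2, 3);
@array2 = (4, 5, 6);
&two_array_sub (*array1, *array2);
sub two_array_sub {
    local (*subarray1, *subarray2) = @_;
}
In this case, the names array1 and array2 are passed to two_array_sub. subarray1 becomes an alias for array1, and subarray2 becomes an alias for array2.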
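Perl enables you to use the do statement to invoke a subroutine. For example, the following statements are equivalent:
&my_sub(1, 2, 3);
do my_sub(1, 2, 3);
There is no real reason to use the do statement in this context. (This form of do was deprecated throughout Perl 5 and was removed in Perl 5.20, so current versions of Perl no longer accept it.)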
By default, the built-in function sort sorts in alphabetical order. The following is an example:
@list = ("words", "to", "sort");
@list2 = sort (@list);
Here, @list2 is assigned ("sort", "to", "words").
If you want, you can write a subroutine that defines how sorting is to be accomplished. To understand how to do this, first you need to know a little about how sorting works.
When sort is given a list to sort, it determines the sort order of the elements of the list by repeatedly comparing pairs of elements. To compare a pair of elements, sort calls a special internal subroutine, which receives the pair in the variables $a and $b. Although the subroutine is not accessible from a Perl program, it basically behaves as follows:
sub sort_criteria {
    if ($a gt $b) {
        $retval = 1;
    } elsif ($a eq $b) {
        $retval = 0;
    } else {
        $retval = -1;
    }
    $retval;
}
This subroutine compares two values, which are stored in $a and $b. It returns -1 if the first value is less than the second, 0 if the values are equal, and 1 if the first value is greater. (This, by the way, is how the cmp operator works. In fact, the preceding subroutine could compare the two values using a single cmp operator.)
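You can try the comparison above directly. In this sketch, sort_criteria is passed to sort by name, and its result is compared with the default sort and with an explicit cmp comparison. All three should produce the same order.

```perl
use strict;
use warnings;

# A sketch of the comparison that the default sort performs.
sub sort_criteria {
    my $retval;
    if ($a gt $b) {
        $retval = 1;       # $a belongs after $b
    } elsif ($a eq $b) {
        $retval = 0;       # treat the two as equal
    } else {
        $retval = -1;      # $a belongs before $b
    }
    $retval;
}

my @list = ("words", "to", "sort");
my @by_default  = sort @list;
my @by_criteria = sort sort_criteria @list;
my @by_cmp      = sort { $a cmp $b } @list;
print "@by_default | @by_criteria | @by_cmp\n";   # sort to words | sort to words | sort to words
```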
To define your own sorting rules, you must write a subroutine that follows the same conventions as the preceding subroutine. This subroutine must use two global variables named $a and $b to represent the two items in the list currently being compared, and the subroutine must return one of the following values:
-1 if $a is to appear before $b in the resulting sorted list
0 if $a is to be treated as equal to $b
1 if $a is to appear after $b in the resulting sorted list
NOTE: Even though $a and $b are global variables that are used by the sorting subroutine, you still can define global variables of your own named $a and $b without risking their being overwritten. The built-in function sort saves any existing values of $a and $b before sorting, and then it restores them when sorting is completed.
After you have written the subroutine, you must specify the subroutine name when calling the function sort. For example, if you define a function named foo that provides a set of sorting rules, the following statement sorts a list using the rules defined in foo:
@list2 = sort foo (@list1);
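As a further sketch, the hypothetical subroutine by_number_desc uses the numeric comparison operator <=> with $a and $b swapped, which sorts numbers from largest to smallest. For comparison, the default sort orders the same values as strings.

```perl
use strict;
use warnings;

# Hypothetical sort subroutine: largest number first.
sub by_number_desc {
    $b <=> $a;    # swapping $a and $b reverses the numeric order
}

my @list1 = (10, 9, 100, 2);
my @list2 = sort by_number_desc (@list1);
my @as_strings = sort (@list1);
print "@list2\n";        # 100 10 9 2
print "@as_strings\n";   # 10 100 2 9
```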
Listing 9.14 shows how you can define your own sort criteria. This program sorts a list in the normal order, except that it puts strings starting with a digit last. (By default, strings starting with a number appear before strings starting with a letter, and before some, but not all, special characters.) Strings that begin with a digit are assumed to be numbers and are sorted in numerical order.
Listing 9.14. A program that defines sort criteria.
1: #!/usr/local/bin/perl
2:
3: @list1 = ("test", "14", "26", "test2");
4: @list2 = sort num_last (@list1);
5: print ("@list2\n");
6:
7: sub num_last {
8:     my ($num_a, $num_b, $retval);
9:
10:     $num_a = $a =~ /^[0-9]/;
11:     $num_b = $b =~ /^[0-9]/;
12:     if ($num_a && $num_b) {
13:         $retval = $a <=> $b;
14:     } elsif ($num_a) {
15:         $retval = 1;
16:     } elsif ($num_b) {
17:         $retval = -1;
18:     } else {
19:         $retval = $a cmp $b;
20:     }
21:     $retval;
22: }
$ program9_14
test test2 14 26
$
Line 4 sorts the list according to the sort criteria defined in the subroutine num_last. This subroutine is defined in lines 7-22.
This subroutine first determines whether the items are strings that begin with a digit. Line 10 sets the local variable $num_a to a nonzero value if the value stored in $a starts with a digit; similarly, line 11 sets $num_b to a nonzero value if the value of $b starts with a digit.
Lines 12 and 13 handle the case in which both $num_a and $num_b are true. In this case, the two strings are assumed to be numbers, and the numeric comparison operator <=> compares their values. The result of the <=> operation is -1 if the first number is smaller, 0 if they are equal, and 1 if the first number is larger.
If $num_a is true but $num_b is false, line 15 sets the return value for this subroutine to 1, indicating that $a, which starts with a digit, is to appear after $b. Similarly, line 17 sets the return value to -1 if $b starts with a digit and $a does not, so that $a appears first.
If neither string starts with a digit, line 19 uses the normal sort criterion (alphabetical order) to determine which value comes first. Here, the cmp operator is useful. It returns -1 if the first string comes alphabetically before the second, 0 if the strings are equal, and 1 if the first string comes alphabetically after the second.
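The following sketch restates num_last with all of its variables declared using my and runs it on a different, made-up input. It shows why the numeric comparison matters. The default sort places 100 before 9 because it compares characters, while num_last puts 9 first.

```perl
use strict;
use warnings;

sub num_last {
    my $num_a = $a =~ /^[0-9]/;    # true if $a starts with a digit
    my $num_b = $b =~ /^[0-9]/;    # true if $b starts with a digit
    my $retval;

    if ($num_a && $num_b) {
        $retval = $a <=> $b;       # both look numeric: compare as numbers
    } elsif ($num_a) {
        $retval = 1;               # only $a starts with a digit: $a goes after $b
    } elsif ($num_b) {
        $retval = -1;              # only $b starts with a digit: $a goes before $b
    } else {
        $retval = $a cmp $b;       # neither does: ordinary string order
    }
    $retval;
}

my @input = ("b10", "100", "9", "apple");
my @default_order  = sort @input;
my @num_last_order = sort num_last @input;
print "default:  @default_order\n";     # default:  100 9 apple b10
print "num_last: @num_last_order\n";    # num_last: apple b10 9 100
```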
Perl 5 defines three special subroutines that are executed at specific times: BEGIN, END, and AUTOLOAD.
NOTE: These subroutines are not supported in Perl 4.
Perl 5 enables you to create code that is executed when your program is started. To do this, create a special subroutine named BEGIN. For example:
BEGIN { print("Hi! Welcome to Perl!\n"); }
When your program begins execution, the following line appears on your screen:
Hi! Welcome to Perl!
The BEGIN subroutine behaves just like any other Perl subroutine. For example, you can define local variables for it or call other subroutines from it.
NOTE: If you like, you can define multiple BEGIN subroutines. These subroutines are called in the order in which they appear in the program.
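This sketch illustrates the note above. BEGIN subroutines run while Perl is still reading the program. As a result, both of them run before the push statement in the main program, even though that statement appears first. The array name @order is hypothetical.

```perl
use strict;
use warnings;

our @order;    # no initializer, so the BEGIN pushes below are not wiped out

push @order, "main program";           # runs last, when the program executes
BEGIN { push @order, "first BEGIN" }   # runs as soon as it has been compiled
BEGIN { push @order, "second BEGIN" }  # runs next, in order of appearance

print join(", ", @order), "\n";        # first BEGIN, second BEGIN, main program
```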
Perl 5 enables you to create code to be executed when your program terminates execution. To do this, define an END subroutine, as in the following example:
END { print("Thank you for using Perl!\n"); }
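The code contained in the END subroutine is executed whenever your program terminates, even if the program is terminated using die. (It does not run if the program is killed by a signal that it does not handle, or if the program replaces itself with another program using exec.) For example, the code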
die("Prepare to die!\n");
END {
    print("Ha! You can't kill me!\n");
}
displays the following on your screen:
Prepare to die!
Ha! You can't kill me!
NOTE: You can define multiple END subroutines in your program. In this case, the subroutines are executed in reverse order of appearance, with the last one executed first.
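The following sketch demonstrates the reverse ordering of END subroutines. The first END subroutine runs last, so it can see what the other one recorded. As an added self-check (a choice made for this example, not something Perl requires), it sets the exit status $? to 1 if the order differs from what the note describes.

```perl
use strict;
use warnings;

our @ran;    # hypothetical record of which END subroutines have run

END {
    # Defined first, so it runs last: by now the other END block has run.
    push @ran, "first END";
    print join(", ", @ran), "\n";    # second END, first END
    # Changing $? inside END changes the program's exit status.
    $? = 1 unless join(", ", @ran) eq "second END, first END";
}
END { push @ran, "second END" }      # defined last, so it runs first

print "main program done\n";
```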
Perl 5 enables you to define a special subroutine named AUTOLOAD that is called whenever the Perl interpreter is told to call a subroutine that does not exist. Listing 9.15 illustrates the use of AUTOLOAD.
Listing 9.15. A program that uses AUTOLOAD.
1: #!/usr/local/bin/perl
2:
3: &nothere("hi", 46);
4:
5: AUTOLOAD {
6:     print("subroutine $AUTOLOAD not found\n");
7:     print("arguments passed: @_\n");
8: }
$ program9_15
subroutine main::nothere not found
arguments passed: hi 46
$
This program tries to call the non-existent subroutine nothere. When the Perl interpreter discovers that nothere does not exist, it calls the AUTOLOAD subroutine.
Line 6 uses a special scalar variable, $AUTOLOAD, which contains the name of the subroutine you tried to call. (The main:: text that appears before the subroutine name, nothere, is the name of the package in which the subroutine is found. By default, all your code is placed in one package, called main, so you normally won't need to worry about packages. For more information on creating other packages, see Day 19, "Object-Oriented Programming in Perl.")
When AUTOLOAD is called, the arguments that were to be passed to the non-existent subroutine are passed to AUTOLOAD instead. This means that the @_ array variable contains the list ("hi", 46), because these are the arguments that were to be passed to nothere.
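A slightly extended sketch of Listing 9.15 follows. It assumes use strict, so $AUTOLOAD is declared with our. Before using the name, it strips the package prefix from $AUTOLOAD with a substitution. The subroutine nothere is deliberately left undefined, and the @calls array is a hypothetical log of the calls that AUTOLOAD handled.

```perl
use strict;
use warnings;

our $AUTOLOAD;    # Perl stores the requested name here, e.g. "main::nothere"
my @calls;        # hypothetical log of calls handled by AUTOLOAD

sub AUTOLOAD {
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;            # strip the package prefix
    push @calls, [$name, @_];     # remember the name and the arguments
    return "called $name with @_";
}

my $msg = nothere("hi", 46);      # nothere does not exist, so AUTOLOAD runs
print "$msg\n";                   # called nothere with hi 46
```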
TIP: AUTOLOAD is useful if you plan to organize your Perl program into modules, because you can use it to ensure that crucial subroutines from other files actually exist when you need them. For more information on organizing Perl programs into modules, see Day 19.
Today, you learned about subroutines, which are separate chunks of code intended to perform specific tasks. A subroutine can appear anywhere in your program.
To invoke a subroutine, specify its name preceded by the & character. In Perl 5, the & character is not required if the subroutine exists, or if a forward reference is defined.
A subroutine can return a value (either a scalar value or a list). This return value is the value of the last expression evaluated inside the subroutine. If this last expression is at the end of the subroutine, the subroutine is a single-exit module.
You can define local variables for use inside subroutines. These local variables exist only while the subroutine is being executed. When a subroutine finishes, its local variables are destroyed; if it is invoked again, new copies of the local variables are defined.
You can pass values to subroutines; these values are called arguments. You can pass as many arguments as you like, but only one of these arguments can be a list. If a list is passed to a subroutine, it must be the last argument passed.
The arguments passed to a subroutine are converted into a list and assigned to a special system variable, @_. One copy of @_ exists for each list of arguments passed to a subroutine (that is, @_ is a local variable).
Subroutines can call other subroutines (nested subroutines) and even can call themselves (recursive subroutines).
You can pass an array variable to a subroutine by name by defining an alias for the variable name. This alias affects all variables of that name.
You can use the do statement to invoke a subroutine, although there is no real reason to do so.
You can define a subroutine that specifies the order in which the elements of a list are to be sorted. To use the sort criteria defined by a subroutine, include its name with the call to sort.
The BEGIN subroutine is always executed before your program begins execution. The END subroutine is always executed when your program terminates, even if it was killed off using die. The AUTOLOAD subroutine is executed if your program tries to call a subroutine that does not exist.
Q: How many levels of nested subroutines can a program have?
A: This depends on the amount of memory in your machine. Normally, there is enough memory that the nesting depth becomes an issue only when you are using recursive subroutines.
Q: Which is better: passing entire lists or passing array variables by name?
A: As with so many issues in programming, this depends on the situation. If your program needs to be space-efficient or to run as quickly as possible, passing array variables by name might be the best choice. Another option is to use the global array variable both inside and outside the subroutine. This works well if the array variable is the central repository for program data.
Q: When are global variables a good idea? When is it better to pass the contents of a variable to a subroutine?
A: If your subroutine is a general-purpose subroutine that performs a task such as breaking a scalar value into words, it's a good idea to pass the value as an argument. For example:
sub breakline {
    local ($line) = @_;
    @words = split(/\s+/, $line);
}
If you do not pass the line as an argument, breakline will be able to work only with the line stored in a particular scalar variable, which makes it less useful.
On the other hand, if your program stores information in a central array, there's no reason to pass the array or the array name to a subroutine that processes the array. For example, if you are using the array @occurs to count all the occurrences of the digits 0 through 9 in a file, there's no reason to pass @occurs to a subroutine:
sub printcount {
    for ($count = 0; $count <= 9; $count++) {
        print ("$occurs[$count]\n");
    }
}
Because printcount is not likely to be used with any array except @occurs, there's no need to pass it as an argument.
Q: When Perl defines an alias for an array-variable name in a subroutine, such as @localname for @name in a subroutine, why does it also define the alias $localname for $name?
A: Strictly speaking, the * character in an alias represents any character that precedes a variable name (such as @ or $). For example, consider the following subroutine and the corresponding statement that calls it:
sub arraybyname {
    local (*localname) = @_;
}
arraybyname (*name);
When the Perl interpreter sees the reference to *localname in the subroutine, it replaces the alias following the * with the name for which the alias is defined. In this case, the Perl interpreter replaces *localname with *name. The Perl interpreter then determines, from context, whether *name is an array variable, a scalar variable, or something else. In this case, *name is intended to be an array variable, which means that *name becomes @name.
The Workshop provides quiz questions to help you solidify your understanding of the material covered and exercises to give you experience in using what you've learned. Try to understand the quiz and exercise answers before you go on to tomorrow's lesson.