Chapter 20

Miscellaneous Features of Perl



Today's lesson describes the features of Perl that have not been covered in previous chapters:

Today's lesson also provides a brief overview of the following topics:

The require Function

The require function provides a way to break your program into separate files and create libraries of functions. For example, if you have stored Perl statements in the file myfile.pl, you can include them as part of your program by adding the following statement:

require ("myfile.pl");

When the Perl interpreter sees this require statement, it searches the directories specified by the built-in array variable @INC for a file named myfile.pl. If such a file is found, the statements in the file are executed; if no such file exists, the program terminates and prints the error message

Can't locate myfile.pl in @INC

on your screen (by writing it to the standard error file STDERR). (For more details on the @INC array, refer to Day 17, "System Variables.")

As in a subroutine call, the last expression evaluated inside a file included by require becomes the return value. The require function checks whether this value is zero, and terminates if it is. For example, suppose that the file myfile.pl contains the following statements:

print ("hello, world!\n");

$var = 14;

If the statements in this file are executed by

require ("myfile.pl");

the return value of myfile.pl is the following expression, which has the value 14:

$var = 14

Because this value is not zero, the program continues execution with the statement following the require.

If myfile.pl contains the following statements, the return value of myfile.pl is 0:

print ("hello, world!\n");

$var = 0;

Because this value is zero, the Perl interpreter prints the following error message along with the name and current line number of your program; then it exits:

myfile.pl did not return true value

TIP
By convention, files containing Perl statements normally have the suffix .pl. This makes it easy to determine which files in a directory contain Perl programs or code included in Perl programs using require.

You can pass any scalar value to require, including those stored in scalar variables or array elements:

@reqlist = ("file1.pl", "file2.pl", "file3.pl");

require ($reqlist[0]);

require ($reqlist[1]);

require ($reqlist[2]);

Here, the successive calls to require include the contents of file1.pl, file2.pl, and file3.pl.

You can also specify no filename, as in the following:

require;

In this case, the value of the scalar variable $_ is the filename whose contents are to be executed.

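One limitation Perl imposes on the require statement is that the contents of a particular file can be included only once in a program. To repeat a block of code many times, put it in a subroutine and call the subroutine, or use the do function instead of require: do ("myfile.pl"); searches @INC in the same way but executes the contents of myfile.pl every time it is called. You can also put the code in a separate program and call it using the system function.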
Also, if two directories in @INC contain a file named by require, only the first one is included.
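Perl records each file that require has included in the built-in associative array %INC, and require skips any file already listed there. The following sketch shows this behavior and compares it with the related do function, which executes a file every time it is called. It assumes a Perl 5 interpreter and the standard File::Temp module; the library file mylib.pl and the counter $loads are hypothetical names used only for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# A scratch directory holding a hypothetical library file, mylib.pl.
my $dir = tempdir(CLEANUP => 1);
open(my $fh, '>', "$dir/mylib.pl") or die "can't create mylib.pl: $!";
print {$fh} "\$main::loads++;\n1;\n";   # count each execution, then return true
close($fh);

unshift(@INC, $dir);                     # search the scratch directory first
our $loads = 0;

require "mylib.pl";                      # executes the file and records it in %INC
require "mylib.pl";                      # already in %INC, so nothing happens
print "after two requires: $loads\n";    # after two requires: 1

do "$dir/mylib.pl";                      # do executes the file every time
print "after do: $loads\n";              # after do: 2
print "recorded in %INC as: $INC{'mylib.pl'}\n";
```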

The require Function and Subroutine Libraries

The require function enables you to create libraries of subroutines that can be used in all your Perl programs. To create a subroutine library, you need only take the following steps:

  1. Decide on a directory in which to store your subroutine library.
  2. Move your subroutines to separate files, and move these files to your subroutine directory.
  3. To each file, add an executable statement that contains an expression with a nonzero value. This step is necessary because files executed by require must return a nonzero value, and an empty program is assumed to have the value zero. The easiest way to perform this task is to add the following statement to the bottom of each file:
    1;
     This statement is just a simple expression (the number 1) with a nonzero value.
  4. In your main program, use require to refer to one or more of the files that contain your library subroutines, as needed.
  5. When you start your main program, use the -I option to specify the name of the subroutine directory. Alternatively, add the subroutine directory to the @INC array before calling require.

For example, suppose that the directory /u/jqpublic/perldir contains your Perl subroutine library and that the subroutine mysub is stored in the file mysub.pl in that directory. (Naming the file after the subroutine is an easy way to remember where the subroutine is located.) Now, to include mysub as part of your program, add the following statements:

unshift (@INC, "/u/jqpublic/perldir");

require ("mysub.pl");

The call to unshift adds the directory /u/jqpublic/perldir to the @INC array, which ensures that any subsequent calls to require will search this directory. The call to require then includes the contents of mysub.pl as part of your program, which means that mysub now is included.
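The library steps described above can be tried out with a short, self-contained sketch. Here a temporary directory (created with the standard File::Temp module) stands in for /u/jqpublic/perldir, and the body of mysub, which simply adds its two arguments, is a hypothetical example rather than anything defined in the text.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Stand-in for /u/jqpublic/perldir.
my $libdir = tempdir(CLEANUP => 1);

# One subroutine per file, named after the subroutine, ending with a true value.
open(my $fh, '>', "$libdir/mysub.pl") or die "can't write mysub.pl: $!";
print {$fh} <<'CODE';
sub mysub {                 # hypothetical body: add the two arguments
    my ($x, $y) = @_;
    return $x + $y;
}
1;
CODE
close($fh);

# Make the library directory searchable, then include the file.
unshift(@INC, $libdir);
require ("mysub.pl");

my $sum = mysub(2, 3);
print "mysub(2, 3) returns $sum\n";    # mysub(2, 3) returns 5
```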

TIP
You should use unshift, not push, to add to the @INC array. The push function adds to the end of the list stored in @INC, which means that your subroutine library directory will be searched last.
As a consequence, if your subroutine file has the same name as a file contained in /usr/local/lib/perl, your file will not be included, because require includes only the first file matching the specified name.
You can control the search order of @INC by creating or reshuffling it yourself before calling require.

Using require to Specify a Perl Version

Perl 5 enables you to use a require statement to specify the version of Perl needed to run your program. When Perl sees a require statement with a numeric associated value, it only runs the program if the version of Perl is greater than or equal to the number. For example, the following statement indicates that the program is to be run only if the Perl interpreter is version 5.001 or higher:

require 5.001;

If it is not, the program terminates.

This is useful if your program uses a feature of Perl that you know does not work properly in earlier versions of the language.

NOTE
Because Perl 4 does not understand
require 5.001;
it detects an error and terminates when it sees this statement. This is basically what you want to have happen.
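A minimal sketch of a version check, assuming a Perl 5 interpreter. The built-in variable $] holds the version of the running interpreter as a number (for example, 5.036000 for Perl 5.36.0), and wrapping a require in eval lets a program report a version problem instead of terminating. The impossible version 99 is used only to force a failure.

```perl
use strict;
use warnings;

require 5.006;    # any Perl 5.6 or later passes, so execution continues

print "Running under Perl ", $], "\n";

# A version that cannot exist makes require fail; eval traps the error.
my $ok = eval { require 99; 1 };
print "require 99 failed: $@" unless $ok;
```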

The $#array Variables

For each array variable defined in your program, a variable named $#array, in which array is the name of your array, is also defined. This variable contains the subscript of the last element of the array. For example:

@myarray = ("goodbye", "cruel", "world");

$lastsub = $#myarray;

Here, there are three elements in @myarray, which are referenced by the subscripts 0, 1, and 2. Because the subscript of the last element of the array is 2, $#myarray contains the value 2.

NOTE
Because the value of the maximum subscript is affected by the system variable $[, the value of each $#array variable is also affected by $[. For example:
$[ = 1;
@myarray = ("goodbye", "cruel", "world");
$lastsub = $#myarray;
Here, the first subscript of the array is 1, because $[ is set to that value. This means that the maximum subscript is 3 and the value of $#myarray is also 3.

Any $#array variable that does not correspond to a defined array has the value -1. For example:

$sublength = $#notdefined;

Here, if the array @notdefined does not exist, $sublength is assigned -1.
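A short sketch of these rules, assuming Perl 5 with use strict, which requires every array to be declared; an empty declared array therefore stands in for @notdefined.

```perl
use strict;
use warnings;

my @myarray = ("goodbye", "cruel", "world");
my $lastsub = $#myarray;          # subscript of the last element: 2
my $count   = scalar(@myarray);   # number of elements: $#myarray + 1 when subscripts start at 0

my @notdefined;                   # an empty array
my $sublength = $#notdefined;     # -1

print "lastsub=$lastsub count=$count sublength=$sublength\n";
# prints: lastsub=2 count=3 sublength=-1
```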

A $#array variable is also defined for each built-in array variable. This means, for example, that the $#ARGV variable contains the subscript of the last element of @ARGV, which is one less than the number of arguments supplied on the command line (or -1 if there are none). You can use this variable to check whether files have been specified on the command line:

if ($#ARGV == -1) {

        die ("No files specified.\n");

}

If there are no "holes" (undefined elements) in the array, you can use a $#array variable in a loop. Listing 20.1 shows how you can carry out this action.


Listing 20.1. A program that uses a $#array variable in a loop.
1:  #!/usr/local/bin/perl

2:  

3:  @myarray = ("testing", 98.6, "Olerud", 47);

4:  for ($i = 0; $i <= $#myarray; $i++) {

5:          print ("$myarray[$i]\n");

6:  }


$ program20_1

testing

98.599999999999994

Olerud

47

$

Line 3 assigns a four-element list to the array variable @myarray. Therefore, the largest subscript used in the array is 3; this value is automatically assigned to the variable $#myarray.

The for statement in line 4 terminates when $i is greater than $#myarray. This technique ensures that each element of @myarray is printed, in turn, by line 5.

Using $#myarray to terminate the loop isn't as useful if the array contains undefined elements, as in the following:
@myarray = ("test1", "test2");
$myarray[5] = "test3";
for ($i = 0; $i <= $#myarray; $i++) {
print ("$myarray[$i]\n");
}
This loop iterates six times, because the largest subscript of the array is 5. Therefore, three blank lines are printed, because the elements of @myarray with the subscripts 2, 3, and 4 have not been defined. (You can get around this by using the defined function.)
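A minimal sketch of that approach, using the same data as the example above and assuming Perl 5 with use strict: the loop tests each element with defined and skips the holes instead of printing blank lines for them. The @printed array exists only so the result can be checked.

```perl
use strict;
use warnings;

my @myarray = ("test1", "test2");
$myarray[5] = "test3";            # subscripts 2, 3, and 4 are left undefined

my @printed;
for (my $i = 0; $i <= $#myarray; $i++) {
    next unless defined($myarray[$i]);   # skip the holes
    print("$myarray[$i]\n");
    push(@printed, $myarray[$i]);
}
# prints test1, test2, and test3 on separate lines, with no blank lines
```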

Controlling Array Length Using $#array

You can use $#array to control the length of an array variable.

If a $#array variable is assigned a value that is larger than the current largest subscript of the corresponding array, the missing elements are created and initialized to the special internal undefined value (equivalent to the null string). For example:

@myarray = ("hi", "there");

$#myarray = 4;

This code sets the maximum subscript of @myarray to 4. Because the subscript of the last defined element is 1, three empty elements are created with subscripts 2, 3, and 4.

You can use this technique to create a large array all at once:

$#bigarray = 9999;

This statement creates an array large enough to hold 10,000 values (or fails trying). If this statement executes successfully, you know that your machine has enough space to store @bigarray before actually assigning to all or part of it.

In Perl 5, if the value you assign to a $#array variable is less than the current maximum subscript, the leftover array values are destroyed. For example:

@myarray = ("hello", "there", "Dave!");

$#myarray = 1;

Here, @myarray is originally assigned a three-element list, which means that its maximum subscript is 2. Assigning 1 to $#myarray sets the maximum subscript to 1, which means that @myarray now contains ("hello", "there"). The third element, Dave!, is destroyed.
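A brief sketch of growing and then shrinking an array by assigning to its $#array variable, assuming Perl 5 behavior (the Perl 4 behavior described in the following note is not shown):

```perl
use strict;
use warnings;

my @myarray = ("hi", "there");
$#myarray = 4;                    # grow: subscripts 2, 3, and 4 are created
my $grown = scalar(@myarray);     # 5
my $third = defined($myarray[2]) ? "defined" : "undefined";

$#myarray = 0;                    # shrink: every element after subscript 0 is discarded
my $shrunk = scalar(@myarray);    # 1
print "grown=$grown third=$third shrunk=$shrunk first=$myarray[0]\n";
# prints: grown=5 third=undefined shrunk=1 first=hi
```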

NOTE
This is one instance in which Perl 5 and Perl 4 behave differently. In Perl 4, array elements are not destroyed when $#array is assigned a value less than the current maximum subscript.
In Perl 4, array elements that have been "removed" by assigning to the $#array variable can be restored to existence by resetting $#array to its original value.

Alternative String Delimiters

As you've seen, Perl enables you to enclose character strings in either single quotation marks or double quotation marks. Strings in double quotation marks are searched for variable names, which are replaced with their values when found; strings in single quotation marks are not searched.

Consider the following example:

$var = 5;

print ("$var\n");

print ('$var\n');

The first call to print prints 5 followed by a newline character; the second prints the string $var\n as is.

Perl enables you to use any delimiter you want in place of either single quotation marks or double quotation marks. To specify a string that, like a single-quoted string, is not searched for variable names, use q followed by the delimiter you want to use. For example, the following strings are equivalent:

q!hello $there!

'hello $there'

A useful trick is to use newline characters as delimiters:

q

this is my string

This example is equivalent to the following because the newline after the q indicates the beginning of the string, and the newline after string indicates the end of the string:

'this is my string'

To define a string that is searched for variable names, use qq:

qq/This string contains $var./

The / characters delimit the string

This string contains $var.

which is then searched for variable names. This means that $var is replaced by its current value.

NOTE
If you use a left parenthesis as the opening delimiter for a string defined using q or qq, the Perl interpreter expects a right parenthesis as the closing delimiter. This method of operation enables you to treat q and qq as if they were functions:
q(Here is a single quoted string);
qq(Here is a double quoted string);
These are equivalent to the following, respectively:
'Here is a single quoted string'
"Here is a double quoted string"
Be careful not to leave a space between the q or qq and the left parenthesis; if you do, the Perl interpreter will assume that the space character, not the (, is the delimiter.
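A short sketch comparing q and qq with several delimiters; the variable names are illustrative only. Note that in a q string the sequence \n is left as a backslash followed by n, just as in a single-quoted string.

```perl
use strict;
use warnings;

my $var = 5;
my $single = q!value is $var\n!;     # like '...': $var and \n stay as typed
my $double = qq{value is $var\n};    # like "...": $var becomes 5, \n a newline
my $nested = q(a (nested) string);   # matching parentheses may nest

print $single, "\n";   # value is $var\n
print $double;         # value is 5
print $nested, "\n";   # a (nested) string
```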

qw, defined in Perl 5, provides a convenient way of breaking a string into words. The following statements are equivalent:

@words = qw/this is a list of words/;

@words = split(' ', q/this is a list of words/);

In each case, @words is assigned the list

("this", "is", "a", "list", "of", "words")

qw supports any alternative string delimiter supported by q and qq.
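A minimal sketch confirming that qw and the split form produce the same list, and that qw also accepts parentheses as delimiters:

```perl
use strict;
use warnings;

my @words = qw/this is a list of words/;
my @split = split(' ', q/this is a list of words/);
my @same  = qw(this is a list of words);    # parentheses as delimiters

print scalar(@words), " words: @words\n";   # 6 words: this is a list of words
```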

Defining Strings Using <<

You can use << (two left angle brackets) to indicate the beginning of a string. This string continues until the next blank line. The following is an example:

$longstring = <<;
Here is the first part of the string.
Here is the last part of the string.

# here is the next statement

This example defines a string consisting of the two input lines

Here is the first part of the string.

Here is the last part of the string.

and assigns it to $longstring. The newline characters are included as part of the string.

You can specify the characters that indicate "end of string" by including them after the <<. For example:

$longstring = <<END;
Here is the first part of the string.
Here is the last part of the string.
END
# here is the next statement.

Here, END indicates the end of the string.

You can enclose the end-of-string characters in either single or double quotation marks. The type of quotation mark determines how the text of the string is treated, just as it does for an ordinary string. If the end-of-string characters are enclosed in single quotation marks, the string is not searched for variable names:

$longstring = <<'END';
Here is the first part of the string.
Here is the last part of the string.
END
# here is the next statement

If the end-of-string characters are enclosed in double quotation marks, or are not quoted at all, the string is searched for variable names, which are replaced by their values if found. This substitution applies to the text of the string, not to the end-of-string characters themselves: a string that begins with <<"$endchars" ends only at a line consisting of the characters $endchars, not at a line containing the value of $endchars.
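The following sketch, assuming Perl 5, shows how the type of quotation marks around the end-of-string characters affects the text of the string. The variable $name is illustrative only.

```perl
use strict;
use warnings;

my $name = "Dave";

my $searched = <<"END";    # double quotes: $name is replaced
Hello, $name!
END

my $literal = <<'END';     # single quotes: $name is left as typed
Hello, $name!
END

print $searched;   # Hello, Dave!
print $literal;    # Hello, $name!
```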

A string created using << can be used wherever a string is expected. For example, the statement

print <<END;
Hello there!
This is a test!
END

writes the following to the standard output file:

Hello there!
This is a test!

(This is one place where omitting the parentheses when you pass an argument to a function becomes useful.)

You can use the x operator to write a string more than once:

print <<END x 2;
Hello there!
END

This sends the following to the standard output file:

Hello there!
Hello there!

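You can supply more than one << in a single statement. If you do, the strings are read in the order in which the << operators appear. As with any other two strings, you must join them with an operator; for example, the following statement uses the . (concatenation) operator:

$longstring = <<END1 . <<END2;
This is the first part.
END1
This is the second part.
END2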

assigns the following (including the trailing newlines) to $longstring:

This is the first part.
This is the second part.

DON'T leave a space between the << and the end-of-string characters. (If you do, the Perl interpreter will terminate the string when it sees the next blank line.)
DON'T put anything else in the line containing the end-of-string characters.

Special Internal Values

Perl defines three special internal values your program can use: __LINE__, __FILE__, and __END__.

__LINE__ and __FILE__ contain, respectively, the current line number and current filename of the program you are running. These are the values that die and warn use when printing the line number and filename on which an error or a warning occurs.

__END__ is a special value that indicates "end of file." Everything after __END__ is treated as data. If the program is contained in a file, you can read the data after __END__ by reading from the DATA file variable:

$data = <DATA>;

NOTE
__LINE__ and __FILE__ cannot be substituted into double-quoted strings.
You can use the ^D or ^Z character (Ctrl+D or Ctrl+Z) in place of __END__.
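A minimal sketch of __LINE__ and __FILE__, including the restriction on double-quoted strings mentioned in the note; concatenation is one way to put their values into a message.

```perl
use strict;
use warnings;

my $line = __LINE__;      # the number of this line in the program
my $file = __FILE__;      # the name of the program file

# Inside a double-quoted string, __LINE__ is just ordinary text.
my $quoted = "__LINE__";

print "This is " . $file . ", line " . $line . "\n";
```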

__END__ does not need to appear on a line by itself as long as some white space separates it from the next item in the file. However, the first line of the file represented by DATA is always the line immediately following the __END__. For example:
__END__ Here is some input.
Here is some more input.
In this case, the first line read by <DATA> is
Here is some more input.
The information immediately following the __END__ is lost.

Using Back Quotes to Invoke System Commands

Perl provides a way to treat the value printed by a system command as a string. To do this, enclose the system command in back quote characters (the ` character).

For example, here is a way to include your user name in a Perl program:

$myname = `whoami`;

chop ($myname);

The first statement calls the system command whoami, which prints the name of the person logged on. This name is assigned to $myname. (The call to chop is necessary because whoami appends a newline character to the name, which enables it to appear on its own line on the screen.)

The Perl interpreter performs variable substitution on the string enclosed in back quotes before treating it as a system command. For example:

$command = "whoami";

$myname = `$command`;

chop ($myname);

Here, the value of $command, whoami, is substituted into the string enclosed in back quotes, and it becomes the system command that is called.

When a system command is executed, its status is stored in the system variable $?. To determine whether the system command has executed properly, check this system variable. (Normally, a value of zero indicates successful execution, and any other value indicates an error. The exit value of the command itself is stored in the upper eight bits of $?, so $? >> 8 retrieves it; its meaning depends on the command.)
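A minimal sketch, assuming a UNIX-like system on which the echo and sh commands are available. It uses chomp, which removes a trailing newline only, instead of chop, which removes whatever the last character is. It also shows that the command's exit value occupies the upper eight bits of $?.

```perl
use strict;
use warnings;

my $output = `echo hello`;          # capture what the command prints
chomp($output);                     # remove the trailing newline
my $status = $?;                    # 0: the command succeeded

my $unused = qx(sh -c 'exit 3');    # a command that fails with exit value 3
my $exitvalue = $? >> 8;            # shift off the low bits to get 3

print "output=$output status=$status exitvalue=$exitvalue\n";
# prints: output=hello status=0 exitvalue=3
```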

To use a character other than a back quote as a delimiter, use qx:

$myname = qx#whoami#;

chop ($myname);

As with q and qq, described previously, the first character after qx is treated as the string delimiter. The string continues until another string delimiter (in this case, #) is seen.

NOTE
If ( is used as an opening string delimiter, ) becomes the closing string delimiter:
$myname = qx(whoami);

Pattern Matching Using ?? and the reset Function

The ?? pattern matching operator is identical to the // pattern-matching operator you have been using all along, except that it matches only once, even if it is inside a loop. For example, the following statement loops only once, because the pattern ?abc? is not matched the second time it is executed:

while ($line =~ ?abc?) {

        # stuff goes here

}

To make the ?? pattern matching operator match again, call the reset function. This function tells the Perl interpreter that a particular ?? operator can be used to match a pattern again. Listing 20.2 is an example of a program that uses ?? and reset.


Listing 20.2. A demonstration of ?? and the reset function.
1:  #!/usr/local/bin/perl

2:  

3:  while ($line = <STDIN>) {

4:          last unless ($line =~ ?\bthe\b?);

5:          print ("$`$'");

6:          reset;

7:  }


$ program20_2

this is the first line

this is  first line

the next line of input

 next line of input

last line-not matched

$

Line 4 of this program uses the ?? pattern matching operator to check whether the word the appears in the current input line. If it does not, the program terminates. If it does, line 5 uses the $` and $' variables to print the parts of the line that precede and follow the matched word.

Line 6 calls reset, which resets the ?? operator in line 4. If reset is not called, line 4 will not match even if the new input line contains the word the.

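The ?? operator is deprecated in Perl version 5. This means that the operator is still supported but is considered obsolete. Future versions of Perl might not support this operator. (Current versions of Perl accept the match-once operator only when it is written with a leading m, as in m?abc?.)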
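A minimal sketch of the same idea in current Perl 5 syntax, which spells the operator m?...?. The input lines are illustrative; the first loop never calls reset, and the second calls it after every match.

```perl
use strict;
use warnings;

my @lines = ("the first line", "the second line", "the third line");

# Without reset, m?? succeeds only the first time it matches.
my $hits = 0;
for my $line (@lines) {
    $hits++ if $line =~ m?\bthe\b?;
}

# Calling reset after each match re-enables m?? operators in this package.
my $rearmed = 0;
for my $line (@lines) {
    if ($line =~ m?\bthe\b?) {
        $rearmed++;
        reset;
    }
}

print "without reset: $hits, with reset: $rearmed\n";   # without reset: 1, with reset: 3
```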

Using reset with Variables

You also can use the reset function to clear all variables whose name begins with a specified character. The following statement assigns the null string to all scalar variables whose names begin with the letter w (such as, for instance, $which) and assigns the empty list to all array variables whose names begin with this letter:

reset ("w");

The following statement assigns the null string or the empty list to all variables whose names begin with a or e:

reset ("ae");

You can use ranges of letters with reset:

reset ("a-d");

This example resets all variables whose names begin with a, b, c, or d.

Be careful with reset because it resets all variables whose names begin with the specified letters, including built-in variables such as @ARGV.
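A minimal sketch, assuming Perl 5. There, reset affects only package (global) variables, so the variables below are declared with our; a variable declared with my is not touched. After the reset, the scalar is undefined and the array is empty.

```perl
use strict;
use warnings;

our $which = "first";
our @words = ("a", "b");
our $other = "kept";
my  $wlocal = "lexical";     # a my variable: reset leaves it alone

reset("w");                  # clears package variables whose names begin with w

my $state = defined($which) ? "defined" : "undefined";
print "which is $state, words has ", scalar(@words),
      " elements, other=$other, wlocal=$wlocal\n";
# prints: which is undefined, words has 0 elements, other=kept, wlocal=lexical
```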

Other Features of the <> Operator

As you've seen, the <> operator reads from the file specified by the enclosed file variable. For example, the following statement reads a line from the file represented by MYFILE:

$line = <MYFILE>;

The following sections describe how to use <> with scalar variable substitution and how to use <> to create a list of filenames.

Scalar Variable Substitution and <>

If a scalar variable is contained in the <> operator, the value of the variable is assumed to be the name of a file variable. For example:

$filename = "MYFILE";

$line = <$filename>;

Here, the value of $filename, MYFILE, is assumed to be the file variable associated with the input file to read from. When you change the value of $filename, you change the input file.
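A minimal sketch of this technique in Perl 5. Because a file variable named by a string is a symbolic reference, the sketch turns off use strict's reference checking for that one block. The temporary input file comes from the standard File::Temp module and its contents are illustrative.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small input file to read from.
my ($tmpfh, $path) = tempfile(UNLINK => 1);
print {$tmpfh} "first line\nsecond line\n";
close($tmpfh);

open(MYFILE, '<', $path) or die "can't open $path: $!";

my $line;
{
    no strict 'refs';          # allow the file variable to be named by a string
    my $filename = "MYFILE";
    $line = <$filename>;       # reads from MYFILE
}
close(MYFILE);
print "read: $line";           # read: first line
```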

Creating a List of Filenames

UNIX commands that manipulate files, such as mv and cp, enable you to supply a pattern to generate a list of filenames. Any filename matching this pattern is included as part of the list. For example, the following command copies every file whose name ends in .pl to the directory /u/jqpublic/srcdir:

$ cp *.pl /u/jqpublic/srcdir

In Perl, if the <> operator encloses something other than a file variable or a scalar variable containing a file variable, it is assumed to be a pattern that matches a list of files. For example, the following statement assigns a list of the filenames ending in .pl to the array variable @filelist:

@filelist = <*.pl>;

You can use filename patterns in loops:

while ($line = <*.pl>) {

        print ("$line\n");

}

This code prints each filename ending in .pl on a separate line.
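A self-contained sketch, assuming Perl 5 and the standard Cwd and File::Temp modules. It creates a few hypothetical files in a scratch directory and lists the .pl files both ways: all at once in list context, and one name per iteration in a loop.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Work in a scratch directory containing two .pl files and one other file.
my $origdir = getcwd();
my $dir = tempdir(CLEANUP => 1);
chdir($dir) or die "can't chdir to $dir: $!";
for my $name ("alpha.pl", "beta.pl", "notes.txt") {
    open(my $fh, '>', $name) or die "can't create $name: $!";
    close($fh);
}

my @filelist = <*.pl>;          # list context: all matching names at once

my $count = 0;
while (my $file = <*.pl>) {     # scalar context: one name per iteration
    print("$file\n");
    $count++;
}

chdir($origdir) or die "can't return to $origdir: $!";
print "list: @filelist\n";      # list: alpha.pl beta.pl
```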

Global Indirect References and Aliases

On Day 9, "Using Subroutines," you learned that you can pass the name of an array to a subroutine using an alias. For example:

sub my_sub {

       local (*subarray) = @_;

       $arraylength = @subarray;

}

The *subarray definition in my_sub tells the Perl interpreter to operate on the actual list instead of making a copy. When this subroutine is called by a statement such as the following, the Perl interpreter realizes that myarray and subarray refer to the same array variable:

&my_sub(*myarray);

When a name is given an alias, all variables with that name can be referred to using the alias. This means, in this example, that the @subarray variable and the @myarray variable refer to the same array. If the program also defines variables named $myarray and %myarray, you can use $subarray and %subarray, respectively, to refer to these variables.

In the earlier example, the following two statements:

my_sub (*myarray);

local (*subarray) = @_;

are equivalent to the assignment

local (*subarray) = *myarray;

In each case, the name subarray is defined to be an alias of the name myarray. Because *subarray is contained inside a local definition in a subroutine, subarray and myarray are equivalent only while the subroutine is being executed.

If desired, you can define an alias for a name that remains in force throughout your program. For example:

*subarray = *myarray;

If this statement is part of your main program, subarray becomes an alias for myarray in all parts of your program, including all subroutines. The values of $subarray, @subarray, and %subarray, if they are defined, are lost.
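A minimal sketch of the local alias described above, written for Perl 5 with use strict (which requires the our declaration). Because subarray is an alias rather than a copy, adding an element through @subarray changes @myarray itself.

```perl
use strict;
use warnings;

our @myarray = (1, 2, 3);

sub my_sub {
    our @subarray;               # declare the name for use strict
    local (*subarray) = @_;      # subarray becomes an alias for the name passed in
    push(@subarray, 4);          # changes @myarray itself, not a copy
    return scalar(@subarray);
}

my $arraylength = my_sub(*myarray);
print "length=$arraylength myarray=@myarray\n";   # length=4 myarray=1 2 3 4
```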


Listing 20.3 is a simple example of a program that defines and uses a global alias.
Listing 20.3. An example of a global alias.
1:  #!/usr/local/bin/perl

2:  

3:  *name2 = *name1;

4:  $name1 = 14;

5:  print ("$name2\n");


$ program20_3

14

$

Line 3 of this program defines name2 as an alias for name1. Every variable named name1 can therefore be referred to using the name name2. As a result, $name1 and $name2 are really the same scalar variable; this means that line 5 prints the value assigned in line 4.

DON'T use aliases unless you absolutely must, because they can become very confusing.
DO, instead, substitute the variable name into a string and then execute it using eval. This is a better way to reference a variable indirectly. For example:
$name2 = '$name1';
eval ("$name2 = 14;");
The string $name1 is substituted for the variable name $name2, yielding the string
$name1 = 14;
eval then executes this statement, which assigns 14 to $name1.
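A runnable version of the eval technique, assuming Perl 5 with use strict (so $name1 is declared with our). The sketch also checks $@ in case the constructed statement fails to compile.

```perl
use strict;
use warnings;

our $name1;                    # the variable to be set indirectly
my $name2 = '$name1';          # holds the name of the variable, not its value

eval("$name2 = 14;");          # executes the string '$name1 = 14;'
die "eval failed: $@" if $@;

print "$name1\n";              # 14
```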

Packages

A Perl program keeps track of the variables and subroutines defined within it by storing their names in a symbol table. In Perl, the collection of names in a symbol table is called a package. The following sections describe packages and how to use them.

Defining a Package

Perl enables you to define more than one package for a program, with each package contained in a separate symbol table. To define a package, use the package statement.

package mypack;

This statement creates a new package named mypack. All variable and subroutine names defined from this point on in the program are stored in the symbol table associated with the new package. This process continues until another package statement is encountered.

Each symbol table contains its own set of variable and subroutine names, and each set of names is independent. This means that you can use the same variable name in more than one package.

$var = 14;

package mypack;

$var = 6;

The first statement creates a variable named $var and stores it in the main symbol table. The statement following the package statement creates another variable named $var and stores it in the symbol table for the mypack package.
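A minimal sketch of two variables with the same name in different packages. It uses the Perl 5 notation package::name to name each one explicitly, and omits use strict because it relies on undeclared package variables, as the text does.

```perl
# use strict is omitted: this sketch relies on undeclared package variables.
use warnings;

$var = 14;                 # stored in the main package: $main::var

package mypack;
$var = 6;                  # a separate variable: $mypack::var

package main;
print "main: $main::var, mypack: $mypack::var\n";   # main: 14, mypack: 6
```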

Switching Between Packages

You can switch back and forth between packages at any time. Listing 20.4 shows how you can carry out this action.


Listing 20.4. A program that switches between packages.
1:  #!/usr/local/bin/perl

2:  

3:  package pack1;

4:  $var = 26;

5:  package pack2;

6:  $var = 34;

7:  package pack1;

8:  print ("$var\n");


$ program20_4

26

$

Line 3 defines a package named pack1. Line 4 creates a variable named $var, which is then stored in the symbol table for the pack1 package. Line 5 then defines a new package, pack2. Line 6 creates another variable named $var, which is stored in the symbol table for the pack2 package. Two separate copies of $var now exist, one in each package.

Line 7 specifies the pack1 package again. Because pack1 has already been defined, this statement just sets the current package to be pack1; therefore, all variable and subroutine references and definitions refer to names stored in the symbol table for this package.

As a consequence, when line 8 refers to $var, it refers to the $var stored in the pack1 package. The value stored in this variable, 26, is retrieved and printed.

The main Package

The default symbol table, in which variable and subroutine names are normally stored, is associated with the package named main. If you have defined a package using the package statement and you want to switch back to using the normal default symbol table, specify the main package as shown here:

package main;

When this statement is executed, your program resumes behaving as though no package statements have ever been seen. Subroutine and variable names are stored as they normally are.

Referring to One Package from Another

To refer to a variable or subroutine defined in one package from inside another package, precede the variable name with the package name followed by a single quotation-mark character. For example:

package mypack;

$var = 26;

package main;

print ("$mypack'var\n");

Here, $mypack'var refers to the variable named $var located in the package mypack.

Do not put any spaces between the quotation-mark character and either the package name or the variable name. The following examples are not correct:
$mypack ' var
$mypack' var
$mypack 'var

NOTE
In Perl 5, the package name and variable name are separated by a pair of colons instead of a quotation mark:
$mypack::var
The quotation-mark character is supported for now but might not be understood in future versions of Perl.
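A minimal sketch of the Perl 5 notation for reaching into another package, for both a variable and a subroutine. The subroutine double_var is a hypothetical helper added only for illustration.

```perl
use strict;
use warnings;

package mypack;
our $var = 26;
sub double_var { return 2 * $var; }   # hypothetical helper for illustration

package main;
print "$mypack::var\n";                # 26
my $doubled = mypack::double_var();    # call a subroutine in another package
print "$doubled\n";                    # 52
```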

Specifying No Current Package

Perl 5 enables you to state that there is to be no current package. To do this, specify a package statement without a package name, as in the following:

package;

This tells the Perl interpreter that all variables must have their package names explicitly specified in order for a statement to be valid.

$mypack::var = 21;    # OK

$var = 21;            # error - no current package

This restriction remains in effect until a current package is explicitly defined by another package statement.

Packages and Subroutines

A package definition affects all the statements in a program, including subroutine definitions. For example:

package mypack;

sub mysub {

        local ($myvar);

        # stuff goes here

}

Here, the names mysub and myvar are both part of the mypack package. To call the subroutine mysub from outside the package mypack, specify &mypack'mysub.

You can change packages in the middle of a subroutine:

package pack1;

sub mysub {

        $var1 = 1;

        package pack2;

        $var1 = 2;

}

This code creates two copies of $var1, one in pack1 and one in pack2.
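A runnable sketch of the example above, assuming Perl 5. A package statement inside a block stays in effect only until the end of that block, so the second assignment refers to $pack2::var1. use strict is omitted because the sketch uses undeclared package variables, as the text does.

```perl
# use strict is omitted: the sketch uses undeclared package variables, as the text does.
use warnings;

package pack1;
sub mysub {
    $var1 = 1;            # compiled in pack1: $pack1::var1
    package pack2;        # in effect until the end of this block
    $var1 = 2;            # compiled in pack2: $pack2::var1
}

package main;
pack1::mysub();
print "pack1: $pack1::var1, pack2: $pack2::var1\n";   # pack1: 1, pack2: 2
```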

NOTE
Local variables that are part of packages can be referenced only in the subroutine or statement block in which they are defined. (In other words, they behave just like ordinary local variables do.)

Defining Private Data Using Packages

The most common use of packages is in files containing subroutines and global variables that are used in these subroutines. By defining a package for these subroutines, you can ensure that the global variables used in the subroutines are used nowhere else; such variables are called private data.

Better still, you can ensure that the package name itself is used nowhere else. Listing 20.5 is an example of a file containing a package name and variable names that are used nowhere else.


Listing 20.5. A file that contains private data.
1:  package privpack;

2:  $valtoprint = 46;

3:  

4:  package main;

5:  # This function is the link to the outside world.

6:  sub printval {

7:          &privpack'printval();

8:  }

9:

10: package privpack;

11: sub printval {

12:         print ("$valtoprint\n");

13: }

14:

15: package main;

16: 1;   # return value for require


This file produces no output by itself; nothing is printed until a program that includes it calls printval.

This file can be divided into two parts: the part that communicates with the outside world and the part that does the work. The part that communicates is in the main or default package, and the part that does the work is in a special package named privpack. This package is defined only in this file.

The subroutine printval, defined in lines 6-8, is designed to be called from programs and subroutines defined elsewhere. Its only task is to call the version of printval defined in the privpack package.

The version of printval in the privpack package prints the number by retrieving it from the scalar variable $valtoprint. This variable is also part of the privpack package, and it is defined only inside it.

Lines 15 and 16 ensure that this file behaves properly if it is included in a program by require. Line 15 sets the current package to the default package, and line 16 is a nonzero return value to ensure that require does not generate an error.

Packages and System Variables

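The following variables always belong to the main package, even when they are referenced from inside another package: special variables whose names do not begin with a letter or underscore (such as $_, $0, and $1), and any variable or file handle named STDIN, STDOUT, STDERR, ARGV, ARGVOUT, ENV, INC, or SIG (for example, @ARGV, %ENV, @INC, and %SIG).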
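The following sketch illustrates this rule. It assigns to $_, @ARGV, and an ordinary variable while the current package is mypack (a hypothetical package name used only here). The first two assignments land in package main; the ordinary variable stays in mypack.

```perl
#!/usr/local/bin/perl
# Sketch: some names always belong to package main, whatever the
# current package is. "mypack" is a hypothetical package name.

package mypack;
$_ = "set in mypack";            # punctuation variable: always $main::_
@ARGV = ("first", "second");     # ARGV is forced into main
$ordinary = "mine";              # ordinary name: this is $mypack::ordinary

package main;
print ("$_\n");                  # prints set in mypack
print (scalar(@ARGV), "\n");     # prints 2
print (defined($ordinary) ? "defined\n" : "undefined\n");   # prints undefined
```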

Accessing Symbol Tables

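To look at a symbol table from within a program, use the associative array named after the package with two colons appended, %package::, in which package is the name of the package whose symbol table you want to access. For example, %main:: (or %:: for short) contains the default symbol table. (In Perl 4, this array was named %_package, so the default symbol table was %_main.)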

Normally, you will not need to look in the symbol table yourself.
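If you do want to look, the keys of a package's symbol table are the names defined in that package. The following sketch lists them for Counter, a hypothetical package created just for the example; in Perl 5 its symbol table is the associative array %Counter::.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Sketch: list the names in a package's symbol table.
# "Counter" is a hypothetical package used only for this example.
package Counter;
our $count = 0;
sub bump { $count++ }

package main;
my @names = sort keys %Counter::;   # each key is a name defined in Counter
print "$_\n" foreach @names;        # the list includes bump and count
```

The symbol table stores names, not types: the single entry count covers $Counter::count, @Counter::count, %Counter::count, and so on.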

Modules

Most large programs are divided into components, each of which performs a specific task or set of tasks. Each component normally contains one or more executable functions, plus the variables needed to make these functions work. The collection of functions and variables in a component is known as a program module. One module can appear in a variety of programs.

Creating a Module

Perl 5 enables you to use packages to define modules. To define a module in Perl 5, create the package and store it in a file of the same name. For example, a package named Mymodule would be stored in the file Mymodule.pm. (The .pm suffix indicates that the file is a Perl module.)

Listing 20.6 creates a module named Mymodule, containing subroutines myfunc1 and myfunc2, and variables $myvar1 and $myvar2. This code should be stored in the file Mymodule.pm.


Listing 20.6. Code that creates a Perl module.
1:  #!/usr/local/bin/perl
2:
3:  package Mymodule;
4:  require Exporter;
5:  @ISA = qw(Exporter);
6:  @EXPORT = qw(myfunc1 myfunc2);
7:  @EXPORT_OK = qw($myvar1 $myvar2);
8:
9:  sub myfunc1 {
10:     $myvar1 += 1;
11: }
12:
13: sub myfunc2 {
14:     $myvar2 += 2;
15: }
16:
17: 1;   # a module must return a true value to use and require


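Lines 3-7 use the standard Perl module definition conventions. Line 3 defines the package. Line 4 loads Exporter, a standard Perl module that implements these conventions: it supplies the import routine that the use statement calls to make a module's subroutines and variables available in your program. Line 5 adds Exporter to the special array @ISA, so that Mymodule inherits this import routine. Lines 6 and 7 define the subroutines and variables that are to be made available to the outside world.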

Line 6 creates a special array named @EXPORT. This array lists the subroutines that are imported into any program that uses the module. Here, the subroutines myfunc1 and myfunc2 are exported. A subroutine defined inside a module that is not listed in @EXPORT (or @EXPORT_OK) is not imported; by convention it is treated as private to the module, although a program can still call it by prefixing the package name (Mymodule::).

Line 7 creates another special array, called @EXPORT_OK, that lists symbols other programs may import, but only when they ask for them by name. Here, the variables $myvar1 and $myvar2 can be imported by any program that lists them in its use statement.

Importing Modules Into Your Program

To import a module into your Perl program, use the use statement. For example, the following statement imports the Mymodule module into a program:

use Mymodule;

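The subroutines listed in @EXPORT (here, myfunc1 and myfunc2) can now be called in your program. Symbols listed only in @EXPORT_OK must be requested explicitly. For example, use Mymodule qw(myfunc1 myfunc2 $myvar1); imports both subroutines and $myvar1. (When you supply such a list, only the symbols in the list are imported.)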
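The following sketch shows importing from both lists in a form that runs as a single file. To make that possible, it relies on a trick that is not part of the approach described above: the module code from Listing 20.6 is placed in a BEGIN block, and an entry in %INC tells use that Mymodule.pm is already loaded. In a real program, the module would live in Mymodule.pm, and only the use statement and the code after it would appear in your program.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Sketch: Listing 20.6's module, placed in this file so the example runs
# on its own. Normally this code lives in Mymodule.pm.
BEGIN {
    package Mymodule;
    require Exporter;
    our @ISA       = qw(Exporter);
    our @EXPORT    = qw(myfunc1 myfunc2);   # imported by default
    our @EXPORT_OK = qw($myvar1 $myvar2);   # imported only on request
    our $myvar1    = 0;
    our $myvar2    = 0;
    sub myfunc1 { $myvar1 += 1 }
    sub myfunc2 { $myvar2 += 2 }
    $INC{"Mymodule.pm"} = __FILE__;   # mark as loaded so "use" skips the file search
}

# Ask for the default subroutines plus the two optional variables.
use Mymodule qw(myfunc1 myfunc2 $myvar1 $myvar2);

myfunc1();
myfunc2();
myfunc2();
print "$myvar1 $myvar2\n";   # prints 1 4
```

Because $myvar1 and $myvar2 are imported, the program can use them under use strict without declaring them; a plain use Mymodule; would import only the two subroutines.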

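The no statement is the opposite of use: it calls the module's unimport routine, if the module defines one. For example:

no Mymodule;

The no statement is mainly useful with pragmatic modules such as integer, which change the way Perl compiles your code. A module such as Mymodule, which does not define an unimport routine, is not affected by no.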

Listing 20.7 is an example of a program that turns a module on and then off again. The integer module used here tells Perl to perform arithmetic using integers: before operators such as +, -, *, and / are applied, the fractional parts of their operands are truncated, and the results are integers.


Listing 20.7. A program that uses the use and no statements.
1:  #!/usr/local/bin/perl
2:
3:  use integer;
4:  $result = 2.4 + 2.4;
5:  print ("$result\n");
6:
7:  no integer;
8:  $result = 2.4 + 2.4;
9:  print ("$result\n");


$ program20_7
4
4.8
$

Line 3 of this program imports the integer module. As a consequence, line 4 truncates each 2.4 to 2 before performing the addition, yielding the result 4.

Line 7 turns off the integer module. This tells the Perl interpreter to revert to floating-point arithmetic, so line 8 yields 4.8.

If a use or no statement appears inside a statement block, it remains in effect only until the end of that block. For example:

use integer;
$result1 = 2.4 + 2.4;
if ($result1 == 4) {
    no integer;
    $result2 = 3.4 + 3.4;
}
$result3 = 4.4 + 4.4;

Here, the no statement is only in effect inside the if statement, so $result2 is assigned 6.8. In the statement after the if, the integer module is still in use, which means that each 4.4 is truncated to 4 before the addition is performed, and $result3 is assigned 8.
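The scoping rule also applies to a use statement. In the following sketch, use integer appears inside a bare block, so integer division stops at the closing brace; the nested no integer is confined to its own inner block. The variable names are illustrative only.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Sketch: use integer and no integer are lexically scoped;
# each lasts until the end of the enclosing block.
my ($result1, $result2, $result3);
{
    use integer;
    $result1 = 10 / 3;        # integer division: 3
    {
        no integer;
        $result2 = 10 / 3;    # floating point: about 3.333
    }
    $result3 = 7 / 2;         # integer division again: 3
}
my $result4 = 7 / 2;          # outside the block: 3.5
print "$result1 $result2 $result3 $result4\n";
```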

Using Predefined Modules

Perl 5 provides a variety of predefined modules that perform useful tasks. Each module can be imported with the use statement; pragmatic modules such as integer can also be turned off with the no statement.

The following are some of the most useful modules in this library:

integer: As you have seen, this module tells Perl to use integer arithmetic instead of floating-point arithmetic.
diagnostics: Makes the Perl interpreter print longer, more detailed explanations of the warnings and error messages it produces when running your program.
English: Allows the use of English names as synonyms for system variables (for example, $ARG for $_).
Env: Makes environment variables available as ordinary Perl variables (for example, $HOME for $ENV{HOME}).
POSIX: The Perl interface to the POSIX standard (IEEE 1003.1).
Socket: Provides the constants and address-handling functions of the C socket library, for use with Perl's built-in socket functions.
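As an illustration of one of these modules, the following sketch uses English so that $ARG can be written in place of $_ and $PROGRAM_NAME in place of $0. The -no_match_vars option leaves out the English names for the regular-expression match variables, which can slow down pattern matching; the sample words are arbitrary.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use English qw(-no_match_vars);   # long names, without the match variables

my @lengths;
foreach (qw(alpha beta gamma)) {
    push (@lengths, length($ARG));         # $ARG is another name for $_
}
print "@lengths\n";                        # prints 5 4 5
print "This program is $PROGRAM_NAME\n";   # $PROGRAM_NAME is another name for $0
```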

A complete list of the predefined modules included with Perl 5 can be found in your Perl documentation.

TIP
Perl 5 users all over the world write useful modules and make them available to the Perl community through the Internet. The Comprehensive Perl Archive Network (CPAN) collects these modules and provides a complete list of them. More information on CPAN is available at the Web site located at http://www.perl.com/perl/CPAN/README.html.

Using Perl in C Programs

Perl 5 enables you to call Perl subroutines from within C programs. To add this capability, you need to do two things: include the Perl header files (EXTERN.h and perl.h) in your program source, and then link the Perl library when you compile your program.

See the Perl documentation for more details on how to use Perl subroutines in C programs.

Perl and CGI Scripts

The Common Gateway Interface (CGI) is a standard for interfacing external applications with information servers (such as those found on the World Wide Web).

For more information on CGI, go to the Web page located at http://hoohoo.ncsa.uiuc.edu/cgi. A Perl library of routines for writing CGI scripts, cgi-lib.pl, can be found at http://www.bio.cam.ac.uk/web/cgi-lib.pl.txt.
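To give a feel for how a CGI program works, here is a minimal sketch in Perl. The Web server passes request information to the program in environment variables (QUERY_STRING holds the part of the URL after the ?) and sends whatever the program prints back to the browser: first the headers, then a blank line, then the body. The helper name cgi_response is hypothetical, and a real script would also decode the query string and check its contents.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Sketch of a minimal CGI program. cgi_response is a hypothetical helper
# that builds the complete response text for a given query string.
sub cgi_response {
    my ($query) = @_;
    my $body = "You sent: $query\n";
    # A CGI response starts with headers, then a blank line, then the body.
    return "Content-type: text/plain\n\n" . $body;
}

my $query = defined($ENV{QUERY_STRING}) ? $ENV{QUERY_STRING} : "";
print cgi_response($query);
```

Returning plain text keeps the sketch short; a script that builds HTML from the query string would also need to escape characters such as < and &.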

Translators and Other Supplied Code

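The Perl distribution provides programs that translate the following items into Perl:

awk programs (a2p)
sed scripts (s2p)
find command lines (find2perl)
C header files (h2ph), producing .ph files that Perl programs can include with require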

For information on these translation programs, refer to the documentation supplied with your Perl distribution.

Summary

Today you learned about features of Perl that were not discussed on previous days.

Q&A

Q:Why does a file included by require need to execute a statement? Why does require check a return code?
A:Because files included by require can contain statements that are immediately executed, checking for a return code enables programs to determine whether code included by require generated any errors.
Q:Is a $#array variable defined for system array variables such as @ARGV?
A:Yes. For example, $#ARGV contains the largest subscript of the @ARGV array; you can test this to determine whether your program was passed enough arguments.
Q:Are $#array variables defined for associative arrays?
A:No, because there is no concept of a "largest subscript" in associative arrays.
Q:What happens to system variables when reset is called? For example, is @ARGV reset when reset is passed "A"?
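A:The reset function affects all variables in the current package whose names begin with the specified letters, including system variables. For example, calling reset("A-Z") from the main package wipes out @ARGV, @INC, and %ENV. For this reason, you should be careful when you use reset.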
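The $#ARGV test mentioned above can be sketched as follows. Since $#ARGV is the largest subscript of @ARGV, the number of arguments is $#ARGV + 1, and $#ARGV is -1 when no arguments were passed. The requirement of exactly two arguments is an assumption for the example.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Sketch: $#ARGV is the largest subscript of @ARGV, so the number of
# arguments is $#ARGV + 1 ($#ARGV is -1 when there are none).
my $count = $#ARGV + 1;
if ($count < 2) {
    # A real program would normally print a usage message and exit here.
    print STDERR ("expected 2 arguments, got $count\n");
} else {
    print ("first: $ARGV[0], second: $ARGV[1]\n");
}
```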

Workshop

The Workshop provides quiz questions to help you solidify your understanding of the material covered, and exercises to give you experience in using what you've learned. Try to understand the quiz and exercise answers before you go on to tomorrow's lesson.

Quiz

  1. What do these constants contain?
    1. __LINE__
    2. __FILE__
    3. __END__

  2. What is the value of each of the following strings? (Assume that $var has the value hello.)
    1. q(It's time to say $var)
    2. qq "It's time to say $var"; # a comment
    3. qx/echo $var/

  3. What is stored in @array after the following statements have been executed?
    @array = ("one", "two", "three", "four");
    $#array = 2;
    $array[4] = "five";

  4. How can you include code from another file in your program?

Exercises

  1. Write a program that uses the <> operator to list all the files in a directory in alphabetical order.
  2. Write a program that uses a subroutine named sum to add the numbers in a list and return the total. Read the list from standard input (one number per line). Assume that the subroutine is contained in the file /u/jqpublic/perlfiles/sum.pl. Print the total returned by sum.
  3. Write a program that creates two packages named pack1 and pack2. For each package, read a line from standard input and assign it to the variable $var. Assume that each $var contains a number, add the two numbers together, and print the total.
  4. BUG BUSTER: What is wrong with the following statements?
    print ("Perl files in this directory:\n");
    $filepattern = "*.pl";
    while ($name = <$filepattern>) {
    print ("$name\n");
    }
  5. BUG BUSTER: What is wrong with the following statement?
    print << EOF
    Here is part of my string.
    Here is the rest of my string.
    EOF