Chapter 16

Command-Line Options



Today's lesson describes the options you can specify to control how your Perl program operates. These options provide many features, each of which is described in its own section of today's lesson.

Today's lesson begins with a description of how to supply options to your Perl program.

Specifying Options

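There are two ways to supply options to a Perl program: on the command line when you start the program, or in the header comment on the first line of the program.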

The following sections describe these methods of supplying options.

Specifying Options on the Command Line

One way to specify options for a Perl program is to enter them on the command line when you enter the command that starts your program.

The syntax for specifying options on the command line is

perl options program

Here, program is the name of the Perl program you want to run, and options is the list of options you want to supply to the program.

For example, the following command runs the Perl program named test1 and passes it the options -s and -w. (You'll learn about these and other options later today.)

$ perl -s -w test1

Some options need to be specified along with a value. For example, the -0 option takes an integer value (which Perl interprets as an octal number):

$ perl -026 test1

Here, the integer 26 is associated with the option -0. For the -0 option, the value must be attached directly to the option, with no space in between; if you enter -0 26, the Perl interpreter treats 26 as the name of the program to run.

Other options that take a value, such as the -I option described later today, enable you to put a space between the option and its value:

$ perl -I /u/dave/myincdir test1

In either case, the value associated with an option must always immediately follow the option.

NOTE
If an option does not require an associated value, you can put another option immediately after it without specifying an additional - character or space. For example, the following commands are equivalent:
$ perl -s -w test1
$ perl -sw test1
You can put an option that requires a value as part of a group of options, provided that it is last in the group. For example, the following commands are equivalent:
$ perl -s -w -026 test1
$ perl -sw026 test1

Specifying an Option in the Program

Another way to specify a command option is to include it as part of the header comment for the program. For example, suppose that the first line of your Perl program is this:

#!/usr/local/bin/perl -w

In this case, the -w option is automatically specified when you start the program.

Perl 4 enables you to specify only one option (or group of options) on the header comment line. This means that the following line generates an "unrecognized switch" error message:
#!/usr/local/bin/perl -w -s
Perl 5 enables you to specify as many switches as you like on the header comment line. However, some operating systems chop the header line after 32 characters, so be careful if you are planning to use a large number of switches.


NOTE
Perl 5 examines the header comment for options even when you start the program with the perl command, so options in the header comment normally remain in effect alongside any options you specify on the command line. For example, if your header comment is
#!/usr/local/bin/perl -w
and you start your program with the command
$ perl -s test1
the program runs with both the -s option and the -w option in effect.

The -v Option: Printing the Perl Version Number

The -v option enables you to find out what version of Perl is running on your machine. When the Perl interpreter sees this option, it prints information on itself and then exits without running your program.

This means that if you supply a command such as the following, the file test1 is not executed:

$ perl -v test1

Here is sample output from the -v option:

This is perl, version 5.001

        Unofficial patch level 1m

Copyright (c) 1987-1994, Larry Wall

Perl may be copied only under the terms of either the Artistic License
or the GNU General Public License, which may be found in the Perl 5.0
source kit.

The only really useful things here, besides the copyright notice, are the version number of the Perl you are running (in this case, 5.001) and the patch level, which indicates how many repairs, or patches, have been made to this version. Here, the patch level is 1m.

No other options should be specified if you specify the -v option, because none of them would do anything in this case anyway.

The -c Option: Checking Your Syntax

The -c option tells the Perl interpreter to check whether your Perl program is correct without actually running it. If it is correct, the Perl interpreter prints the following message (in which filename is the name of your program) and then exits without executing your program:

filename syntax OK

If the Perl interpreter detects errors, it displays them just as it normally does. After printing the error messages, it prints the following message, in which filename is the name of your program:

filename had compilation errors

Again, there is no point in supplying other options if you specify the -c option because the Perl interpreter isn't actually running the program; the only exception is the -w option, which prints warnings. This option is described in the following section.

The -w Option: Printing Warnings

As you have seen on the preceding days, some mistakes are easy to make when you are writing a Perl program, such as accidentally typing the wrong variable name, or using == when you really mean to use eq. Because certain mistakes crop up frequently, the Perl interpreter provides an option that checks for them.

This option, the -w option, prints a warning every time the Perl interpreter sees something that might cause a problem. For example, if the interpreter sees the statement

$y = $x;

and hasn't seen $x before (which means that $x is undefined), it prints a warning message in the following form if you are running Perl 4:

Possible typo: "x" at filename line linenum.

Here, filename is the name of your Perl program, and linenum is the number of the line on which the interpreter has detected a potential problem.

If you are running Perl 5, the message is similar, but also includes the name of the current package:

Identifier "main::x" used only once: possible typo at filename line linenum.

For more information on packages, see Day 19, "Object-Oriented Programming in Perl."

The following sections provide a partial list of the potential problems detected by the -w option. (If you are running Perl 5, the -w option provides dozens of useful warnings. Consult the Perl manual pages for a complete list.)

NOTE
The -w option can be combined with the -c option to provide a means of checking your syntax for errors and problems before you actually run the program.
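The combination of -c and -w can be tried from a small driver script. The following is a minimal sketch, assuming a UNIX-like shell (for the 2>&1 redirection) and a temporary file created with the standard File::Temp module; the one-line test program is made up for illustration. The exact wording of the warning depends on your Perl version, so the sketch only looks for the phrases "used only once" and "syntax OK".

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small program that mentions $x and $y only once each
# (hypothetical test input).
my ($fh, $file) = tempfile(SUFFIX => '.pl', UNLINK => 1);
print $fh "\$y = \$x;\n";
close($fh) or die "cannot close $file: $!";

# Run a second Perl interpreter with -c and -w and capture its messages,
# which are written to the standard error file.
# $^X holds the path of the interpreter that is running this script.
my $report = `"$^X" -c -w "$file" 2>&1`;
print $report;
```

Because -c is specified, the test program is compiled and checked but never executed.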

Checking for Possible Typos

As you have seen, a statement such as the following one leads to a warning message if $x has not been previously defined:

$y = $x;

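The "possible typo" warning also appears in other circumstances in which a name is seen only once in the program.

Of course, the possible-typo message might flag lines that don't actually contain typos. Following are two of the most common situations in which a possible typo actually is correct code:

A print format that is defined using the format statement but is referred to only by assigning its name to the system variable $~. For example, the format definition

format BLANK =
.

can produce the warning

Possible typo: "BLANK" at file1 line 26.

even though the program selects the format with the following statement:

$~ = "BLANK";

Dummy variables that are assigned to in a list but are never used again, such as $d1 and $d2 in the following statement:

($d1, $d2, $groupid) = getgrnam ($groupname);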

Checking for Redefined Subroutines

One useful feature of the -w option is that it checks whether two subroutines of the same name have been defined in the program. (Normally, if the Perl interpreter sees two subroutines of the same name, it quietly replaces the first subroutine with the second one and carries on.)

If, for example, two subroutines named x are defined in a program, the -w option prints a message similar to the following one:

Subroutine x redefined at file1 line 46.

The line number specified is the line that starts the second subroutine.

When the -w option has detected this problem, you can decide which subroutine to rename or throw away.
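The replacement behavior and the warning can both be observed in one script. This is a sketch only: the subroutine name greet is hypothetical, warnings are enabled with the warnings pragma rather than -w, and the two definitions are compiled with eval so that the warning can be captured by a __WARN__ handler.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them immediately.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# Compile two definitions of the same subroutine.
eval q{
    sub greet { return 'first' }
    sub greet { return 'second' }
    1;
} or die $@;

# The second definition has quietly replaced the first.
my $which = greet();
print "greet() returns: $which\n";
print "warning: $_" for @warnings;
```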

Checking for Incorrect Comparison Operators

Another really helpful feature of the -w option is that it checks whether you are trying to compare a string using the == operator.

In a statement such as the following:

if ($x == "humbug") {
        ...
}

the conditional expression

$x == "humbug"

is equivalent to the expression

$x == 0

because a character string that does not begin with a number, such as "humbug", is converted to 0 when used in a numeric context (a place where a number is expected). This is correct in Perl, but it is not likely to be what you want.

If the -w option is specified and the Perl interpreter sees a statement such as this one, it prints a message similar to the following if you are running Perl 4:

Possible use of == on string value at file1 line 26.

In Perl 5, the following warning is printed:

Argument "humbug" isn't numeric for numeric eq at file1 line 26.

In either case, this warning enables you to detect these incorrect == operators and replace them with eq operators, which compare strings.
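The difference between the two operators is easy to see in a short script. The following sketch uses the warnings pragma in place of -w and a made-up value for $x; it captures the warnings so that they can be printed after the comparisons.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them immediately.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my $x = "bah";

# Numeric comparison: both strings are converted to 0, so this is true.
my $numeric_match = ($x == "humbug") ? 1 : 0;

# String comparison: the strings differ, so this is false.
my $string_match = ($x eq "humbug") ? 1 : 0;

print "== says: $numeric_match\n";
print "eq says: $string_match\n";
print "warning: $_" for @warnings;
```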

The -w option doesn't detect the opposite problem, namely:
if ($x eq 46) {
...
}
In this case, the Perl interpreter converts 46 to the string 46 and performs a string comparison.
Because a number and its string equivalent usually mean the same thing, this normally doesn't cause a problem. Watch out, though, for octal numbers in string comparisons, as in the following example:
if ($x eq 046) {
...
}
Here, the octal value 046 is converted to the number 38 before being converted to a string. If you really want to compare $x to 046, this code will not produce the results you expect.
Another thing to watch out for is this: In Perl 4, the -w option does not check for conditional expressions such as the following:
if ($x = 0) {
...
}
because there are many cases in Perl in which the = assignment operator belongs inside a conditional expression. You will have to manually check that you are not specifying = (assignment) when you really mean to use == (equality comparison).
Perl 5 flags this with the following message:
Found = in conditional, should be == at filename line linenum.
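The octal pitfall described above can be checked directly. In this sketch the value of $x is made up for illustration; the literal 046 is the one used in the text.

```perl
use strict;
use warnings;

my $x = "046";

# The octal literal 046 is the number 38, so its string form is "38".
my $literal_as_string = "" . 046;

# Comparing against the bare literal compares "046" with "38".
my $matches_literal = ($x eq 046) ? 1 : 0;

# Quoting the value compares "046" with "046".
my $matches_quoted = ($x eq "046") ? 1 : 0;

print "046 as a string: $literal_as_string\n";
print "eq 046:   $matches_literal\n";
print "eq \"046\": $matches_quoted\n";
```

If the characters 046 are what you want to compare against, quoting the value is one way to keep Perl from treating it as an octal number.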

The -e Option: Executing a Single-Line Program

The -e option enables you to execute a Perl program from your shell command line. For example, the command

$ perl -e "print ('Hello');"

prints the following string on your screen:

Hello

You can also specify multiple -e options. In this case, the Perl statements are executed left to right. For example, the command

$ perl -e "print ('Hello');" -e "print (' there');"

prints the following string on your screen:

Hello there

By itself, the -e option is not all that useful. It becomes useful, however, when you use it in conjunction with some of the other options you'll see in today's lesson.

You can leave off the closing semicolon in a Perl statement passed via the -e option, if you want to:
$ perl -e "print ('Hello')"
If you are supplying two or more -e options, however, the Perl interpreter strings them together and treats them as though they are a single Perl program. This means that the following command generates an error because there must be a semicolon after the statement specified with the first -e option:
$ perl -e "print ('Hello')" -e "print (' there')"
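Multiple -e options can also be exercised from another Perl script, which avoids any dependence on your shell's quoting rules. This is a minimal sketch: it starts a second interpreter (found through the special variable $^X) and passes each statement as its own -e option.

```perl
use strict;
use warnings;

# Start a second Perl interpreter and pass each one-line statement
# with its own -e option. Each statement ends with a semicolon.
open(my $child, '-|', $^X,
     '-e', q{print ('Hello');},
     '-e', q{print (' there');})
    or die "cannot run $^X: $!";

# Read everything the child interpreter printed.
my $output = do { local $/; <$child> };
close($child);

print "$output\n";
```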

The -s Option: Supplying Your Own Command-Line Options

As you can see from this chapter, you can control the behavior of Perl by specifying various command-line options. You can control the behavior of your own Perl programs by specifying command-line options for them too. To do this, specify the -s option when you call the program.

Here's an example of a command that passes an option to a Perl program:

$ perl -s testfile -q

This command starts the Perl program testfile and passes it the -q option.

To be able to pass options to your program, you must specify the Perl -s option. The following command does not pass -q as an option:
$ perl testfile -q
In this case, -q is just an ordinary argument that is passed to your program and stored in the built-in array variable @ARGV.
The easiest way to remember to include -s is to specify it as part of your header comment:
#!/usr/local/bin/perl -s
This ensures that your program always checks for options.

If an option is specified when you invoke your Perl program, the scalar variable whose name is the same as the option is automatically set to 1 before program execution begins. For example, if a Perl program named testfile is called with the -q option, as in the following, the scalar variable $q is automatically set to 1:

$ perl -s testfile -q

You then can use this variable in a conditional expression to test whether the option has been set.

NOTE
If -q is treated as an option, it does not appear in the system variable @ARGV. A command-line argument either sets an option or is added to @ARGV.

Options can be longer than a single character. For example, the following command sets the value of the scalar variable $potato to 1:

$ perl -s testfile -potato

You also can set an option to a value other than 1 by specifying = and the desired value on the command line:

$ perl -s testfile -potato="hot"

This line sets the value of $potato to hot.
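The effect of -s on a program's variables can be observed with a small driver script. The sketch below writes a throwaway program to a temporary file (using the standard File::Temp module) and runs it with -s; the argument data.txt is a made-up name that is never opened.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A hypothetical program that reports the variables set by its options.
my ($fh, $file) = tempfile(SUFFIX => '.pl', UNLINK => 1);
print $fh <<'END_PROGRAM';
print "q=$q potato=$potato args=@ARGV\n";
END_PROGRAM
close($fh) or die "cannot close $file: $!";

# Equivalent to the command: perl -s testfile -q -potato=hot data.txt
open(my $child, '-|', $^X, '-s', $file, '-q', '-potato=hot', 'data.txt')
    or die "cannot run $^X: $!";
my $output = do { local $/; <$child> };
close($child);

print $output;
```

The two options set $q and $potato, and only data.txt is left in @ARGV.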

Listing 16.1 is a simple example of a program that uses command-line options to control its behavior. This program prints information about the user currently logged in.


Listing 16.1. An example of a program that uses command-line options.
1:  #!/usr/local/bin/perl -s
2:
3:  # This program prints information as specified by
4:  # the following options:
5:  # -u: print numeric user ID
6:  # -U: print user ID (name)
7:  # -g: print group ID
8:  # -G: print group name
9:  # -d: print home directory
10: # -s: print login shell
11: # -all: print everything (overrides other options)
12:
13: $u = $U = $g = $G = $d = $s = 1 if ($all);
14: $whoami = `whoami`;
15: chop ($whoami);
16: ($name, $d1, $userid, $groupid, $d2, $d3, $d4,
17:         $homedir, $shell) = getpwnam ($whoami);
18: print ("user id: $userid\n") if ($u);
19: print ("user name: $name\n") if ($U);
20: print ("group id: $groupid\n") if ($g);
21: if ($G) {
22:         ($groupname) = getgrgid ($groupid);
23:         print ("group name: $groupname\n");
24: }
25: print ("home directory: $homedir\n") if ($d);
26: print ("login shell: $shell\n") if ($s);

$ program16_1 -U -d
user name: dave
home directory: /ag1/dave
$

The header comment in line 1 specifies that the -s option is to be automatically specified when this Perl program is invoked. This ensures that options can always be passed to this program.

The comments in lines 3-11 provide information on what options the program supports. This information is useful when someone is reading or modifying the program because there is no other way to tell which scalar variables are used to test options.

The option -all indicates that the program is to print everything; if this option is specified, the scalar variable $all is set to 1. To cut down on the number of comparisons later, line 13 checks whether $all is 1; if it is, the other scalar variables corresponding to command-line options are set to 1. This technique ensures that the following commands are equivalent (assuming that your program is named program16_1):

$ program16_1 -all

$ program16_1 -u -U -g -G -d -s

The scalar variables listed in line 13 can be assigned to, even though they correspond to possible command-line options, because they behave just like other Perl scalar variables.

Lines 14-17 provide the raw material for the various print operations in this program. To start, when the Perl interpreter sees the string `whoami` enclosed in backquotes, it calls the system command whoami, which returns the name of the user running the program. This name is then passed to getpwnam, which searches the password file /etc/passwd and retrieves the entry for this particular user.

Line 18 checks whether the -u option has been specified. To do this, it checks whether $u has a nonzero value. If it does, the user ID is printed. (The user ID is also printed if -all has been specified because line 13 sets $u to a nonzero value in this case.)

Similarly, line 19 prints the user name if -U has been specified, line 20 prints the group ID if -g has been specified, line 25 prints the home directory if -d has been specified, and line 26 prints the filename of the login shell if -s has been specified.

Lines 21-24 check whether to print the group name. If -G has been specified, $G is nonzero, and line 22 calls getgrgid to retrieve the group name.

NOTE
Because command-line options can change the initial values of scalar variables, it is a good idea to always assign a value to a scalar variable before you use it. Consider the following example:
#!/usr/local/bin/perl
while ($count < 10) {
        print ("$count\n");
        $count++;
}
This program normally counts up to 9 and stops, because $count is assumed to have an initial value of 0 in the comparison. (The first line printed is empty because $count has not yet been assigned a value.) However, if this program is run with the -s option and is called with the -count option, the initial value of $count becomes something other than 0, and the program behaves differently.
If you add the following statement before the while loop, the program always prints the numbers 0 to 9 regardless of what options are specified on the command line:
$count = 0;

The -s Option and Other Command-Line Arguments

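You can supply both options and command-line arguments to your program (provided that you supply the -s option to Perl). The Perl interpreter treats each argument that starts with a - character as an option until it reaches the first argument that does not start with -. That argument, and every argument after it, is treated as an ordinary argument and is stored in @ARGV.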

This means, for example, that the following command treats -w as an option to testfile, and foo and -e as ordinary arguments:

$ perl -s testfile -w foo -e

The special argument -- also indicates "end of options." For example, the following command treats -w as an option and -e as an ordinary argument. The -- is thrown away.

$ perl -s testfile -w -- -e
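Both commands in this section can be tried from a driver script. The following is a sketch under the same assumptions as before: the throwaway program and the helper name run_with_s are made up for illustration, and the program only reports $w and @ARGV.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A hypothetical program that reports option -w and its ordinary arguments.
my ($fh, $file) = tempfile(SUFFIX => '.pl', UNLINK => 1);
print $fh <<'END_PROGRAM';
print "w=$w args=@ARGV\n";
END_PROGRAM
close($fh) or die "cannot close $file: $!";

# Run the program under "perl -s" with the given arguments
# and return whatever it prints.
sub run_with_s {
    my @args = @_;
    open(my $child, '-|', $^X, '-s', $file, @args)
        or die "cannot run $^X: $!";
    my $output = do { local $/; <$child> };
    close($child);
    return $output;
}

# foo ends option processing, so -e is an ordinary argument.
my $first = run_with_s('-w', 'foo', '-e');

# -- ends option processing and is thrown away.
my $second = run_with_s('-w', '--', '-e');

print $first, $second;
```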

The -P Option: Using the C Preprocessor

The C preprocessor is a program that takes code written in the C programming language and searches for special preprocessor statements. In Perl, the -P option enables you to use this preprocessor with your Perl program:

$ perl -P myprog

Here, the Perl program myprog is first run through the C preprocessor. The resulting output is then passed to the Perl interpreter for execution.

NOTE
Perl provides no way to just run the C preprocessor on a Perl program. To do this, you'll need a C compiler that provides an option which specifies "preprocessor only."
Refer to the documentation for your C compiler for details about how to do this.

The following sections describe some of the most commonly used C preprocessor commands.

The C Preprocessor: A Quick Overview

C preprocessor statements always employ the following syntax:

#command value

Each C preprocessor statement starts with a # character. command is the preprocessor operation to perform, and value is the (optional) value associated with this operation.

Macro Substitution: The #define Operator

The most common preprocessor statement is #define. This statement tells the preprocessor to replace every occurrence of a particular character string with a specified value.

The syntax for #define is

#define macro     value

This statement replaces all occurrences of the character string macro with the value specified by value. This operation is known as macro substitution. macro can contain letters, digits, or underscores.

The value specified in a #define statement can be any character string or number. For example, the following statement replaces all occurrences of USERNAME with the string "dave" (including the quotation marks):

#define USERNAME   "dave"

This statement replaces EXPRESSION with the string (14+6), including the parentheses:

#define EXPRESSION  (14+6)

NOTE
When you are using #define with a value that is an expression, it is usually a good idea to enclose the value in parentheses. For example, consider the following Perl statement:
$result = EXPRESSION * 5;
If your preprocessor command is
#define EXPRESSION 14+6
the resulting Perl statement becomes
$result = 14 + 6 * 5;
which assigns 44 to $result (because the multiplication is performed first). If you enclose the preprocessor expression in parentheses, as in
#define EXPRESSION (14+6)
the statement becomes
$result = (14 + 6) * 5;
which yields the result 100, which is likely what you want.
Also, you always should enclose any parameters (described in the following section) in parentheses, for the same reason.
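The two expansions discussed in this note can be evaluated as ordinary Perl, with no preprocessor involved. This short sketch simply writes out what the preprocessor would produce in each case; the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Expansion of EXPRESSION * 5 when EXPRESSION is defined as 14+6.
my $without_parens = 14 + 6 * 5;

# Expansion of EXPRESSION * 5 when EXPRESSION is defined as (14+6).
my $with_parens = (14 + 6) * 5;

print "without parentheses: $without_parens\n";   # prints 44
print "with parentheses:    $with_parens\n";      # prints 100
```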

Passing Arguments Using #define

You can specify one or more parameters with your #define statement. This capability enables you to treat the preprocessor command like a simple function that accepts arguments. For example, the following preprocessor statement takes a specified value and uses it as an exponent:

#define POWEROFTWO(val)  (2 ** (val))

In the Perl statement

$result = POWEROFTWO(1.3 + 2.6) + 4;

the preprocessor substitutes the expression 1.3 + 2.6 for val and produces this:

$result = (2 ** (1.3 + 2.6)) + 4;

You can supply more than one parameter with a #define statement. For example, consider the following statement:

#define EXPONENT(base, exp) ((base) ** (exp))

Now, the statement

$result = EXPONENT(4, 11);

yields the following result after preprocessing:

$result = ((4) ** (11));

The Perl interpreter ignores the extra parentheses.
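The expanded statements can be checked as ordinary Perl. The sketch below writes out the expansions by hand, and adds a third line showing what the POWEROFTWO expansion would look like if the macro body used val without parentheses; that unparenthesized variant is a hypothetical illustration, not a macro from the text.

```perl
use strict;
use warnings;

# Expansion of POWEROFTWO(1.3 + 2.6) + 4 with the macro as defined.
my $power = (2 ** (1.3 + 2.6)) + 4;

# Expansion of EXPONENT(4, 11).
my $exponent = ((4) ** (11));

# Hypothetical expansion if the macro body were (2 ** val): the
# exponentiation binds more tightly than the addition, so only 1.3
# is used as the exponent.
my $unparenthesized = (2 ** 1.3 + 2.6) + 4;

print "POWEROFTWO(1.3 + 2.6) + 4 = $power\n";
print "EXPONENT(4, 11) = $exponent\n";
print "without parentheses around val: $unparenthesized\n";
```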

TIP
By convention, macros defined using #define normally use all uppercase letters (plus occasional digits and underscores). This makes it easier to distinguish macros from other variable names or character strings.

Listing 16.2 is an example of a Perl program that uses a #define statement to perform macro substitution. This listing is just Listing 15.4 with the preprocessor statement added.


Listing 16.2. A program that uses a #define statement.
1:  #!/usr/local/bin/perl -P
2:
3:  #define AF_INET   2
4:  print ("Enter an Internet address:\n");
5:  $machine = <STDIN>;
6:  $machine =~ s/^\s+|\s+$//g;
7:  @addrbytes = split (/\./, $machine);
8:  $packaddr = pack ("C4", @addrbytes);
9:  if (!(($name, $altnames, $addrtype, $len, @addrlist) =
10:         gethostbyaddr ($packaddr, AF_INET))) {
11:         die ("Address $machine not found.\n");
12: }
13: print ("Principal name: $name\n");
14: if ($altnames ne "") {
15:         print ("Alternative names:\n");
16:         @altlist = split (/\s+/, $altnames);
17:         for ($i = 0; $i < @altlist; $i++) {
18:                 print ("\t$altlist[$i]\n");
19:         }
20: }

$ program16_2
Enter an Internet address:
128.174.5.59
Principal name: ux1.cso.uiuc.edu
$

Line 3 defines the macro AF_INET and assigns it the value 2. When the C preprocessor sees AF_INET in line 10, it replaces it with 2, which is the value of AF_INET on the current machine (as specified in the header file /usr/include/netdb.h or /usr/include/bsd/netdb.h).

If this program is moved to a machine that defines a different value for AF_INET, all you need to do to get this program to work is change line 3 to use the value on that machine.
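If the -P option is not available on your installation, one alternative (not the approach used in Listing 16.2) is to take AF_INET from the standard Socket module, which supplies the value for the current machine. The following sketch uses the sample address from the listing's output and stops short of the actual lookup, which needs network access.

```perl
use strict;
use warnings;
use Socket qw(AF_INET inet_aton);

# Sample address taken from the output of Listing 16.2.
my $machine = '128.174.5.59';

# inet_aton packs a dotted address into four bytes, which is the same
# result that Listing 16.2 builds by hand with split and pack.
my $packaddr = inet_aton($machine);
my $by_hand  = pack('C4', split(/\./, $machine));

printf("AF_INET is %d on this machine\n", AF_INET);
print "packed addresses agree\n" if $packaddr eq $by_hand;

# The lookup itself would then be:
#   ($name) = gethostbyaddr ($packaddr, AF_INET);
```

With this approach there is no line to edit when the program is moved to a machine that uses a different value for AF_INET.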

Using Macros in #define Statements

You can use a previously defined macro as the value in another #define statement. The following is an example:

#define FIRST     1
#define SECOND    FIRST
$result = 43 + SECOND;

Here, the macro FIRST is defined to be equivalent to the value 1, and SECOND is defined to be equivalent to FIRST. This means that the statement following the macro definitions is equivalent to the following statement:

$result = 43 + 1;

Conditional Execution Using #ifdef and #endif

The #ifdef and #endif statements control whether a given group of statements is to be included as part of your program.

The syntax for the #ifdef and #endif statements is

#ifdef macro
code
#endif

Here, macro is any character string that can appear in a #define statement. code is one or more lines of your Perl program.

When the C preprocessor sees an #ifdef statement, it checks whether the macro has been defined using the #define statement. If it has, the code specified by code is included as part of the program. If it has not, the code specified by code is skipped.

NOTE
The code enclosed by #ifdef and #endif does not have to be a complete Perl statement. For example, the following code is legal:
$result = 14 * 2
#ifdef PLUSONE
+ 1
#endif
;
Here, $result is assigned 29 if PLUSONE is defined, 28 if it's not.
Be careful, though: If you abuse #ifdef, the resulting program might become difficult to read.

The #ifndef and #else Statements

The #ifndef and #else statements provide additional control over when parts of your program are to be executed.

The #ifndef statement enables you to define code that is to be executed when a particular macro is not defined.

The syntax for #ifndef is the same as for #ifdef:

#ifndef macro
code
#endif

For example:

#ifndef MYMACRO
$result = 26;
#endif

The assignment is performed only if MYMACRO has not appeared in a #define statement.

The #else statement enables you to specify code to be executed if a macro is defined and an alternative to choose if the macro is not defined. For example:

#ifdef MYMACRO
$result = 47;
#else
print ("Hello, world!\n");
#endif

Here, if MYMACRO has been defined by a #define statement, the following statement is executed:

$result = 47;

If MYMACRO has not been defined, the following statement is executed:

print ("Hello, world!\n");

You can use #else with #ifndef, as in the following:

#ifndef MYMACRO
print ("Hello, world!\n");
#else
$result = 47;
#endif

This code is equivalent to the #ifdef-#else-#endif sequence shown earlier in this section.

The #if Statement

The #if statement enables you to specify that certain lines of your program are to be included only if the expression included with the statement is nonzero.

The syntax for the #if statement is

#if expr
code
#endif

Here, expr is the expression to be evaluated, and code is the code to be executed if expr is nonzero.

For example, the following statement is executed only if the expression 14 + 3 is nonzero (which it always is, of course):

#if 14 + 3
$result = 26;
#endif

You can use a macro as part of an #if statement. The preprocessor replaces a defined macro with its value, so a macro defined with a nonzero value (such as 1) is nonzero in an #if expression; a macro that is not defined has the value zero. Consider the following example:

#if MACRO1 || MACRO2
$result = 47;
#endif

When the preprocessor sees the #if statement, it evaluates the expression MACRO1 || MACRO2. This expression has a nonzero value if either MACRO1 or MACRO2 is nonzero. Therefore, the following statement is executed if either MACRO1 or MACRO2 is defined:

$result = 47;

The #if statement provides a quick way to remove lines of code from your program temporarily:

#if 0
$result = 46;
print ("This line is not printed right now.\n");
#endif

Here, the expression included with the #if statement is always zero, which means that the statements between #if and #endif are always skipped.

You can use #else with #if, as in the following example:

#if MACRO1 || MACRO2
print ("MACRO1 or MACRO2 is defined.\n");
#else
print ("MACRO1 and MACRO2 are not defined.\n");
#endif

This code includes the first print statement if MACRO1 or MACRO2 has been defined using #define, and it includes the second print statement if neither has been defined.

You cannot use the ** (exponentiation) operator in an #if statement because ** is not supported in the C programming language.

Nesting Conditional Execution Statements

You can put one #ifdef-#else-#endif construct inside another. For example:

#ifdef MACRO1
#ifdef MACRO2
print ("MACRO1 yes, MACRO2 yes\n");
#else
print ("MACRO1 yes, MACRO2 no\n");
#endif
#else
#ifdef MACRO2
print ("MACRO1 no, MACRO2 yes\n");
#else
print ("MACRO1 no, MACRO2 no\n");
#endif
#endif

You also can put an #if-#else-#endif construct or an #ifndef-#else-#endif construct inside an #ifdef-#else-#endif construct, or vice versa. The only restriction is that the inner construct must be completely contained in one part of the outer construct.

Including Other Files Using #include

Another preprocessor command that is quite useful is the #include command. This command tells the C preprocessor to include the contents of the specified file as part of the program.

The syntax for the #include command is

#include filename

filename is the name of the file to be included.

For example, the following command includes the contents of myincfile.h as part of the program:

#include <myincfile.h>

When an #include statement is found in a Perl program, the C preprocessor searches for the file in the current directory and the /usr/local/lib/perl directory. (The -I option, described in the following section, enables you to search in other directories.) To instruct the C preprocessor to search only the current directory, enclose the filename in double quotation marks rather than angle brackets.

#include "myincfile.h"

This command limits the search for myincfile.h to the current directory.

You can specify an entire pathname in an #include statement, as in the following example:

#include "/u/dave/myincfile.h"

This command retrieves the contents of /u/dave/myincfile.h and adds them to the program.

NOTE
Perl also enables you to include other files as part of a program using the require statement. For more information on require, refer to Day 19, "Object-Oriented Programming in Perl."

The -I Option: Searching for C Include Files

You use the -I option with the -P option. It enables you to specify where to look for include files to be processed by the C preprocessor. For example:

perl -P -I /u/dave/myincdir testfile

This command tells the Perl interpreter to search the directory /u/dave/myincdir for include files (as well as the default directories).

To specify multiple directories to search, repeat the -I option:

perl -P -I /u/dave/dir1 -I /u/dave/dir2 testfile

This command searches in both /u/dave/dir1 and /u/dave/dir2.

NOTE
The directories specified in the -I option also are added to the system variable @INC. This technique ensures that the require function can search in the same directories as the C preprocessor.
For more information on @INC, refer to Day 17, "System Variables." For more information on require, refer to Day 19.
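The effect of -I on @INC can be observed without involving the C preprocessor. In this sketch, a second interpreter is started with the directory name used earlier in this section (the directory does not need to exist) and prints its @INC list, which the driver script then searches.

```perl
use strict;
use warnings;

# Directory name taken from the example in the text; it need not exist.
my $dir = '/u/dave/myincdir';

# Start a second interpreter with -I and have it print its @INC list.
open(my $child, '-|', $^X, '-I', $dir, '-e', 'print "$_\n" for @INC;')
    or die "cannot run $^X: $!";
my @inc = <$child>;
close($child);
chomp(@inc);

# Count how many @INC entries match the directory passed with -I.
my $found = grep { $_ eq $dir } @inc;
print "$dir ", ($found ? "is" : "is not"), " in \@INC\n";
```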

The -n Option: Operating on Multiple Files

One of the most common tasks in Perl programs and in UNIX commands is to read the contents of several input files one line at a time and process each input line as it is read. In these programs and commands, the names of the input files are supplied on the command line. A simple example is the UNIX command cat:

$ cat file1 file2 file3 ...

This command reads one line of input at a time and writes it to the standard output file.

In Perl, one way to read the contents of several input files, one line at a time, is to enclose the <> operator in a while loop:

while ($line = <>) {
        # process $line in here
}

Another method is to specify the -n option. This option takes your program and executes it once for each line of input in each of the files specified on the command line.

Listing 16.3 is a simple example of a program that uses the -n option. It puts asterisks around each input line and then prints it.


Listing 16.3. A simple program that uses the -n option.
1:  #!/usr/local/bin/perl -n
2:
3:  # input line is stored in the system variable $_
4:  $line = $_;
5:  chop ($line);
6:  printf ("* %-52s *\n", $line);

$ program16_3
* This test file has only one line in it.              *
$

The -n option encloses the program shown here in an invisible while loop. Each time the program is executed, the next line of input from one of the input files is read and is stored in the system variable $_. Line 4 takes this line and copies it into another scalar variable, $line; line 5 then removes the last character (the trailing newline character) from this line.

Line 6 uses printf to write the input line to the standard output file. Because printf is formatting the input, the asterisks all appear in the same columns (column 1 and column 56) on your screen.

NOTE
The previous program is equivalent to the following Perl program (which does not use the -n option):
#!/usr/local/bin/perl
while (<>) {
        # input line is stored in the system variable $_
        $line = $_;
        chop ($line);
        printf ("* %-52s *\n", $line);
}

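The -n and -e options work well together. For example, the following command is equivalent to the cat command. (The one-line program is enclosed in single quotes so that the shell does not try to expand $_ itself.)

$ perl -n -e 'print $_;' file1 file2 file3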

The print $_; argument supplied with the -e option is a one-line Perl program. Because the -n option executes the program once for each input line and reads each input line into the system variable $_, the statement

print $_;

prints each input line in turn, which is exactly what the cat command does. (Note that the parentheses that normally enclose the argument passed to print have been omitted in this case.)

The previous command can be made even simpler:

$ perl -n -e "print" file1 file2 file3

By default, if no argument is supplied, print assumes that it is to print the contents of $_. And, if the program consists of a single statement, there is no need to include the closing semicolon.

The pattern matching and substitution operators also operate on $_ by default. For example, the following statement examines the contents of $_ and searches for a digit:

$found = /[0-9]/;

This default behavior makes it easy to include a search or a substitution in a single-line command. For example:

$ perl -n -e "print if /[0-9]/" file1 file2 file3

This command reads each line of the files file1, file2, and file3. If an input line contains a digit, it is printed.
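
To make the implicit loop concrete, the following stand-alone sketch does by hand what the -n and -e options do in the previous command. It is only an illustration: the sample file name and its contents are made up for the example, and the matching lines are also collected in an array (@matched, a name chosen here) so that the result can be inspected.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Build a throwaway input file; its name and contents are invented for this sketch.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/sample.txt";
open(my $out, '>', $file) or die "cannot write $file: $!";
print $out "no digits here\n", "room 101\n", "still none\n", "42\n";
close($out);

# Pretend the file name was supplied on the command line.
@ARGV = ($file);

my @matched;
while (<>) {                  # the loop that -n wraps around the program
    next unless /[0-9]/;      # same test as "print if /[0-9]/"
    push(@matched, $_);       # remember the line (an addition for this sketch)
    print;                    # print $_, as the one-line program does
}
```

Only the two lines that contain a digit are printed; the other lines are read and discarded.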

NOTE
Several other functions use $_ as the default scalar variable to operate on, which makes those functions ideal for use with the -n and -e options. A full list of these functions is provided in the description of the $_ system variable, which is contained in Day 17.

The -p Option: Operating on Files and Printing

The -p option is similar to the -n option: it reads each line of its input files in turn. However, the -p option also prints each line it reads.

This means, for example, that you can simulate the behavior of the UNIX cat command with the following command:

$ perl -p -e ";" file1 file2 file3

Here, the ; is a Perl program consisting of one statement that does nothing.

The -p option is designed for use with the -i option, described in the following section.

NOTE
If both the -p and the -n options are specified, the -n option is ignored.
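
As a rough illustration of how -p differs from -n, the sketch below writes out the kind of loop that the Perl 5 manual page describes for -p: the printing happens in a continue block, so a line is printed even when the program skips ahead with next. The sample file and the $printed variable are assumptions made for this example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Sample input, invented for this sketch.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/sample.txt";
open(my $out, '>', $file) or die "cannot write $file: $!";
print $out "first line\n", "second line\n";
close($out);

@ARGV = ($file);              # as if the file were named on the command line

my $printed = '';             # copy of everything printed, for inspection
while (<>) {
    next if /^first/;         # skip the edit for this line ...
    s/line/LINE/;             # ... otherwise change the line held in $_
}
continue {
    $printed .= $_;           # record what is about to be printed
    print;                    # -p prints $_ after every pass, even after next
}
```

The first line is printed unchanged and the second is printed after the substitution, which is why a program run with -p never needs its own call to print.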

The -i Option: Editing Files

As you have seen, the -n and -p options read lines from the files specified on the command line. The -i option, when used with the -p option, takes the input lines being read and writes them back out to the files from which they came. This process enables you to edit files using commands similar to those used in the UNIX sed command.

For example, consider the following command:

$ perl -p -i -e "s/abc/def/g;" file1 file2 file3

This command contains a one-line Perl program that examines the scalar variable $_ and changes all occurrences of abc into def. (Recall that the substitution operator operates on $_ if the =~ operator is not specified.) The -p option ensures that $_ is assigned each line of each input file in turn and that the program is executed once for each input line. Thus, this command changes all occurrences of abc in the files file1, file2, and file3 to def.

Do not use the -i option with the -n option unless you know what you're doing. The following command also changes all occurrences of abc to def, but it doesn't write out the input lines after it changes them:
$ perl -n -i -e "s/abc/def/g;" file1 file2 file3
Because the -i option specifies that the input files are to be edited, the result is that the contents of file1, file2, and file3 are completely destroyed.

The -i option also works on programs that do not use the -p option but do contain the <> operator inside a loop. For example, consider the following command, in which program is the name of a Perl program that reads its input using <>:

$ perl -i program file1 file2 file3

In this case, the Perl interpreter copies the first file, file1, to a temporary file and opens the temporary file for reading. Then, it opens file1 for writing and sets the default output file (the file used by calls to print, write, and printf) to be file1.

After the program finishes reading the temporary file to which file1 was copied, it then copies file2 to a temporary file, opens it for reading, opens file2 for writing, and sets the default output file to be file2. This process continues until the program runs out of input files.
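
The sequence just described can be sketched in ordinary Perl. This is an approximation of the steps for illustration, not the interpreter's actual implementation: it renames each file rather than copying it, and the .tmp suffix, the file names, and the sample contents are all assumptions.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Two sample files to edit; names and contents are invented for this sketch.
my $dir   = tempdir(CLEANUP => 1);
my @files = ("$dir/file1", "$dir/file2");
for my $name (@files) {
    open(my $fh, '>', $name) or die "cannot write $name: $!";
    print $fh "abc one\n", "abc two\n";
    close($fh);
}

for my $name (@files) {
    my $temp = "$name.tmp";                     # hypothetical temporary name
    rename($name, $temp) or die "cannot rename $name: $!";
    open(my $in,  '<', $temp) or die "cannot read $temp: $!";
    open(my $out, '>', $name) or die "cannot write $name: $!";

    my $previous = select($out);                # print now goes to the file
    while (my $line = <$in>) {
        $line =~ s/abc/def/g;                   # the "editing" step
        print($line);                           # lands in $name, not on the screen
    }
    select($previous);                          # restore the old default output

    close($out);
    close($in);
    unlink($temp) or die "cannot remove $temp: $!";
}

# Read one file back to confirm that it was rewritten in place.
open(my $check, '<', $files[0]) or die "cannot read $files[0]: $!";
my $contents = join('', <$check>);
close($check);
print($contents);
```

The call to select is what lets an unmodified print statement write to the file being edited instead of the screen.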

Listing 16.4 is a simple example of a program that edits using the -i option and the <> operator. This program evaluates any arithmetic expressions (containing integers) it sees on a single line and replaces them with their results.


Listing 16.4. A program that edits files using the -i option.
1:  #!/usr/local/bin/perl -i
2:  
3:  while ($line = <>) {
4:          while ($line =~
5:                  s#\d+\s*[-*+/]\s*\d+(\s*[-*+/]\s*\d+)*#<x>#) {
6:                  eval ("\$result = $&;");
7:                  $line =~ s/<x>/$result/;
8:          }
9:          print ($line);
10: }


This program produces no output because output is written to the files specified on the command line.

The <> operator at the beginning of the while loop (line 3) reads a line at a time from the input file or files. Each line is searched using the pattern shown in line 5. This pattern matches any substring containing the following elements (in the order given):

  1. One or more digits
  2. Zero or more spaces
  3. An *, +, -, or / character
  4. Zero or more spaces
  5. One or more digits
  6. Zero or more of the preceding four subpatterns (which matches the last part of expressions such as 4 + 7 - 3)

This pattern is replaced by a placeholder substring, <x>.

Lines 6 and 7 are executed once for each pattern matched in the input line. The matched pattern, an arithmetic expression, is automatically stored in the system variable $&; line 6 substitutes this expression into a character string and passes this character string to the function eval. The call to eval creates a subprogram that evaluates the expression and returns the result in the scalar variable $result. Line 7 replaces the placeholder, <x>, with the result returned in $result.

When all the arithmetic expressions have been evaluated and substituted for, the inner while loop terminates, and line 9 calls print. Because the -i option has been set, the line is written back to the original input file from which it came.
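
For comparison, here is a compact variation on Listing 16.4 that can be tried without editing any files. It is a sketch rather than a replacement: it swaps the placeholder-and-loop technique for a single substitution with the e option (which treats the replacement as Perl code), and the function name evaluate_arithmetic is made up for this example. Note that the hyphen is placed first in the character class so that it is taken literally rather than as part of a range.

```perl
use strict;
use warnings;

# Replace every integer arithmetic expression in a line with its value.
sub evaluate_arithmetic {
    my ($line) = @_;
    $line =~ s{(\d+\s*[-*+/]\s*\d+(?:\s*[-*+/]\s*\d+)*)}{
        my $expression = $1;
        my $result     = eval($expression);        # undef if evaluation fails
        defined($result) ? $result : $expression;  # leave failures untouched
    }ge;
    return $line;
}

print(evaluate_arithmetic("4 + 7 - 3 apples and 6*7 pears"), "\n");
print(evaluate_arithmetic("1 / 0 stays as it is"), "\n");
```

Because the substitution makes a single pass over the line, an expression that cannot be evaluated (such as a division by zero) is simply left in place instead of being matched again.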

NOTE
Even though you do not know the name of the file variable that represents the file being edited, you can still set the default output file variable to some other file and change it back later.
To perform this task, recall that the select function returns the file variable associated with the current default file:
$editfile = select (MYFILE); # change default file
# do your write operations here
select ($editfile); # change default file back
After the second select call has been performed, the default output file is, once again, the file being edited.
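
The technique in the preceding note can be tried outside of an in-place edit. In this sketch an ordinary output file stands in for the file being edited; the file names and the lexical file variables are assumptions made for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
open(my $edited, '>', "$dir/edited.txt") or die "cannot open edited.txt: $!";
open(my $log,    '>', "$dir/log.txt")    or die "cannot open log.txt: $!";

my $original = select($edited);   # stand-in for what -i sets up
print("first line for the edited file\n");

my $editfile = select($log);      # change default file, remembering the old one
print("this line goes to the log\n");
select($editfile);                # change default file back

print("second line for the edited file\n");
select($original);                # back to the standard output file
close($edited);
close($log);

# Read both files back to see where each line went.
my %contents;
for my $name ('edited.txt', 'log.txt') {
    open(my $in, '<', "$dir/$name") or die "cannot read $name: $!";
    $contents{$name} = join('', <$in>);
    close($in);
}
print($contents{'edited.txt'}, $contents{'log.txt'});
```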

Backing Up Input Files Using the -i Option

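By default, the -i option overwrites the existing input files. If you wish, you can save a copy of the original input file or files before overwriting them. To do this, specify a file extension immediately after the -i option, with no space in between (if a space is present, the Perl interpreter treats the extension as the name of the program to run):

$ perl -i.old program file1 file2 file3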

Here, the .old file extension specified with the -i option tells the Perl interpreter to copy file1 to file1.old before overwriting it. Similarly, the interpreter copies file2 to file2.old, and file3 to file3.old.

The file extension specified with the -i option can be any character string. By convention, file extensions usually begin with a period; this convention makes it easier for you to spot them when you list the files in your directory.
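
Inside a program, the same backup behavior can be requested through the system variable $^I, which holds the extension given to -i. The following is a minimal sketch under that assumption (it requires Perl 5); the file name, its contents, and the helper read_file are invented for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# A sample file to edit; its name and contents are invented for this sketch.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/file1";
open(my $out, '>', $file) or die "cannot write $file: $!";
print $out "abc\n", "xyz abc\n";
close($out);

{
    local $^I   = '.old';     # same request as -i.old on the command line
    local @ARGV = ($file);    # the files to edit
    while (<>) {
        s/abc/def/g;
        print;                # written to the file being edited
    }
}

# Return the whole contents of a file as one string.
sub read_file {
    my ($name) = @_;
    open(my $in, '<', $name) or die "cannot read $name: $!";
    my $text = join('', <$in>);
    close($in);
    return $text;
}

my $edited_text = read_file($file);
my $backup_text = read_file("$file.old");
print("edited:\n$edited_text", "backup:\n$backup_text");
```

The edited file contains def wherever abc appeared, and file1.old still holds the original text.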

TIP
If you are using the -i option with a program you are not familiar with, it is a good idea to specify a file extension. Doing so ensures that your files are not damaged if the program does not work the way you expect.

The -a Option: Splitting Lines

The -a option is used with the -n or -p option. If the -a option is set, each input line that is read is automatically split into a list of "words" (sequences of characters that are not white space); this list of words is stored in a special system array variable named @F.

For example, if your input file contains the line

This    is    a   test.

and if a program that is called with the -a option reads this line, the array @F contains the list

("This", "is", "a", "test.")
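
The splitting that -a performs behaves like a call to split with a single space as the pattern, which skips leading white space and treats any run of white space as one separator. A minimal sketch, using the sample line above (the array is declared with my only because this stand-alone example runs under use strict):

```perl
use strict;
use warnings;

my $line = "This    is    a   test.\n";

# What -a does for each input line before the program runs.
my @F = split(' ', $line);

print(scalar(@F), " words: ", join(", ", @F), "\n");
```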

The -a option is useful for extracting information from files. Suppose that your input files contain records of the form

company_name      quantity_ordered     total_cost

such as, for example,

JOHN H. SMITH    10      47.32

Listing 16.5 shows how you can use the -a option to easily produce a program that extracts the quantity and total cost fields from these files.


Listing 16.5. An example of the -a option.
1:  #!/usr/local/bin/perl
2:  
3:  # This program is called with the -a and -n options.
4:  while ($F[0] =~ /[^\d.]/) {
5:          shift (@F);
6:          next LINE if (!defined($F[0]));  # LINE labels the -n loop (Perl 5)
7:  }
8:  print ("$F[0] $F[1]\n");


$ perl -a -n program16_5

10 47.32

106 11.54

$

Because the program is called with the -a option, the array variable @F contains a list, each element of which is a word from the current input line.

Because the company name in the input file might consist of more than one word (such as JOHN H. SMITH), the while loop in lines 4-7 is needed to get rid of everything that isn't a quantity field or a total cost field. After the words of the company name have been eliminated, line 8 can print the useful fields.

Note that this program just skips over any nonstandard input lines.
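
One alternative design, sketched here under the assumption that the quantity and total cost are always the last two words of a record, is to count from the end of @F instead of shifting words off the front. The records below (including the ACME line) are made up for illustration, and split(' ') stands in for the -a option so that the example is self-contained.

```perl
use strict;
use warnings;

# Sample records, invented for this sketch; the last one is malformed.
my @records = (
    "JOHN H. SMITH    10      47.32\n",
    "ACME     106     11.54\n",
    "no numbers here\n",
);

my @report;
for my $record (@records) {
    my @F = split(' ', $record);            # what -a would place in @F
    next if @F < 3;                         # need a name plus two fields
    my ($quantity, $cost) = @F[-2, -1];     # the last two words
    next unless $quantity =~ /^\d+$/;       # quantity must be an integer
    next unless $cost =~ /^\d+(\.\d+)?$/;   # cost is digits, optional decimals
    push(@report, "$quantity $cost");
}
print("$_\n") for @report;
```

Checking the two fields explicitly means that a record without them is skipped rather than printed with empty values.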

The -F Option: Specifying the Split Pattern

The -F option, defined only in Perl 5, is designed to be used in conjunction with the -a option, and specifies the pattern to use when you split input lines into words. For example, suppose Listing 16.5 is called as follows:

$ perl -a -n -F:: program16_5

In this case, the words in the input file are assumed to be separated by a pair of colons, which means that the program is expecting to read lines such as the following:

JOHN H. SMITH::10::47.32

NOTE
The -F option ignores opening and closing slashes if they are present because it interprets them as pattern delimiters. This means that the following program invocations are identical:
$ perl -a -n -F:: program16_5
$ perl -a -n -F/::/ program16_5
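
In terms of ordinary Perl, -F:: makes the automatic split use the pattern /::/ instead of white space. The sketch below shows the equivalent call. One detail worth checking on your own system: unlike the white-space split, a pattern split leaves the trailing newline attached to the last field, so the example also shows the result after removing it with chomp (a Perl 5 function). The variable names are chosen for this example.

```perl
use strict;
use warnings;

my $line = "JOHN H. SMITH::10::47.32\n";

# Splitting on the pattern alone: the last field keeps its newline.
my @raw = split(/::/, $line);

# Removing the newline first gives clean fields.
chomp(my $trimmed = $line);
my @F = split(/::/, $trimmed);

print("company: $F[0]\n", "quantity: $F[1]\n", "total cost: $F[2]\n");
```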

The -0 Option: Specifying Input End-of-Line

In all the programs you have seen so far, when the Perl interpreter reads a line from an input file or from the keyboard, it reads until it sees a newline character. You can tell Perl that you want the "end-of-line" input character to be something other than the newline character by specifying the -0 option. (The 0 here is the digit zero, not the letter O.)

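With the -0 option, you specify which character is to be the end-of-line character for your input file by providing its ASCII representation in base 8 (octal). Write the octal value immediately after -0, with no space in between; otherwise, the Perl interpreter treats the value as the name of the program to run. For example, the command

$ perl -0040 prog1 infile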

calls the Perl program named prog1 and specifies that it is to use the space character (ASCII 32, or 40 octal) as the end-of-line character when it reads the input file infile (or any other input file).

This means, for example, that if this program reads an input file containing the following:

Test input.

Here's another line.

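it will read a total of four input lines:

  1. Test, followed by a space
  2. input., the line break, and Here's, followed by a space
  3. another, followed by a space
  4. line., followed by the final newline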

The -0 option provides a quick way to read an input file one word at a time, assuming that each line ends with at least one blank character. (If it doesn't, you can quickly write a Perl program that uses the -i and -p options to add a space to the end of each line in each file.) Listing 16.6 is an example of a program that uses -0 to read an input file one word at a time.


Listing 16.6. A program that uses the -0 option.
1:  #!/usr/local/bin/perl -0040
2:  
3:  while ($line = <>) {
4:          $line =~ s/\n//g;
5:          next if ($line eq "");
6:          print ("$line\n");
7:  }


$ program16_6 file1

This

line

contains

five

words.

$

The header comment (line 1) specifies that the -0 option is to be used and that the space character is to become the end-of-line character. (Recall that you do not need a space between an option and the value associated with an option.) This means that line 3 reads from the input file until it sees a blank space.

Not everything read by line 3 is a word, of course. There are two types of lines that are not particularly useful that the program must check for:

Line 4 checks whether any newline characters are contained in the current input line. The substitution in this line is a global substitution, because an input line can contain two or more newline characters. (This occurs when an input file contains a blank line.)

After all the newline characters have been eliminated, line 5 checks whether the resulting input line is empty. If it is, the program continues with the next input line. If the resulting input line is not empty, the input line must be a useful word, and line 6 prints it.

NOTE
If you specify the value 00 (octal zero) with the -0 option, the Perl interpreter reads until it sees two newline characters. This enables you to read an entire paragraph at a time.
If you specify no value with the -0 option, the null character (ASCII 0) is assumed.
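
These settings correspond to values of the input end-of-line variable $/ (described on Day 17), so their effect can be tried inside a single program. The following sketch assumes Perl 5.8 or later, because it reads from a string through an in-memory file; the helper name read_records and the sample text are inventions for this example.

```perl
use strict;
use warnings;

# Read every "line" of $data using $separator as the input end-of-line.
sub read_records {
    my ($separator, $data) = @_;
    local $/ = $separator;
    open(my $in, '<', \$data) or die "cannot open in-memory file: $!";
    my @records = <$in>;
    close($in);
    return @records;
}

my $text = "First paragraph,\nline two.\n\nSecond paragraph.\n";

my @words      = read_records(" ",  $text);      # like -0040
my @paragraphs = read_records("",   $text);      # like -00
my @fields     = read_records("\0", "a\0b\0");   # like -0 with no value

print(scalar(@words), " space-terminated records\n");
print(scalar(@paragraphs), " paragraphs\n");
print(scalar(@fields), " null-terminated records\n");
```

Using local inside the helper keeps the change to $/ from leaking into the rest of the program, which is the main practical difference from setting the option for the whole run.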

The -l Option: Specifying Output End-of-Line

The -l option enables you to specify an output end-of-line character for use in print statements.

Like the -0 option, the -l option accepts a base-8 (octal) integer that indicates the ASCII representation of the character you want to use.

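When the -l option is specified, the Perl interpreter does two things:

  1. If the -n or -p option is also specified, it removes the end-of-line character from each input line as the line is read.
  2. It adds the output end-of-line character to the end of everything written by a print statement.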

If you do not specify a value with the -l option, the Perl interpreter uses the character specified by the -0 option, if it is defined. If -0 has not been specified, the end-of-line character is defined to be the newline character.

If you are using both the -l and the -0 option and you do not provide a value with the -l option, the order of the options becomes significant because the options are processed from left to right.
If the -l option appears first, the output end-of-line character is set to the newline character. If the -0 option appears first, the output end-of-line character (set by -l) becomes the same as the input end-of-line character (set by -0).

Listing 16.7 is a simple example of a program that uses -l.


Listing 16.7. A program that uses the -l option.
1:  #!/usr/local/bin/perl -l012
2:  
3:  print ("Hello!");
4:  print ("This is a very simple test program!");


$ program16_7

Hello!

This is a very simple test program!

$

The -l012 option in the header comment in line 1 sets the output end-of-line character to the newline character (ASCII 10, or 12 octal). This means that every print statement in the program will have a newline character added to it. As a consequence, the output from lines 3 and 4 appears on separate lines.

NOTE
You can control the input and output end-of-line characters also by using the system variables $/ and $\. For a description of these system variables, refer to Day 17.
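
As a small illustration of the second of these variables, the sketch below sets $\ by hand to get the effect that the -l option has on print. It captures the output in a string through an in-memory file (which assumes Perl 5.8 or later) so that the added characters can be seen; the variable names are chosen for this example.

```perl
use strict;
use warnings;

my $captured = '';
{
    local $\ = "\n";          # newline as the output end-of-line character
    open(my $out, '>', \$captured) or die "cannot open in-memory file: $!";
    print $out "Hello!";
    print $out "This is a very simple test program!";
    close($out);
}

print($captured);             # each print above gained a newline
```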

The -x Option: Extracting a Program from a Message

The -x option enables you to process a Perl program that appears in the middle of a file (such as a file containing an electronic mail message, which usually contains some mail routing information). When the -x option is specified, the Perl interpreter ignores every line in the program until it sees a header comment (a comment beginning with the #! characters).

If you are using Perl 5, the header comment must also contain the word "perl."

After the Perl interpreter sees the header comment, it then processes the program as usual until one of the following three conditions occurs:

__END__

If the Perl interpreter reads one of the end-of-program lines (the second and third conditions listed previously), it ignores everything appearing after that line in the file.

Listing 16.8 is a simple example of a program that works if run with the -x option.


Listing 16.8. A Perl program contained in a file.
1:  Here is a Perl program that appears in the middle
2:  of a file.
3:  The stuff up here is junk, and the Perl interpreter
4:  will ignore it.
5:  The next line is the start of the actual program.
6:  #!/usr/local/bin/perl
7:  
8:  print ("Hello, world!\n");
9:  __END__
10: This line is also ignored, because it is not part
11: of the program.


$ perl -x program16_8

Hello, world!

$

If this program is started with the -x option, the Perl interpreter skips over everything until it sees line 6. (Needless to say, if you try to run this program without specifying the -x option, the Perl interpreter will complain.) Line 8 then prints the message Hello, world.

Line 9 is the special end-of-program line. When the Perl interpreter sees this line, it skips the rest of the program.

NOTE
Of course, you can't specify the -x option in the header comment itself because the Perl interpreter has to know in advance that the program contains lines that must be skipped.

Miscellaneous Options

The following sections describe some of the more exotic options you can pass to the Perl interpreter. You are not likely to need any of these options unless you are doing something unusual (and you really know what you are doing).

The -u Option

The -u option tells the Perl interpreter to generate a core dump file after it has compiled your program. This file can then be examined and manipulated.

The -U Option

The -U option tells the Perl interpreter to enable you to perform "unsafe" operations in your program. (Basically, you'll know that an operation is considered unsafe when the Perl interpreter doesn't let you perform it without specifying the -U option!)

The -S Option

The -S option tells the Perl interpreter that your program might be contained in any of the directories specified by your PATH environment variable. The Perl interpreter checks each of these directories in turn, in the order in which they are specified, to see whether your program is located there. (This is the normal behavior of the shell for commands in the UNIX environment.)

NOTE
You need to use -S only if you are running your Perl program using the perl command, as in
$ perl myprog
If you are running the program using a command such as
$ myprog
your shell (normally) treats it like any other command and searches the directories specified in your PATH environment variable even if you don't specify the -S option.

The -D Option

The -D option sets the Perl interpreter's internal debugging flags. This option is specified with an integer value (for example, -D 256).

For details on this option, refer to the online manual page for Perl.

NOTE
The internal debugging flags specified by -D have nothing to do with the Perl debugger, which is specified by the -d option.
The debugging flags specified by -D provide information on how Perl itself works, not on how your program works.

The -T Option: Writing Secure Programs

The -T option specifies that data obtained from the outside world cannot be used in any command that modifies your file system. This feature enables you to write secure programs for system administration tasks.

This option is only available in Perl 5. If you are running Perl 4, use a special version of Perl named taintperl. For details on taintperl, see the online documentation supplied with your Perl distribution.

The -d Option: Using the Perl Debugger

One final option that is quite useful is -d. This option tells the Perl interpreter to run your program using the Perl debugger. For a complete description of the Perl debugger and how to use it, refer to Day 21, "The Perl Debugger."

NOTE
If you are specifying the -d option, you still can use other options.

Summary

Today you learned how to specify options when you run your Perl programs. An option is a dash followed by a single letter, and optionally followed by a value to be associated with the option. Options lacking associated values can be grouped together.

You can specify options in two ways: on the command line and in the header comment. Only one option or group of options can be supplied in the header comment.

Available options include those that list the Perl version number, check your syntax, display warnings, allow single-line programs on the command line, invoke the C preprocessor, automatically read from the input files, and edit files in place.

Q&A

Q:Why can you specify only one option in the header comment?
A:This is a restriction imposed by the UNIX operating system.
Q:Why does -v display the Perl version number without running the program?
A:This option enables you to check whether the version of Perl you are running is capable of running your program. If an old copy of Perl is running on your machine, your program might not work properly.
Q:What options enable me to write a program that edits every line of a file?
A:Use the -i (edit in place) and -p (print each line) options. (These options are often used with the -e option to perform an editing command similar to those used by the UNIX sed command.)
Q:I have a program that needs to run on two or more different machines. Is there a way of writing the program that ensures that I don't have to change the program each time I change machines?
A:Here's how to carry out this task:
  1. On each machine, define a file that is to be used to store system-dependent constants. Give the file the same name on each machine. For example, you could call the file perldef.h. The location of the file doesn't matter as long as it's a different directory name on each type of machine.
  2. In each perldef.h, use #define to define one constant for each type of machine you run. For example, if you are running this program on UNIX 4.3BSD and System V machines, you could define constants named M_BSD and M_SYSV.
  3. After you have defined the constants, set the value of each constant to 0, except for the one corresponding to the machine on which you are running. For example, on your 4.3BSD machines, set M_BSD to 1, and set all the other constants to 0.
  4. Add the following statement to your program:
    #include <perldef.h>
  5. In your program, use #if and #endif to enclose any system-dependent information. For example, if a group of statements is to be executed only on 4.3BSD machines, enclose the statements with the statements
    #if M_BSD
    #endif
  6. When you run your program, use the -P option to specify C preprocessing, and use the -I option to tell the Perl interpreter to search for the directory corresponding to the perldef.h file for this machine. For example, if you are running your program on a 4.3BSD machine and the perldef.h file for 4.3BSD machines is in the /usr/local/include/bsdperl directory, include the following option when you start your program:
    -I /usr/local/include/bsdperl
Q:Why does the -p option override the -n option?
A:The -p option tells the Perl interpreter that you want to print each input line that you read, and the -n option tells it that you don't want to do so. These options basically contradict one another. -p overrides -n because -p is safer; if you really want -n, you can throw away the output from -p. If you really want -p and get -n, you won't get the output you want.

Workshop

The Workshop provides quiz questions to help you solidify your understanding of the material covered and exercises to give you experience in using what you've learned. Try to understand the quiz and exercise answers before you go on to tomorrow's lesson.

Quiz

  1. What do the following options do?
    a.    -0
    b.    -s
    c.    -w
    d.    -x
    e.    -n
  2. What happens when -l and -0 are both specified, and
    a.    -l appears first?
    b.    -0 appears first?
  3. Why do the -i and -n options destroy input files when included together?
  4. How does the C preprocessor distinguish between preprocessor commands and Perl comments?
  5. How does the Perl interpreter distinguish options for the interpreter from options for the program itself?

Exercises

  1. Write a program that replaces all the newline characters in the file testfile with colons. Use only command-line options to do this.
  2. Write a one-line program that prints only the lines containing the word the.
  3. Write a one-line program that prints the second word of each input line.
  4. Write a program that prints Hello! if you pass the -H switch to it and that prints Goodbye! if you pass the -G switch.
  5. Write a one-line program that converts all lowercase letters to uppercase.
  6. BUG BUSTER: What is wrong with this command line?
    $ perl -i -n -e "s/abc/def/g";
  7. BUG BUSTER: What is wrong with this command line?
    $ perl -ipe "s/abc/def/g";