Chapter 14

Scalar-Conversion and List-Manipulation Functions



Today, you learn about the built-in Perl functions that convert scalar values from one form to another, and the Perl functions that deal with variables that have not had values defined for them.

You also learn about the built-in Perl functions that manipulate lists and array variables.

Many of the functions described in today's lesson use features of the UNIX operating system. If you are using Perl on a machine that is not running UNIX, some of these functions might not be defined or might behave differently.
Check the documentation supplied with your version of Perl for details on which functions are supported or emulated on your machine.

The chop Function

The chop function was first discussed on Day 3, "Understanding Scalar Values." It removes the last character from a scalar value.

The syntax for the chop function is

chop (var);

var can be either a scalar value or a list, as described in the following paragraphs.

For example:

$mystring = "This is a string";

chop ($mystring);

# $mystring now contains "This is a strin";

chop is used most frequently to remove the trailing newline character from an input line, as follows:

$input = <STDIN>;

chop ($input);

The argument passed to chop can also be a list. In this case, chop removes the last character from every element of the list. For example, to read an entire input file into an array variable and remove all of the trailing newline characters, use the following statements:

@input = <STDIN>;

chop (@input);

chop returns the character chopped. For example:

$input = "12345";

$lastchar = chop ($input);

This call to chop assigns 5 to the scalar variable $lastchar.

If chop is passed a list, the last character from the last element of the list is returned:

@array = ("ab", "cd", "ef");

$lastchar = chop(@array);

This assigns f, the last character of the last element of @array, to $lastchar.
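
The two return-value cases just described can be seen side by side in the following short sketch. It reuses the values from the examples above; the variable name $last is introduced here only for illustration.

```perl
use strict;
use warnings;

# chop on a scalar: the last character is removed and returned.
my $input    = "12345";
my $lastchar = chop($input);
print "$input $lastchar\n";     # prints "1234 5"

# chop on a list: every element loses its last character, but only
# the character removed from the final element is returned.
my @array = ("ab", "cd", "ef");
my $last  = chop(@array);
print "@array $last\n";         # prints "a c e f"
```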

The chomp Function

The chomp function, defined only in Perl 5, checks whether the last characters of a string or list of strings match the input line separator defined by the $/ system variable. If they do, chomp removes them.

The syntax for the chomp function is

result = chomp(var)

As in the chop function, var can be either a scalar variable or a list. If var is a list, each element of the list is checked for the input end-of-line string. result is the total number of characters removed by chomp.

Listing 14.1 shows how chomp works.


Listing 14.1. A program that uses the chomp function.
1:  #!/usr/local/bin/perl

2: 

3:  $/ = "::";   # set input line separator

4:  $scalar = "testing::";

5:  $num = chomp($scalar);

6:  print ("$scalar $num\n");

7:  @list = ("test1::", "test2", "test3::");

8:  $num = chomp(@list);

9:  print ("@list $num\n"); 


$ program14_1

testing 2

test1 test2 test3 4

$

This program uses chomp to remove the input line separator from both a scalar variable and an array variable. The call to chomp in line 5 converts the value of $scalar from testing:: to testing. The number of characters removed, 2, is returned by chomp and assigned to $num.

The call to chomp in line 8 checks each element of @list. The first element is converted from test1:: to test1, and the last element is converted from test3:: to test3. (The second element is ignored, because it is not terminated by the end-of-line specifier.) The total number of characters removed, 4 (two from the first element and two from the last), is returned by chomp and assigned to $num.
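
As a further illustration, the following sketch contrasts chomp with chop when $/ is left at its default value (a newline). It is only a small demonstration under that assumption; the variable names are invented for this example.

```perl
use strict;
use warnings;

my $with    = "last line\n";
my $without = "last line";

my $removed_with    = chomp($with);      # 1: the newline is removed
my $removed_without = chomp($without);   # 0: nothing to remove

# chop, by contrast, always removes a character.
my $chopped = "last line";
chop($chopped);                          # removes the final "e"

print "$with|$without|$chopped\n";   # prints "last line|last line|last lin"
print "$removed_with $removed_without\n";   # prints "1 0"
```

Because chomp leaves a string alone when it does not end with the separator, it is safe to call on input whose last line might lack a trailing newline.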

NOTE
For more information on the $/ system variable, refer to Day 17, "System Variables."

The crypt Function

The crypt function encrypts a string using the NBS Data Encryption Standard (DES) algorithm.

The syntax for the crypt function is

result = crypt (original, salt);

original is the string to be encrypted, and salt is a character string of two characters that defines how to change the DES algorithm (to make it more difficult to decode). These two characters can be any letter or digit, or one of the . and / characters. After the algorithm is changed, the string is encrypted using the resulting key.

result is the encrypted string. The first two characters of result are the two characters specified in salt.

You can use crypt to set up a password checker similar to those used by the UNIX login. Listing 14.2 is an example of a program that prompts the user for a password and compares it with a password stored in a special file.


Listing 14.2. A program that asks for and compares a password.
1:  #!/usr/local/bin/perl

2:  

3:  open (PASSWD, "/u/jqpublic/passwd") ||

4:          die ("Can't open password file");

5:  $passwd = <PASSWD>;

6:  chop ($passwd);

7:  close (PASSWD);

8:  print ("Enter the password for this program:\n");

9:  system ("stty -echo");

10: $mypasswd = <STDIN>;

11: system ("stty echo");

12: chop ($mypasswd);

13: if (crypt ($mypasswd, substr($passwd, 0, 2)) eq $passwd) {

14:         print ("Correct! Carry on!\n");

15: } else {

16:         die ("Incorrect password: goodbye!\n");

17: }


$ program14_2

Enter the password for this program:

bluejays

Correct! Carry on!

$

Note that the password you type is not displayed on the screen.

Lines 3-7 retrieve the correct password from the file /u/jqpublic/passwd. This password can be created by another call to crypt. For example, if the correct password is sludge, the call that creates the string now stored in $passwd could be the following, where $salt contains some two-character string:

$retval = crypt ("sludge", $salt);

After the correct password has been retrieved, the next step is line 8, which asks the user to type a password. By default, anything typed in at the keyboard is immediately displayed on the screen; this behavior is called input echoing. Input echoing is not desirable if a password is being typed in, because someone looking over the user's shoulder can read the password and break into the program.

To make the password-checking process more secure, line 9 calls the UNIX command stty -echo, which turns off input echoing; now the password is not displayed on the screen when the user types it. After the password has been entered, line 11 calls the UNIX command stty echo, which turns input echoing back on.

Line 13 calls crypt to check the password the user has entered. Because the first two characters of the actual encrypted password contain the two-character salt used in encryption, substr is used to retrieve these two characters and use them as the salt when encrypting the user's password. If the value returned by crypt is identical to the encrypted password, the user's password is correct; otherwise, the user has gotten it wrong, and die terminates the program. (A gentler password-checking program usually gives the user two or three chances to type a password before terminating the program.)

This password checker is secure because the actual password does not appear in the program in unencrypted form. (In fact, because the password is in a separate file, it does not appear in the program at all.) This makes it impossible to obtain the password by simply examining the text file.

NOTE
The behavior of crypt is identical to that of the UNIX library function crypt. See the crypt(3) manual page for more information on DES encryption.

The hex Function

The hex function assumes that a character string is a number written in hexadecimal format, and it converts it into a decimal number (a number in standard base-10 format).

The syntax for the hex function is

decnum = hex (hexnum);

hexnum is the hexadecimal character string, and decnum is the resulting decimal number.

The following is an example:

$myhexstring = "1ff";

$num = hex ($myhexstring);

This call to hex assigns the decimal equivalent of 1ff to $num, which means that the value of $num is now 511. The value stored in $myhexstring is not changed.

The value passed to hex can contain either uppercase or lowercase letters (provided the letters are between a and f, inclusive). This value can be the result of an expression, as follows:

$num = hex ("f" x 2);

Here, the expression "f" x 2 is equivalent to ff, which is converted to 255 by hex.

NOTE
To convert a string from a decimal value to a hexadecimal value, use sprintf and specify either %x (hexadecimal integer) or %lx (long hexadecimal integer).


In older versions of Perl, hex does not handle hexadecimal strings that start with the characters 0x or 0X (current Perl 5 releases accept this prefix). To handle these strings portably, either get rid of these characters using a statement such as
$myhexstring =~ s/^0[xX]//;
or call the oct function, which is described later in today's lesson.
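
The conversions in this section can be combined into a round trip, as in the following sketch. The variable names $hexstring, $prefixed, and $stripped are chosen here for illustration only.

```perl
use strict;
use warnings;

# Hexadecimal string to decimal number.
my $num = hex("1ff");                      # 511

# Decimal number back to a hexadecimal string.
my $hexstring = sprintf("%x", $num);       # "1ff"

# Strip a leading 0x or 0X before calling hex.
my $prefixed = "0XFF";
(my $stripped = $prefixed) =~ s/^0[xX]//;  # "FF"
my $value = hex($stripped);                # 255

print "$num $hexstring $value\n";          # prints "511 1ff 255"
```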

The int Function

The int function turns a floating-point number into an integer by getting rid of everything after the decimal point.

The syntax for the int function is

intnum = int (floatnum);

floatnum is the floating-point number, and intnum is the resulting integer.

The following is an example:

$floatnum = 45.6;

$intnum = int ($floatnum);

This call to int converts 45.6 to 45 and assigns it to $intnum. The value stored in $floatnum is not changed.

int can be used in expressions as well; for example:

$intval = int (68.3 / $divisor) + 1;

int does not round up when you convert from floating point to integer. To round up when you use int, add 0.5 first, as follows:
$intval = int ($mynum + 0.5);
Even then, you still might need to watch out for round-off errors. For example, if 4.5 is actually stored in the machine as, say, 4.499999999, adding 0.5 might still result in a number less than 5, which means that int will truncate it to 4.
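
The following sketch shows truncation and the add-0.5 technique together. The sample values are chosen so that round-off error does not affect the result; note also that int truncates toward zero, so a negative number moves up, not down.

```perl
use strict;
use warnings;

my $truncated = int(45.6);          # 45: the fraction is discarded
my $negative  = int(-45.6);         # -45: truncation is toward zero
my $rounded   = int(45.6 + 0.5);    # 46: int(46.1)
my $down      = int(45.4 + 0.5);    # 45: int(45.9)

print "$truncated $negative $rounded $down\n";   # prints "45 -45 46 45"
```

The add-0.5 technique as written applies to positive numbers; negative values would need 0.5 subtracted instead.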

The oct Function

The oct function assumes that a character string is a number written in octal format, and it converts it into a decimal number (a number in standard base-10 format).

The syntax for the oct function is

decnum = oct (octnum);

octnum is the octal character string, and decnum is the resulting decimal number.

The following is an example:

$myoctstring = "177";

$num = oct ($myoctstring);

This call to oct assigns the decimal equivalent of 177 to $num, which means that the value of $num is now 127. The value stored in $myoctstring is not changed.

The value passed to oct can be the result of an expression, as shown in the following example:

$num = oct ("07" x 2);

Here, the expression "07" x 2 is equivalent to 0707, which is converted to 455 by oct.

NOTE
To convert a string from a decimal value to an octal value, use sprintf and specify either %o (octal integer) or %lo (long octal integer).

The oct Function and Hexadecimal Integers

The oct function also handles hexadecimal integers whose first two characters are 0x or 0X:

$num = oct ("0xff");

This call treats 0xff as the hexadecimal number ff and converts it to 255. This feature of oct can be used to convert any non-standard Perl integer constant.

Listing 14.3 is a program that reads a line of input and checks whether it is a valid Perl integer constant. If it is, it converts it into a standard (base-10) integer.


Listing 14.3. A program that reads any kind of integer.
1:  #!/usr/local/bin/perl

2:  

3:  $integer = <STDIN>;

4:  chop ($integer);

5:  if ($integer !~ /^[0-9]+$|^0[xX][0-9a-fA-F]+$/) {

6:          die ("$integer is not a legal integer\n");

7:  }

8:  if ($integer =~ /^0/) {

9:          $integer = oct ($integer);

10: }

11: print ("$integer\n");


$ program14_3

077

63

$

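The pattern in line 5 matches either one or more digits, or the characters 0x or 0X followed by one or more hexadecimal digits (0 to 9, a to f, or A to F).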

The first case matches any standard base-10 integer or octal integer (because octal integers start with 0 and consist of the numbers 0 to 7). The second case matches any legal hexadecimal integer. In both cases, the pattern matches only if there are no extraneous characters (blank spaces, or other words or numbers) on the line. Of course, it is easy to use the substitution operator to get rid of these first, if you like.

Line 8 tests whether the integer is either an octal or hexadecimal integer by searching for the pattern /^0/. If this pattern is found, oct converts the integer to decimal, placing the converted integer back in $integer. Note that line 8 does not need to determine which type of integer is contained in $integer because oct processes both octal and hexadecimal integers.
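
The same logic can be packaged as a subroutine, as in the sketch below. The name to_decimal is hypothetical (it is not a Perl built-in), and returning the undefined value for invalid input is a design choice made here instead of calling die.

```perl
use strict;
use warnings;

# to_decimal is a hypothetical helper: it returns the base-10 value of a
# decimal, octal (leading 0), or hexadecimal (leading 0x or 0X) string,
# or the undefined value if the string is not an integer constant.
sub to_decimal {
    my ($text) = @_;
    return undef unless $text =~ /^[0-9]+$|^0[xX][0-9a-fA-F]+$/;
    # oct handles both the octal and the hexadecimal forms.
    return $text =~ /^0/ ? oct($text) : $text + 0;
}

print join(" ", map { to_decimal($_) } "077", "0xff", "42"), "\n";
# prints "63 255 42"
```

As with Listing 14.3, a string such as 089 passes the pattern check even though 8 and 9 are not octal digits, so a stricter pattern would be needed to reject it.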

The ord and chr Functions

The ord and chr functions are similar to the Pascal functions of the same names. ord converts a single character to its numeric ASCII equivalent, and chr converts a number to its ASCII character equivalent.

The syntax for the ord function is

asciival = ord (char);

char is the string whose first character is to be converted, and asciival is the resulting ASCII value.

For example, the following statement assigns the ASCII value for the / character, 47, to $ASCIIval:

$ASCIIval = ord("/");

If the value passed to ord is a character string that is longer than one character in length, ord converts the first character in the string:

$mystring = "/ignore the rest of this string";

$charval = ord ($mystring);

Here, the first character stored in $mystring, /, is converted and assigned to $charval.

The syntax for the chr function is

charval = chr (asciival);

asciival is the value to be converted, and charval is the one-character string representing the character equivalent of asciival in the ASCII character set.

For example, the following statement assigns / to $slash, because 47 is the numeric equivalent of / in the ASCII character set:

$slash = chr(47);
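
Because ord and chr are inverses of each other, a string can be broken into character codes and rebuilt, as the following sketch shows. The string "Perl" and the variable names are arbitrary choices for this example.

```perl
use strict;
use warnings;

my $code = ord("/");     # 47
my $char = chr(47);      # "/"

# Convert each character of a string to its numeric value, then back.
my @codes   = map { ord($_) } split(//, "Perl");   # (80, 101, 114, 108)
my $rebuilt = join("", map { chr($_) } @codes);    # "Perl"

print "$code $char @codes $rebuilt\n";   # prints "47 / 80 101 114 108 Perl"
```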

NOTE
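The character set used here contains 256 characters (the first 128 of which are ASCII). In the versions of Perl described in this lesson, if the value passed to chr is greater than 255, only the bottom eight bits of the value are used.
This means, for example, that the following statements are equivalent:
$slash = chr(47);
$slash = chr(303);
$slash = chr(559);
In each case, the value of $slash is /. (Perl 5.6 and later treat values above 255 as Unicode code points unless use bytes is in effect, so chr(303) is a different character there.)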


The chr function is defined only in Perl 5. If you are using Perl 4, you will need to call sprintf to convert a number to a character:
$slash = sprintf("%c", 47);
This assigns / to $slash.

The scalar Function

In Perl, some functions or expressions behave differently when their results are assigned to arrays than they do when assigned to scalar variables. For example, the assignment

@var = @array;

copies the list stored in @array to the array variable @var, and the assignment

$var = @array;

determines the number of elements in the list stored in @array and assigns that number to the scalar variable $var.

As you can see, @array has two different meanings: an "array meaning" and a "scalar meaning." The Perl interpreter determines which meaning to use by examining the rest of the statement in which @array occurs. In the first case, the array meaning is intended, because the statement is assigning to an array variable. Statements in which the array meaning is intended are called array contexts.

In the second case, the scalar meaning of @array is intended, because the statement is assigning to a scalar variable. Statements in which the scalar meaning is intended are called scalar contexts.

The scalar function enables you to specify the scalar meaning in an array context.

The syntax for the scalar function is

value = scalar (list);

list is the list to be used in a scalar context, and value is the scalar meaning of the list.

For example, to create a list consisting of the length of an array, you can use the following statement:

@array = ("a", "b", "c");

@lengtharray = scalar (@array);

Here, the number of elements of @array, 3, is converted into a one-element list and assigned to @lengtharray.

Another useful place to use scalar is in conjunction with the <> operator. Recall that the statement

$myline = <MYFILE>;

reads one line from the input file MYFILE, and

@mylines = <MYFILE>;

reads all of MYFILE into the array variable @mylines. To read one line into the array variable @mylines (as a one-element list), use the following:

@mylines = scalar (<MYFILE>);

Specifying scalar with <MYFILE> ensures that only one line is read from MYFILE.
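
The following sketch exercises both uses of scalar. To stay self-contained it reads from a string through an in-memory file handle, which assumes Perl 5.8 or later; the handle $fh and the sample text are invented for this example.

```perl
use strict;
use warnings;

my @array = ("a", "b", "c");

# scalar forces the "scalar meaning" (the element count) in an array context.
my @lengtharray = scalar(@array);           # the one-element list (3)
print "elements: ", scalar(@array), "\n";   # prints "elements: 3"

# Read from a string as if it were a file (an assumption of this sketch).
my $text = "line one\nline two\nline three\n";
open(my $fh, '<', \$text) or die("Can't open in-memory file: $!");

my @first = scalar(<$fh>);   # exactly one line, as a one-element list
my @rest  = <$fh>;           # the two remaining lines
close($fh);

print scalar(@first), " ", scalar(@rest), "\n";   # prints "1 2"
```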

The pack Function

The pack function enables you to take a list or the contents of an array variable and convert (pack) it into a scalar value in a format that can be stored in actual machine memory or used in programming languages such as C.

The syntax for the pack function is

formatstr = pack(packformat, list);

Here, list is a list of values; this list of values can, as always, be the contents of an array variable. formatstr is the resulting string, which is in the format specified by packformat.

packformat consists of one or more pack-format characters; these characters determine how the list is to be packed. These pack formats are listed in Table 14.1.

Table 14.1. Format characters for the pack function.

Character   Description
a           ASCII character string padded with null characters
A           ASCII character string padded with spaces
b           String of bits, lowest first
B           String of bits, highest first
c           A signed character (range usually -128 to 127)
C           An unsigned character (usually 8 bits)
d           A double-precision floating-point number
f           A single-precision floating-point number
h           Hexadecimal string, lowest digit first
H           Hexadecimal string, highest digit first
i           A signed integer
I           An unsigned integer
l           A signed long integer
L           An unsigned long integer
n           A short integer in network order
N           A long integer in network order
p           A pointer to a string
s           A signed short integer
S           An unsigned short integer
u           Convert to uuencode format
v           A short integer in VAX (little-endian) order
V           A long integer in VAX order
x           A null byte
X           Indicates "go back one byte"
@           Fill with nulls (ASCII 0)

One pack-format character must be supplied for each element in the list. If you like, you can use spaces or tabs to separate pack-format characters, because pack ignores white space.

The following is a simple example that uses pack:

$integer = pack("i", 171);

This statement takes the number 171, converts it into the format used to store integers on your machine, and returns the converted integer in $integer. This converted integer can now be written out to a file or passed to a program using the system or exec functions.

To repeat a pack-format character multiple times, specify a positive integer after the character. The following is an example:

$twoints = pack("i2", 103, 241);

Here, the pack format i2 is equivalent to ii.

To use the same pack-format character for all of the remaining elements in the list, use * in place of an integer, as follows:

$manyints = pack("i*", 14, 26, 11, 83);

Specifying integers or * to repeat pack-format characters works for all formats except a, A, and @. With the a and A formats, the integer is assumed to be the length of the string to create.

$mystring = pack("a6", "test");

This creates a string of six characters (the four that are supplied, plus two null characters).

NOTE
The a and A formats always use exactly one element of the list, regardless of whether a positive integer is included following the character. For example:
$mystring = pack("a6", "test1", "test2");
Here, test1 is packed into a six-character string and assigned to $mystring. test2 is ignored.
To get around this problem, use the x operator to create multiple copies of the a pack-format character, as follows:
$strings = pack ("a6" x 2, "test1", "test2");
This packs test1 and test2 into two six-character strings (joined together).

The @ format is a special case. It is used only when a following integer is specified. This integer indicates the number of bytes the string must contain at this point; if the string is smaller, null characters are added. For example:

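$output = pack('a* @6 a*', "test", "test2");

Here, the a* format packs the whole string test. Because this string is only four characters long, and the pack format @6 specifies that the packed scalar value must be six characters long at this point, two null characters are added to the string before test2 is packed. (A bare a, with no count, packs only the first character of its string. The format is written in single quotes so that Perl 5 does not try to interpolate @6 as an array variable.)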
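
The string formats described in this section can be checked with the short sketch below. The variable names are invented for this example, and the formats containing @ are written in single quotes to avoid array interpolation.

```perl
use strict;
use warnings;

my $null_padded  = pack("a6", "test");      # "test" plus two null characters
my $space_padded = pack("A6", "test");      # "test" plus two spaces

# Each a6 consumes one list element, so two strings need two formats.
my $two = pack("a6" x 2, "test1", "test2"); # 12 characters in all

# @6 null-fills up to position 6 before the second string is packed.
my $positioned = pack('a* @6 a*', "test", "test2");

print length($null_padded), " ", length($two), " ",
      length($positioned), "\n";            # prints "6 12 11"
```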

The pack Function and C Data Types

The most frequent use of pack is to create data that can be used by C programs. For example, to create a string terminated by a null character, use the following call to pack:

$Cstring = pack ("a*x", $mystring);

Here, the a* pack-format specifier converts all of $mystring into an ASCII string (a bare a would pack only its first character), and the x character appends a null character to the end of the string. This format (a string followed by a null) is how C stores strings.

Table 14.2 shows the pack-format characters that have equivalent data types in C.

Table 14.2. Pack-format characters and their C equivalents.

Character   C equivalent
c           char
d           double
f           float
i           int
I           unsigned int (or unsigned)
l           long
L           unsigned long
s           short
S           unsigned short

In each case, pack stores the value in your local machine's internal format.
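
To see what "internal format" means in practice, the following sketch compares the machine-dependent i format with the n/N and v/V family from Table 14.1, whose byte order is fixed. It uses unpack with H* only to display the packed bytes as hexadecimal digits; the variable names are illustrative.

```perl
use strict;
use warnings;

my $network = pack("N", 171);   # four bytes, most significant byte first
my $vax     = pack("V", 171);   # four bytes, least significant byte first
my $native  = pack("i", 171);   # size and byte order depend on the machine

printf("N: %s\n", unpack("H*", $network));   # prints "N: 000000ab"
printf("V: %s\n", unpack("H*", $vax));       # prints "V: ab000000"
printf("i: %d bytes\n", length($native));    # machine dependent
```

Data that must be read on a different kind of machine is usually better packed with the fixed-order formats than with i or l.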

TIP
You usually won't need to use pack unless you are preparing data for use in other programs.

The unpack Function

The unpack function reverses the operation performed by pack. It takes a value stored in machine format and converts it to a list of values understood by Perl.

The syntax for the unpack function is

list = unpack (packformat, formatstr);

Here, formatstr is the value in machine format, and list is the created list of values.

As in pack, packformat is a set of one or more pack format characters. These characters are basically the same as those understood by pack. Table 14.3 lists these characters.

Table 14.3. The pack-format characters, as used by unpack.

Character   Description
a           ASCII character string, unstripped
A           ASCII character string with trailing nulls and spaces stripped
b           String of bits, lowest first
B           String of bits, highest first
c           A signed character (range usually -128 to 127)
C           An unsigned character (usually 8 bits)
d           A double-precision floating-point number
f           A single-precision floating-point number
h           Hexadecimal string, lowest digit first
H           Hexadecimal string, highest digit first
i           A signed integer
I           An unsigned integer
l           A signed long integer
L           An unsigned long integer
n           A short integer in network order
N           A long integer in network order
p           A pointer to a string
s           A signed short integer
S           An unsigned short integer
u           Convert (uudecode) a uuencoded string
v           A short integer in VAX (little-endian) order
V           A long integer in VAX order
x           Skip forward a byte
X           Indicates "go back one byte"
@           Go to specified position

In almost all cases, a call to unpack undoes the effects of an equivalent call to pack. For example, consider Listing 14.4, which packs and unpacks a list of integers.


Listing 14.4. A program that demonstrates the relationship between pack and unpack.
1:  #!/usr/local/bin/perl

2:  

3:  @list_of_integers = (11, 26, 43);

4:  $mystring = pack("i*", @list_of_integers);

5:  @list_of_integers = unpack("i*", $mystring);

6:  print ("@list_of_integers\n");


$ program14_4

11 26 43

$

Line 4 calls pack, which takes all of the elements stored in @list_of_integers, converts them to the machine's integer format, and stores them in $mystring.

Line 5 calls unpack, which assumes that the string stored in $mystring is a list of values stored in the machine's integer format; it takes this string, converts each integer in the string to a Perl value, and stores the resulting list of values in @list_of_integers.
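
The same round trip works when a single format string mixes several types. The sketch below packs a small record and unpacks it again; the record contents and the template "A10 n N" are made up for this example.

```perl
use strict;
use warnings;

# A hypothetical record: a name, a small number, and a larger number.
my @record   = ("jqpublic", 42, 70000);
my $template = "A10 n N";   # 10-character string, short, long (network order)

my $packed   = pack($template, @record);      # 10 + 2 + 4 = 16 bytes
my @unpacked = unpack($template, $packed);

print length($packed), "\n";   # prints "16"
print "@unpacked\n";           # prints "jqpublic 42 70000"
```

The name comes back without its padding because the A format strips trailing spaces when unpacking.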

Unpacking Strings

The only unpack operations that do not exactly mirror pack operations are those specified by the a and A formats. The a format converts a machine-format string into a Perl value as is, whereas the A format converts a machine-format string into a Perl value and strips any trailing blanks or null characters.

The A format is useful if you want to convert a C string into the string format understood by Perl. The following is an example:

$perlstring = unpack("A*", $Cstring);

Here, $Cstring is assumed to contain a character string stored in the format used by the C programming language (a sequence of bytes terminated by a null character). unpack strips the trailing null character from the string stored in $Cstring, and stores the resulting string in $perlstring. (The * tells A to take the whole string; a bare A would unpack only the first character.)
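
The difference between the a and A formats shows up in the length of the result, as in this sketch. The C-style string is built here with pack so that the example is self-contained; $raw and $clean are illustrative names.

```perl
use strict;
use warnings;

# Build a C-style string: "hello" followed by a null character.
my $Cstring = pack("a*x", "hello");

my $raw   = unpack("a*", $Cstring);   # the null character is kept
my $clean = unpack("A*", $Cstring);   # the trailing null is stripped

print length($raw), " ", length($clean), "\n";   # prints "6 5"
```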

Skipping Characters When Unpacking

The @ pack-format character tells unpack to skip to the position specified with the @. For example, the following statement skips four bytes in $packstring, and then unpacks a signed integer and stores it in $skipnum. (The format is written in single quotes so that Perl 5 does not try to interpolate @4 as an array variable.)

$skipnum = unpack('@4i', $packstring);

NOTE
If unpack is unpacking a single item, it can be stored in either an array variable or a scalar variable. If an array variable is used to store the result of the unpack operation, the resulting list consists of a single element.

If an * character appears after the @ pack-format character, unpack skips to the end of the value being unpacked. This can be used in conjunction with the X pack-format character to unpack the right end of the packed value. For example, the following statement treats the last four bytes of a packed value as a long unsigned integer and unpacks them:

$longrightint = unpack("@* X4 L", $packstring);

In this example, the @* pack format specifier skips to the end of the value stored in $packstring. Then, the X4 specifier backs up four bytes. Finally, the L specifier treats the last four bytes as a long unsigned integer, which is unpacked and stored in $longrightint.
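
The positioning formats can be tried out on a value whose layout is known in advance. The sketch below packs three long integers with N (always four bytes each) so that the byte offsets are the same on every machine; the values 10, 20, and 30 are arbitrary.

```perl
use strict;
use warnings;

my $packed = pack("N3", 10, 20, 30);    # three 4-byte integers, 12 bytes

my $second = unpack('@4 N', $packed);   # go to position 4: 20
my $third  = unpack('x8 N', $packed);   # skip forward 8 bytes: 30

# X4 backs up four bytes, so the first integer is unpacked twice.
my @twice  = unpack('N X4 N', $packed); # (10, 10)

print "$second $third @twice\n";        # prints "20 30 10 10"
```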

The number of bytes unpacked by the s, S, i, I, l, and L formats depends on your machine. Many UNIX machines store short integers in two bytes of memory, and integer and long integer values in four bytes. However, other machines might behave differently. In general, you cannot assume that programs that use pack and unpack will behave in the same way on different machines.

The unpack Function and uuencode

The unpack function enables you to decode files that have been encoded by the uuencode encoding program. To do this, use the u pack-format specifier.

NOTE
uuencode, a coding mechanism available on most UNIX systems, converts all characters (including unprintable characters) into printable ASCII characters. This ensures that you can safely transmit files across remote networks.

Listing 14.5 is an example of a program that uses unpack to decode a uuencoded file.


Listing 14.5. A program that decodes a uuencoded file.
1:  #!/usr/local/bin/perl

2:  

3:  open (CODEDFILE, "/u/janedoe/codefile") ||

4:          die ("Can't open input file");

5:  open (OUTFILE, ">outfile") ||

6:          die ("Can't open output file");

7:  while ($line = <CODEDFILE>) {

8:          $decoded = unpack("u", $line);

9:          print OUTFILE ($decoded);

10: }

11: close (OUTFILE);

12: close (CODEDFILE);


The file variable CODEDFILE represents the file that was previously encoded by uuencode. Lines 3 and 4 open the file (or die trying). Lines 5 and 6 open the output file, which is represented by the file variable OUTFILE.

Lines 7-10 read and write one line at a time. Line 7 starts off by reading a line of encoded input into the scalar variable $line. As with any other input file, the null string is returned if CODEDFILE is exhausted.

Line 8 calls unpack to decode the line. If the line is a special line created by uuencode (for example, the first line, which lists the filename and the size, or the last line, which marks the end of the file), unpack detects it and converts it into the null string. This means that the program does not need to contain special code to handle these lines.

Line 9 writes the decoded line to the output file represented by OUTFILE.

NOTE
You can use pack to uuencode a string, as in the following:
$encoded = pack ("u", $decoded);
Here, the string in $decoded is encoded and stored in the scalar variable $encoded. (pack always returns a single string, and the u format uses one element of the list.) The string in $encoded can then be decoded using unpack, as follows:
$decoded = unpack ("u", $encoded);
Although pack uses the same uuencode algorithm as the UNIX uuencode utility, you cannot use the UNIX uudecode program on data encoded using pack because pack does not supply the header and footer (beginning and ending) lines expected by uudecode.
If you really need to use uudecode with a file created by writing out the output from pack, you'll need to write out the header and footer lines as well. (See the UNIX manual page for uuencode for more details.)
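
A round trip through pack and unpack with the u format can be sketched as follows. The sample text, which deliberately contains an unprintable character, is invented for this example.

```perl
use strict;
use warnings;

my $decoded = "Some text, including an unprintable byte: \x01\n";

my $encoded   = pack("u", $decoded);     # uuencoded, printable text
my $roundtrip = unpack("u", $encoded);   # back to the original bytes

print "round trip ok\n" if $roundtrip eq $decoded;   # prints "round trip ok"
print "printable\n" if $encoded =~ /^[\x20-\x60\n]+$/;   # prints "printable"
```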

The vec Function

The vec function enables you to treat a scalar value as a collection of chunks, with each chunk consisting of a specified number of bits; this collection is known as a vector. Each call to vec accesses a particular chunk of bits in the vector (known as a bit vector).

The syntax for the vec function is

retval = vec (vector, index, bits);

vector is the scalar value that is to be treated as a vector. It can be any scalar value, including the value of an expression.

index behaves like an array subscript. It indicates which chunk of bits to retrieve. An index of 0 retrieves the first chunk, 1 retrieves the second, and so on. Note that retrieval is from right to left. The first chunk of bits retrieved when the index 0 is specified is the chunk of bits at the right end of the vector.

bits specifies the number of bits in each chunk; it can be 1, 2, 4, 8, 16, or 32.

retval is the value of the chunk of bits. This value is an ordinary Perl scalar value, and it can be used anywhere scalar values can be used.

Listing 14.6 shows how you can use vec to retrieve the value of a particular chunk of bits.


Listing 14.6. A program that illustrates the use of vec.

1:  #!/usr/local/bin/perl

2:  

3:  $vector = pack ("B*", "11010011");

4:  $val1 = vec ($vector, 0, 4);

5:  $val2 = vec ($vector, 1, 4);

6:  print ("high-to-low order values: $val1 and $val2\n");

7:  $vector = pack ("b*", "11010011");

8:  $val1 = vec ($vector, 0, 4);

9:  $val2 = vec ($vector, 1, 4);

10: print ("low-to-high order values: $val1 and $val2\n");


$ program14_6

high-to-low order values: 3 and 13

low-to-high order values: 11 and 12

$

The call to pack in line 3 assumes that each character in the string 11010011 is a bit to be packed. The bits are packed in high-to-low order (with the highest bit first), which means that the vector stored in $vector consists of the bits 11010011 (from left to right). Grouping these bits into chunks of four produces 1101 0011, which are the binary representations of 13 and 3, respectively.

Line 4 retrieves the first chunk of four bits from $vector and assigns it to $val1. This is the chunk 0011, because vec is retrieving the chunk of bits at the right end of the bit vector. Similarly, line 5 retrieves 1101, because the index 1 specifies the second chunk of bits from the right; this chunk is assigned to $val2. (One way to think of the index is as "the number of chunks to skip." The index 1 indicates that one chunk of bits is to be skipped.)

Line 7 is similar to line 3, but the bits are now stored in low-to-high order, not high-to-low. This means that the string 11010011 is stored as the following (which is 11010011 reversed):

11001011

When this bit vector is grouped into chunks of 4 bits, you get the following, which are the binary representations of 12 and 11, respectively:

1100 1011

Lines 8 and 9, like lines 4 and 5, retrieve the first and second chunk of bits from $vector. This means that $val1 is assigned 11 (the first chunk), and $val2 is assigned 12 (the second chunk).

NOTE
You can use vec to assign to a chunk of bits by placing the call to vec to the left of an assignment operator. For example:
vec ($vector, 0, 4) = 11;
This statement assigns 11 to the first chunk of bits in $vector. Because the binary representation of 11 is 1011, the last four bits of $vector become 1011.
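
The following sketch shows assignment through vec in two settings: changing a four-bit chunk of the vector from Listing 14.6, and using one-bit chunks as a compact set of small integers. The second use, and the names $set and @members, are illustrations added here, not part of the listing.

```perl
use strict;
use warnings;

# Replace the first four-bit chunk (0011) of 11010011 with 1011.
my $vector = pack("B*", "11010011");
vec($vector, 0, 4) = 11;
print unpack("B*", $vector), "\n";   # prints "11011011"

# One-bit chunks: mark the numbers 1, 5, and 9 as members of a set.
my $set = "";
vec($set, $_, 1) = 1 for (1, 5, 9);
my @members = grep { vec($set, $_, 1) } 0 .. 15;
print "@members\n";                  # prints "1 5 9"
```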

The defined Function

By default, all scalar variables and elements of array variables that have not been assigned to are assumed to contain the null string. This ensures that Perl programs don't crash when using uninitialized scalar variables.

In some cases, a program might need to know whether a particular scalar variable or array element has been assigned to or not. The built-in function defined enables you to check for this.

The syntax for the defined function is

retval = defined (expr);

Here, expr is anything that can appear on the left of an assignment statement, such as a scalar variable or an array element. Older versions of Perl also accept an entire array (an array is assumed to be defined if at least one of its elements is defined), but recent releases of Perl 5 no longer allow defined to be applied to an array. retval is true (a nonzero value) if expr is defined, and false (0) if it is not.

Listing 14.7 is a simple example of a program that uses defined.


Listing 14.7. A program that illustrates the use of defined.
1:  #!/usr/local/bin/perl

2:  

3:  $array[2] = 14;

4:  $array[4] = "hello";

5:  for ($i = 0; $i <= 5; $i++) {

6:          if (defined ($array[$i])) {

7:                  print ("element ", $i+1, " is defined\n");

8:          }

9:  }


$ program14_7

element 3 is defined

element 5 is defined

$

This program assigns values to two elements of the array variable @array: the element with subscript 2 (the third element), and the element with subscript 4 (the fifth element).

The loop in lines 5-9 checks each element of @array to see whether it is defined. Because the third and fifth elements ($array[2] and $array[4], respectively) are defined, defined returns true when $i is 2 and when $i is 4.

NOTE
Many functions that return the null string actually return a special "undefined" value that is treated as if it is the null string. If this undefined value is passed to defined, defined returns false.
Functions that return undefined include the read function (discussed on Day 12, "Working with the File System") and fork (introduced on Day 13, "Process, String, and Mathematical Functions"). Many functions discussed today and on Day 15, "System Functions," also return the special undefined value when an error occurs.
The general rule is: A function that returns the null string when an error or exceptional condition occurs is usually really returning the undefined value.
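
The difference between the undefined value and a genuine null string can be sketched as follows. This is an illustration only, assuming Perl 5; it uses pop on an empty list as the function that returns the undefined value, and the variable names are made up for the example:

```perl
use strict;
use warnings;

my @empty  = ();
my $popped = pop(@empty);   # nothing to remove, so pop returns the undefined value
my $blank  = "";            # a real null string that has been assigned

my $popped_defined = defined($popped) ? "yes" : "no";
my $blank_defined  = defined($blank)  ? "yes" : "no";

print("popped value defined? $popped_defined\n");
print("null string defined?  $blank_defined\n");
```

Both values behave like the null string in most expressions, but only defined can tell them apart.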

The undef Function

The undef function undefines a scalar variable, array element, or an entire array.

The syntax of the undef function is

retval = undef (expr);

As in calls to defined, expr can be anything that can appear to the left of a Perl assignment statement. retval is always the special undefined value discussed in the previous section, "The defined Function"; this undefined value is equivalent to the null string.

The following are some examples of undef:

undef ($myvar);

undef ($array[3]);

undef (@array);

In the first case, the scalar variable $myvar becomes undefined. The Perl interpreter now treats $myvar as if it has never been assigned to. Needless to say, any value previously stored in $myvar is now lost.

In the second example, the fourth element of @array is marked as undefined. Its value, if any, is lost. Other elements of @array are unaffected.

In the third and final example, all the elements of @array are marked as undefined. This lets the Perl interpreter free up any memory used to store the values of @array, which might be useful if your program is working with large arrays. For example, if you have used an array to read in an entire file, as in the following:

@bigarray = <STDIN>;

you can use the following statement to tell the Perl interpreter that you don't need the contents of the input file and that the interpreter can throw them away:

undef (@bigarray);
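
The effect of undef on a single element and on a whole array can be checked with a short sketch like this one (assuming Perl 5; the array contents and variable names are invented for the example):

```perl
use strict;
use warnings;

my @array = (10, 20, 30, 40, 50);

# Undefine only the fourth element; the list keeps its length.
undef($array[3]);
my $length_after_element = scalar(@array);
my $fourth_defined       = defined($array[3]) ? 1 : 0;

# Undefine the entire array; its storage is released and the list is empty.
undef(@array);
my $length_after_array = scalar(@array);

print("after undef of one element: length $length_after_element\n");
print("after undef of the array:   length $length_after_array\n");
```

Here scalar(@array) is used only to report how many elements remain after each call.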

Calls to undef can omit expr. In this case, undef does nothing and just returns the undefined value. Listing 14.8 shows how this can be useful.


Listing 14.8. A program that illustrates the use of undef to represent an unusual condition.
1:  #!/usr/local/bin/perl

2:  

3:  print ("Enter the number to divide:\n");

4:  $value1 = <STDIN>;

5:  chop ($value1);

6:  print ("Enter the number to divide by:\n");

7:  $value2 = <STDIN>;

8:  chop ($value2);

9:  $result = &safe_division($value1, $value2);

10: if (defined($result)) {

11:         print ("The result is $result.\n");

12: } else {

13:         print ("Can't divide by zero.\n");

14: }

15: 

16: sub safe_division {

17:         local ($dividend, $divisor) = @_;

18:         local ($result);

19: 

20:         $result = ($divisor == 0) ? undef :

21:                 $dividend / $divisor;

22: }


$ program14_8

Enter the number to divide:

26

Enter the number to divide by:

0

Can't divide by zero.

$

Lines 20 and 21 illustrate how you can use undef. If $divisor is 0, the program is attempting to divide by 0. In this case, the subroutine safe_division calls undef, which returns the special undefined value. This value is assigned to $result and passed back to the main part of the program.

Line 10 tests whether safe_division has returned the undefined value by calling the defined function. If defined returns false, $result contains the undefined value, and an attempted division by 0 has been detected.

NOTE
You can use undef to undefine an entire subroutine, if you like. The following example:
undef (&mysub);
frees the memory used to store mysub; after this, mysub can no longer be called.
You are not likely to need to use this feature of undef, but it might prove useful in programs that consume a lot of memory.

Array and List Functions

The functions described in the following sections manipulate standard array variables and the lists that they store.

The grep Function

The grep function provides a convenient way of extracting the elements of a list that match a specified pattern. (It is named after the UNIX search utility of the same name.)

The syntax for the grep function is

foundlist = grep (pattern, searchlist);

pattern is the pattern to search for. searchlist is the list of elements to search in. foundlist is the list of elements matched.

Here is an example:

@list = ("This", "is", "a", "test");

@foundlist = grep(/^[tT]/, @list);

Here, grep examines all the elements of the list stored in @list. If a list element begins with the letter t (in either uppercase or lowercase), the element is included as part of @foundlist. As a result, @foundlist consists of two elements: This and test.
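
The effect of the ^ anchor in the pattern can be seen in a small sketch (assuming Perl 5; the extra word list and the variable names are added here purely for illustration):

```perl
use strict;
use warnings;

my @list = ("This", "is", "a", "test", "list");

# Anchored pattern: only words that begin with t or T.
my @starts = grep(/^[tT]/, @list);

# Unanchored pattern: any word that contains t or T anywhere.
my @contains = grep(/[tT]/, @list);

# In a scalar context, grep returns the number of matches instead.
my $count = grep(/^[tT]/, @list);

print("begins with t: @starts\n");
print("contains t:    @contains\n");
print("count:         $count\n");
```

The word list matches only the unanchored pattern, because its t is at the end.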

Listing 14.9 is an example of a program that uses grep. It searches for all integers on an input line and adds them together.


Listing 14.9. A program that demonstrates the use of grep.
1:  #!/usr/local/bin/perl

2:  

3:  $total = 0;

4:  $line = <STDIN>;

5:  @words = split(/\s+/, $line);

6:  @numbers = grep(/^\d+[.,;:]?$/, @words);

7:  foreach $number (@numbers) {

8:          $total += $number;

9:  }

10: print ("The total is $total.\n");


$ program14_9

This line of input contains 8, 11 and 26.

The total is 45.

$

Line 5 splits the input line into words, using the standard pattern /\s+/, which matches one or more tabs or blanks. Some of these words are actually numbers, and some are not.

Line 6 uses grep to match the words that are actually numbers. The pattern /^\d+[.,;:]?$/ matches if a word consists of one or more digits followed by an optional punctuation character. The words that match this pattern are returned by grep and stored in @numbers. After line 6 has been executed, @numbers contains the following list:

("8,", "11", "26.")

Lines 7-9 use a foreach loop to total the numbers. Note that the totaling operation works properly even if a number being added contains a closing punctuation character: when the Perl interpreter converts a string to an integer, it reads from left to right until it sees a character that is not a digit. This means that the final word, 26., is converted to 26, which is the expected number.
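
The conversion described above can be tried in isolation with a sketch like the following. It assumes Perl 5 with warnings enabled, in which case adding a string such as 8, produces a "isn't numeric" warning; the no warnings line below is an addition for this example, not part of Listing 14.9:

```perl
use strict;
use warnings;

my @numbers = ("8,", "11", "26.");
my $total   = 0;

{
    # The trailing punctuation is ignored during conversion, but it
    # triggers a warning unless numeric warnings are switched off.
    no warnings 'numeric';
    foreach my $number (@numbers) {
        $total += $number;
    }
}

print("The total is $total.\n");
```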

Because split and grep each return a list and foreach expects a list, you can combine lines 5-9 into a single loop if you want to get fancy.

foreach $number (grep (/^\d+[.,;:]?$/, split(/\s+/, $line))) {

        $total += $number;

}

As always, there is a trade-off between brevity and readability: this code is shorter, but the code in Listing 14.9 is easier to read.

Using grep with the File-Test Operators

A useful feature of grep is that it can be used to search for any expression, not just patterns. For example, grep can be used in conjunction with readdir and the file-test operators to search a directory.

Listing 14.10 is an example of a program that searches all the readable files of the current directory for a particular word (which is supplied on the command line). Files whose names begin with a period are ignored.


Listing 14.10. A program that uses grep with the file-test operators.
1:  #!/usr/local/bin/perl

2:  

3:  opendir(CURRDIR, ".") ||

4:          die("Can't open current directory");

5:  @filelist = grep (!/^\./, grep(-r, readdir(CURRDIR)));

6:  closedir(CURRDIR);

7:  foreach $file (@filelist) {

8:          open (CURRFILE, $file) ||

9:                  die ("Can't open input file $file");

10:         while ($line = <CURRFILE>) {

11:                 if ($line =~ /$ARGV[0]/) {

12:                         print ("$file:$line");

13:                 }

14:         }

15:         close (CURRFILE);

16: }


$ program14_10 pattern

file1:This line of this file contains the word "pattern".

myfile:This file also contains abcpatterndef.

$

Line 3 of this program opens the current directory. If it cannot be opened, line 4 calls die, which terminates the program.

Line 5 is actually three function calls in one, as follows:

  1. readdir retrieves a list of all of the files in the directory.
  2. This list of files is passed to grep, which uses the -r file test operator to search for all files that the user has permission to read.
  3. This list of readable files is passed to another call to grep, which uses the expression !/^\./ to match all the files whose names do not begin with a period.

The resulting list (all the files in the current directory that are readable and whose names do not start with a period) is assigned to @filelist.
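
The same nesting of two grep calls can be sketched without touching the file system. In this illustration (assuming Perl 5), a fixed list of made-up names stands in for the result of readdir, and a length test stands in for the -r file-test operator so that the outcome does not depend on the machine it runs on:

```perl
use strict;
use warnings;

# Hypothetical directory contents, standing in for readdir(CURRDIR).
my @names = (".", "..", ".profile", "notes.txt", "a", "report");

# Inner grep: keep names that pass an arbitrary expression
# (here, names longer than one character).
# Outer grep: keep names that do not begin with a period.
my @filelist = grep(!/^\./, grep(length($_) > 1, @names));

print("@filelist\n");
```

Inside each grep, the element being examined is available as $_, which is also the value that -r tests when it is given no operand.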

The rest of the program contains nothing new. Line 6 closes the open directory, and lines 7-16 read each file in turn, searching for the word specified on the command line. (Recall that the built-in array @ARGV lists all the arguments supplied on the command line and that the first word specified on the command line is stored in $ARGV[0].) Lines 11-13 print any lines containing the word to search for, using the format employed by the UNIX grep command (the filename, followed by :, followed by the line itself).

The splice Function

The splice function enables you to modify the list stored in an array variable. By passing the appropriate arguments to splice, you can add elements to the middle of a list, delete a portion of a list, or replace a portion of a list.

The syntax for the splice function is

retval = splice (array, skipelements, length, newlist);

array is the array variable containing the list to be spliced. skipelements is the number of elements to skip before splicing. length is the number of elements to be replaced. newlist is the list to be spliced in; this list can be stored in an array variable or specified explicitly.

If length is greater than 0, retval is the list of elements replaced by splice.

The following sections provide examples of what you can do with splice.

Replacing List Elements

You can use splice to replace a sublist (a set of elements in a list) with another sublist. The following is an example:

@array = ("1", "2", "3", "4");

splice (@array, 1, 2, ("two", "three"));

This call to splice takes the list stored in @array, skips over the first element, and replaces the next two elements with the list ("two", "three"). The new value of @array is the list

("1", "two", "three", "4")

If the replacement list is longer than the original list, the elements to the right of the replaced list are pushed to the right. For example:

@array = ("1", "2", "3", "4");

splice (@array, 1, 2, ("two", "2.5", "three"));

After this call, the new value of @array is the following:

("1", "two", "2.5", "three", "4")

Similarly, if the replacement list is shorter than the original list, the elements to the right of the original list are moved left to fill the resulting gap. For example:

@array = ("1", "2", "3", "4");

splice (@array, 1, 2, "twothree");

After this call to splice, @array contains the following list:

("1", "twothree", "4")
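
A replacement can be combined with a check of the return value, as in this minimal sketch (assuming Perl 5; @replaced is a name chosen for the example):

```perl
use strict;
use warnings;

my @array = ("1", "2", "3", "4");

# Skip one element, replace the next two with a three-element list,
# and keep the elements that were replaced.
my @replaced = splice(@array, 1, 2, "two", "2.5", "three");

print("array is now: @array\n");
print("replaced:     @replaced\n");
```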

NOTE
You do not need to put parentheses around the list you pass to splice. For example, the following two statements are equivalent:
splice (@array, 1, 2, ("two", "three"));
splice (@array, 1, 2, "two", "three");
When the Perl interpreter sees the second form of splice, it assumes that the fourth and subsequent arguments are the replacement list.

Listing 14.11 is an example of a program that uses splice to replace list elements. It reads a file containing a form letter, and replaces the string <name> with a name read from the standard input file. It then writes out the new letter.

The output shown assumes that the file form contains

Hello <name>!

This is your lucky day, <name>!


Listing 14.11. A program that uses splice to replace list elements.
1:  #!/usr/local/bin/perl

2:  

3:  open (FORM, "form") || die ("Can't open form letter");

4:  @form = <FORM>;

5:  close (FORM);

6:  $name = <STDIN>;

7:  @nameparts = split(/\s+/, $name);

8:  foreach $line (@form) {

9:          @words = split(/\s+/, $line);

10:         $i = 0;

11:         while (1) {

12:                 last if (!defined($words[$i]));

13:                 if ($words[$i] eq "<name>") {

14:                         splice (@words, $i, 1, @nameparts);

15:                         $i += @nameparts;

16:                 } elsif ($words[$i] =~ /^<name>/) {

17:                         $punc = $words[$i];

18:                         $punc =~ s/<name>//;

19:                         @temp = @nameparts;

20:                         $temp[@temp-1] .= $punc;

21:                         splice (@words, $i, 1, @temp);

22:                         $i += @temp;

23:                 } else {

24:                         $i++;

25:                 }

26:         }

27:         $line = join (" ", @words);

28: }

29: $i = 0;

30: while (1) {

31:         if (!defined ($form[$i])) {

32:                 $~ = "FLUSH";

33:                 write;

34:                 last;

35:         }

36:         if ($form[$i] =~ /^\s*$/) {

37:                 $~ = "FLUSH";

38:                 write;

39:                 $~ = "BLANK";

40:                 write;

41:                 $i++;

42:                 next;

43:         }

44:         if ($writeline ne "" &&

45:                 $writeline !~ / $/) {

46:                 $writeline .= " ";

47:         }

48:         $writeline .= $form[$i];

49:         if (length ($writeline) < 60) {

50:                 $i++;

51:                 next;

52:         }

53:         $~ = "WRITELINE";

54:         write;

55:         $i++;

56: }

57: format WRITELINE =

58: ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<~

59: $writeline

60: .

61: format FLUSH =

62: ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<~~

63: $writeline

64: .

65: format BLANK =

66: 

67: .


$ program14_11

Fred

Hello Fred! This is your lucky day, Fred!

$

This program starts off by reading the entire form letter from the file named form into the array variable @form. This makes it possible to format the form letter output later on.

Lines 6 and 7 read the name from the standard input file and break it into individual words. This list of words is stored in the array variable @nameparts.

The loop in lines 8-28 reads each line in the form letter and looks for occurrences of the string <name>. First, line 9 breaks the line into individual words. This list of words is stored in the array variable @words.

The while loop starting in line 11 then examines each word of @words in turn. Line 12 checks whether the loop has reached the end of the list by calling defined; if the loop is past the end of the list, defined will return false, indicating that the array element is not defined.

Lines 13-15 check whether a word consists entirely of the string <name>. If it does, line 14 calls splice; this call replaces the word <name> with the words in the name list @nameparts.

If a word is not equal to the string <name>, it might still contain <name> followed by a punctuation character. To test for this, line 16 tries to match the pattern /^<name>/. If it matches, lines 17 and 18 isolate the punctuation in a single word. This punctuation is stored in the scalar variable $punc.

Lines 19 and 20 create a copy of the name array @nameparts and append the punctuation to the last element of the array. This ensures that the punctuation will appear in the form letter where it is supposed to: right after the last character of the substituted name. Line 21 then calls splice as in line 14.

After the words in @words have been searched and the name substituted for <name>, line 27 joins the words back into a single line. As an additional benefit, the multiple spaces and tabs in the original line have now been replaced by a single space, which will make the eventual formatted output look nicer.
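
The word-replacement step can be sketched for a single line as follows. This is a condensed variation, not the listing itself: it assumes Perl 5, uses a fixed two-word name and a fixed input line in place of the files, and folds the two cases from lines 13-22 into one pattern that captures any trailing punctuation:

```perl
use strict;
use warnings;

my @nameparts = ("Fred", "Smith");
my @words     = split(/\s+/, "This is your lucky day, <name>!");

my $i = 0;
while ($i < @words) {
    if ($words[$i] =~ /^<name>(.*)$/) {
        my $punc = $1;               # empty if the word is exactly <name>
        my @temp = @nameparts;
        $temp[-1] .= $punc;          # punctuation follows the last name word
        splice(@words, $i, 1, @temp);
        $i += @temp;                 # step past the words just inserted
    } else {
        $i++;
    }
}

my $line = join(" ", @words);
print("$line\n");
```

Advancing $i by the number of inserted words matters: it keeps the loop from re-examining the name that was just spliced in.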

Lines 30-56 write out the output. The string to be written is stored in the scalar variable $writeline. The program ensures that the form-letter output is formatted by doing the following:

  1. First, the print format WRITELINE is defined to use the ^<<<< value-field format. This format fits as much of the contents of $writeline into the line as possible and then deletes the part of $writeline that has been written out.
  2. Lines 36-43 enable you to add paragraphs to your form letter. Line 36 tests whether an input line is blank. If it is, the FLUSH print format is used to write out any output from previous lines that has not yet been printed. (Because the output line specified by FLUSH contains ~~, the line is printed only if it is not blank; in other words, only if $writeline actually contains some leftover text.) Then, the BLANK print format writes a blank line.
  3. Lines 44-47 check whether a space needs to be placed between the end of one input line and the beginning of the next when formatting.
  4. Lines 49-52 ensure that $writeline is always long enough to fill the value field specified by WRITELINE. This guarantees that there will be no unnecessary space in any of the output lines.
  5. When @form has been completely read, lines 32-34 ensure that all of the output from previous lines has been written by using the FLUSH print format.

(For more information on the print formats used in this example, refer to Day 11, "Formatting Your Output.")

NOTE
You can use splice to splice the contents of a scalar variable into an array. For example:
splice (@array, 8, 1, $name);
This creates a one-element list consisting of the contents of $name and uses it to replace the ninth element of the list stored in @array (the element with subscript 8).

Appending List Elements

You can use splice to add a sublist anywhere in a list. To do this, specify a length field of 0. For example:

splice (@array, 5, 0, "Hello", "there");

This call to splice adds the list ("Hello", "there") to the list stored in @array. Hello becomes the new sixth element of @array, and there becomes the new seventh element; the existing sixth and seventh elements, if they exist, become the new eighth and ninth elements, and every other element is also pushed to the right.
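
A quick way to confirm where the inserted elements land is a sketch like this one (assuming Perl 5; the eight-letter starting list is made up for the example):

```perl
use strict;
use warnings;

my @array = ("a", "b", "c", "d", "e", "f", "g", "h");

# Skip five elements, replace none, and insert two new ones.
splice(@array, 5, 0, "Hello", "there");

print("@array\n");
print("sixth element is $array[5], eighth element is $array[7]\n");
```

The element that used to be sixth (f) is now eighth, as described above.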

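To add a new element to the end of an existing array, specify a skipelements value equal to the number of elements in the array, as shown in the following:

splice (@array, @array, 0, "Hello");

This adds Hello as the last element of the list stored in @array. (A skipelements value of -1 refers to the last existing element, so a call that specifies -1 and a length of 0 inserts the new element just before the last element, not after it.)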

Listing 14.12 is an example of a program that uses splice to insert an element into a list. This program inserts a word count after every tenth word in a file.


Listing 14.12. A program that uses splice to insert array elements.
1:  #!/usr/local/bin/perl

2:  

3:  $count = 0;

4:  while ($line = <STDIN>) {

5:          chop ($line);

6:          @words = split(/\s+/, $line);

7:          $added = 0;

8:          for ($i = 0; $i+$added < @words; $i++) {

9:                  if ($count > 0 && ($count + $i) % 10 == 0) {

10:                         splice (@words, $i+$added, 0,

11:                                 $count + $i);

12:                         $added += 1;

13:                 }

14:         }

15:         $count += @words - $added;

16:         $line = join (" ", @words);

17:         print ("$line\n");

18: }


$ program14_12

Here is a line with some words on it.

Here are some more test words to count.

A B C D E F G H I J K L M N O P

^D

Here is a line with some words on it.

Here 10 are some more test words to count.

A B C 20 D E F G H I J K L M 30 N O P

$

This program, like many of the others you have seen, reads one line at a time and breaks the line into words; the array variable @words contains the list of words for a particular line.

The scalar variable $count contains the number of words in the lines previously read. Lines 8 through 14 read each word in the current input line in turn; at any given point, the counting variable $i lists the number of words read in the line, and the sum of $count and $i lists the total number of words read in all input lines.

Line 9 adds the value stored in $count to the value stored in $i; if this value, the current word number, is a multiple of ten, lines 10 and 11 call splice and insert the current word number into the list. As a result, every tenth word is followed by its word number.

The scalar variable $added counts the number of elements added to the list; this ensures that the word numbers added by lines 10 and 11 are not included as part of the word count.

After the word numbers have been inserted into the list, line 16 rebuilds the input line by joining the elements of @words; this new input line includes the word numbers. Line 17 then prints the rebuilt line.

Deleting List Elements

You can use splice to delete list elements without replacing them. To do this, call splice and omit the newlist argument. For example:

@deleted = splice (@array, 8, 2);

This call to splice deletes the ninth and tenth elements of the list stored in @array. If @array contains subsequent elements, these elements are shifted left to fill the gap. The list of deleted elements is returned and stored in @deleted.
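
The following sketch shows the deletion and its return value on a list whose elements are simply numbered, which makes the positions easy to see (assuming Perl 5; the twelve-element list is invented for the example):

```perl
use strict;
use warnings;

my @array = (1 .. 12);

# Skip eight elements, then delete the next two.
my @deleted = splice(@array, 8, 2);

print("deleted:   @deleted\n");
print("remaining: @array\n");
```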

Listing 14.13 reads an input file, uses splice to delete all words greater than five characters long, and writes out the result.


Listing 14.13. A program that uses splice to delete words.
1:  #!/usr/local/bin/perl

2:  

3:  while ($line = <STDIN>) {

4:          @words = split(/\s+/, $line);

5:          $i = 0;

6:          while (defined($words[$i])) {

7:                  if (length($words[$i]) > 5) {

8:                          splice(@words, $i, 1);

9:                  } else {

10:                         $i++;

11:                 }

12:         }

13:         $line = join (" ", @words);

14:         print ("$line\n");

15: }


$ program14_13

this is a test of the program which removes long words

^D

this is a test of the which long words

$

This program reads one line of input at a time and breaks each input line into words. Line 7 calls length to determine the length of a particular word. If the word is greater than five characters in length, line 8 calls splice to remove the word from the list.

NOTE
You also can omit the length argument when you call splice. If you do, splice skips the number of elements specified by skipelements and deletes everything that follows:
splice (@array, 7);
This skips the first seven elements and deletes the eighth and all subsequent elements of the list stored in @array.
To delete the last element of a list, specify -1 as the skipelements argument.
splice (@array, -1);
In all cases, splice returns the list of deleted elements.

The shift Function

One list operation that is frequently needed in a program is to remove an element from the front of a list. Because this operation is often performed, Perl provides a special function, shift, that handles it.

shift removes the first element of the list and moves (or "shifts") every remaining element of the list to the left to cover the gap. shift then returns the removed element.

The syntax for the shift function is

element = shift (arrayvar);

shift is passed one argument: an array variable that contains a list. element is the returned element.

NOTE
shift returns the undefined value (equivalent to the null string) if the list is empty.

Here is a simple example using shift:

@mylist = ("1", "2", "3");

$firstval = shift(@mylist);

This call to shift removes the first element, 1, from the list stored in @mylist. This element is assigned to $firstval. @mylist now contains the list ("2", "3").

If you do not specify an array variable when you call shift, the Perl interpreter assumes that shift is to remove the first element from the system array variable @ARGV. This variable lists the arguments supplied on the command line when the program is started up. For example, if you call a Perl program named foo with the following command:

foo arg1 arg2 arg3

@ARGV contains the list ("arg1", "arg2", "arg3").

This default feature of shift makes it handy for processing command-line arguments. Listing 14.14 is a simple program that prints out its arguments.


Listing 14.14. A program that uses shift to process the command-line arguments.
1:  #!/usr/local/bin/perl

2:  

3:  while (1) {

4:          $currarg = shift;

5:          last if (!defined($currarg));

6:          print ("$currarg\n");

7:  }


$ program14_14 arg1 arg2 arg3

arg1

arg2

arg3

$

When this program is called, the array variable @ARGV contains a list of the values supplied as arguments to the program. Line 4 calls shift to remove the first argument from the list and assign it to $currarg.

If there are no elements (or none remaining), shift returns the undefined value, and the call to defined in line 5 returns false. This ensures that the loop terminates when there are no more arguments to read.

NOTE
The shift function is equivalent to the following call to splice:
splice (@array, 0, 1);
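
The equivalence can be checked side by side with a small sketch (assuming Perl 5; the two copies of the list and the variable names are illustrative):

```perl
use strict;
use warnings;

my @list1 = ("1", "2", "3");
my @list2 = @list1;

my $shifted   = shift(@list1);
my ($spliced) = splice(@list2, 0, 1);   # splice returns a list; take its only element

print("shift returned $shifted, leaving (@list1)\n");
print("splice returned $spliced, leaving (@list2)\n");
```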

The unshift Function

To undo the effect of a shift function, call unshift.

The syntax for the unshift function is

count = unshift (arrayvar, elements);

arrayvar is the list (usually stored in an array variable) to add to, and elements is the element or list of elements to add. count is the number of elements in the resulting list.

The following is an example of a call to unshift:

unshift (@array, "newitem");

This adds the element newitem to the front of the list stored in @array. The other elements of the list are moved to the right to accommodate the new item.

You can use unshift to add more than one element to the front of an array. For example:

unshift (@array, @sublist1, "newitem", @sublist2);

This adds a list consisting of the list stored in @sublist1, the element newitem, and the list stored in @sublist2 to the front of the list stored in @array.

unshift returns the number of elements in the new list, as shown in the following:

@array = (1, 2, 3);

$num = unshift (@array, "newitem");

This assigns 4 to $num.
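
The return value also counts every element added when several lists are passed at once. A minimal sketch, assuming Perl 5 and using made-up contents for @sublist1 and @sublist2:

```perl
use strict;
use warnings;

my @array    = (1, 2, 3);
my @sublist1 = ("a", "b");
my @sublist2 = ("y", "z");

# The added elements keep their order in front of the existing list.
my $num = unshift(@array, @sublist1, "newitem", @sublist2);

print("array is now: @array\n");
print("number of elements: $num\n");
```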

NOTE
The unshift function is equivalent to calling splice with a skipelements value of 0 and a length value of 0. For example, the following statements are equivalent:
unshift (@array, "item1", "item2");
splice (@array, 0, 0, "item1", "item2");

The push Function

As you have seen, the unshift function adds an element to the front of a list. To add an element to the end of a list, call the push function.

The syntax for the push function is

push (arrayvar, elements);

arrayvar is the list (usually stored in an array variable) to add to, and elements is the element or list of elements to add.

The following is an example that uses push:

push (@array, "newitem");

This adds the element newitem to the end of the list.

The end of the list is always the element with the largest subscript, even if some earlier elements have never been assigned. For example, consider the following statements:

@array = ("one", "two");

$array[3] = "four";

push (@array, "five");

Here, the first statement creates a two-element list and assigns it to @array. The second statement assigns four to the fourth element of @array. Because the fourth element is now the last element of @array, the call to push creates a fifth element, even though the third element is undefined. @array now contains the list

("one", "two", "", "four", "five");

The undefined third element is, as always, equivalent to the null string.
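
The gap left in the list can be inspected with defined, as in this sketch (assuming Perl 5; it also relies on push returning the new number of elements, and the names $count and @shown are chosen for the example):

```perl
use strict;
use warnings;

my @array = ("one", "two");
$array[3] = "four";

my $count = push(@array, "five");

# The third element exists as a position in the list but was never assigned.
my $third_defined = defined($array[2]) ? "yes" : "no";

# Substitute the null string for undefined elements before printing.
my @shown = map { defined($_) ? $_ : "" } @array;
print(join(",", @shown), "\n");
print("elements: $count, third element defined? $third_defined\n");
```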

As with unshift, you can use push to add multiple elements to the end of a list, as in this example:

push (@array, @sublist1, "newitem", @sublist2);

Here, the list consisting of the contents of @sublist1, the element newitem, and the contents of @sublist2 is added to the end of the list stored in @array.

NOTE
push is equivalent to a call to splice with the skipelements argument set to the length of the array. This means that the following statements are equivalent:
push (@array, "newitem");
splice (@array, @array, 0, "newitem");

The pop Function

The pop function undoes the effect of push. It removes the last element from the end of a list. The removed element is returned.

The syntax for the pop function is

element = pop (arrayvar);

arrayvar is the array variable from which an element is to be removed. element is the returned element.

For example, the following statement removes the last element from the list stored in @array and assigns it to the scalar variable $popped:

$popped = pop (@array);

If the list passed to pop is empty, pop returns the undefined value.

NOTE
pop is equivalent to a call to splice with a skipelements value of -1 (indicating the last element of the array). This means that the following statements behave in the same way:
$popped = pop (@array);
$popped = splice (@array, -1);
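
The two forms can be compared directly in a short sketch (assuming Perl 5; the two copies of the list are made up for the example). When splice is assigned to a scalar variable, it returns the last element it removed, which here is the only one:

```perl
use strict;
use warnings;

my @stack1 = ("a", "b", "c");
my @stack2 = @stack1;

my $popped  = pop(@stack1);
my $spliced = splice(@stack2, -1);   # scalar context: the removed element

print("pop returned $popped, leaving (@stack1)\n");
print("splice returned $spliced, leaving (@stack2)\n");
```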

Creating Stacks and Queues

The functions you have just seen are handy for constructing two commonly used data structures: stacks and queues. The following sections provide examples that use a stack and a queue.

Creating a Stack

A stack is a data structure that behaves like a stack of plates in a cupboard: the last item added to the stack is always the first item removed. Data items that are added to the stack are said to be pushed onto the stack; items that are removed from the stack are popped off the stack.

As you might have guessed, the functions push and pop enable you to create a stack in a Perl program. Listing 14.15 is an example of a program that uses a stack to perform arithmetic operations. It works as follows:

  1. Two numbers are pushed onto the stack.
  2. The program reads an arithmetic operator, such as + or -. The two numbers are popped off the stack, and the operation is performed.
  3. The result of the operation is pushed onto the stack, enabling it to be used in further arithmetic operations.

After all the arithmetic operations have been performed, the stack should consist of a single element, which is the final result.

The numbers and operators are read from the standard input file.

Note that Listing 14.15 is the "inverse" of Listing 9.12. In the latter program, the arithmetic operators appear first, followed by the values.


Listing 14.15. A program that uses a stack to perform arithmetic.
1:  #!/usr/local/bin/perl

2:  

3:  while (defined ($value = &read_value)) {

4:          if ($value =~ /^\d+$/) {

5:                  push (@stack, $value);

6:          } else {

7:                  $firstpop = pop (@stack);

8:                  $secondpop = pop (@stack);

9:                  push (@stack,

10:                    &do_math ($firstpop, $secondpop, $value));

11:         }

12: }

13: $result = pop (@stack);

14: if (defined ($result)) {

15:         print ("The result is $result.\n");

16: } else {

17:         die ("Stack empty when printing result.\n");

18: }

19: 

20: sub read_value {

21:         local ($retval);

22:         $input =~ s/^\s+//;

23:         while ($input eq "") {

24:                 $input = <STDIN>;

25:                 return if ($input eq "");

26:                 $input =~ s/^\s+//;

27:         }

28:         $input =~ s/^\S+//;

29:         $retval = $&;

30: }

31: 

32: sub do_math {

33:         local ($val2, $val1, $operator) = @_;

34:         local ($result);

35: 

36:         if (!defined($val1) || !defined($val2)) {

37:                 die ("Missing operand");

38:         }

39:         if ($operator =~ m.^[-+*/]$. ) {

40:                 eval ("\$result = \$val2 $operator \$val1");

41:         } else {

42:                 die ("$operator is not an operator");

43:         }

44:         $result;  # ensure the proper return value

45: }


$ program14_15

11 4 + 26 -

^D

The result is 11.

$

Before going into details, let's first take a look at how the program produces the final result, which is 11:

  1. The program starts off by reading the numbers 11 and 4 and pushing them onto the stack. If the stack is listed from the top down, it now looks like this:
    4
    11
    Another way to look at the stack is this: At present, the list stored in @stack is (11, 4).
  2. The program then reads the + operator, pops the 4 and 11 off the stack, and performs the addition, pushing the result onto the stack. The stack now contains a single value:
    15
  3. The next value, 26, is pushed onto the stack, which now looks like this:
    26
    15
  4. The program then reads the - operator, pops 15 and 26 off the stack, and subtracts 15 from 26. The result, 11, is pushed onto the stack.
  5. Because there are no more operations to perform, 11 becomes the final result.

This program delegates to the subroutine read_value the task of reading values and operators. This subroutine reads a line of the standard input file and extracts the non-blank items on the line. Each call to read_value extracts one item from an input line; when an input line is exhausted, read_value reads the next one. When the input file is exhausted and there are no more items to return, the read in line 24 assigns the undefined value to $input. Because the undefined value is equivalent to the null string, line 25 returns without a value, and the call to defined in line 3 detects this and ends the loop.

If an item returned by read_value is a number, line 5 calls push, which pushes the number onto the stack. If an item is not a number, the program assumes it is an operator. At this point, pop is called twice to remove the last two numbers from the stack, and do_math is called to perform the arithmetic operation.

The do_math subroutine uses a couple of tricks. First, defined is called to see whether there are, in fact, two numbers to operate on. If one or both of the numbers do not exist, the program terminates.

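Next, the subroutine uses a pattern in line 39 to check whether the character string stored in $operator is, in fact, a legal arithmetic operator. The character class in this pattern should list the hyphen first, as in m.^[-+*/]$., because a hyphen between two characters denotes a range: [+-/] matches every character from + through /, which also admits the comma and the period. (Recall that you can use a pattern delimiter other than / by specifying m followed by the character you want to use as the delimiter. In this case, the period character is the pattern delimiter.)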

Finally, the subroutine calls eval to perform the arithmetic operation. Because $operator is not preceded by a backslash in the string passed to eval, Perl replaces the name $operator with its current value before eval is called; eval then treats the resulting character string as an executable statement, which performs the arithmetic operation specified by $operator. Using eval here saves space; the most direct alternative is a longer if-elsif structure.
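For comparison, here is a rough sketch of what the branching alternative to eval might look like. The subroutine name apply_operator and the names $left and $right are hypothetical; they are not part of Listing 14.15. The division-by-zero check is also an addition made for this sketch.

```perl
use strict;
use warnings;

# Hypothetical replacement for the eval call: one branch per operator.
sub apply_operator {
    my ($left, $operator, $right) = @_;

    if ($operator eq "+") {
        return $left + $right;
    } elsif ($operator eq "-") {
        return $left - $right;
    } elsif ($operator eq "*") {
        return $left * $right;
    } elsif ($operator eq "/") {
        die ("Division by zero\n") if ($right == 0);
        return $left / $right;
    }
    die ("$operator is not an operator\n");
}

my $difference = apply_operator (26, "-", 15);
print ("26 - 15 = $difference\n");
```

This version is longer, but it never hands input text to eval, so an unexpected operator string cannot be executed as Perl code.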

The result of the operation is returned in $result. Lines 9 and 10 then pass this value to push, which pushes the result onto the stack. This enables you to use the result in subsequent operations.

When the last arithmetic operation has been performed, the final result is stored as the top element of the stack. Line 13 pops this element, and line 15 prints it.

Note that this program always assumes that the last element pushed onto the stack is to be on the left of the arithmetic operation. To reverse this, all you need to do is change the order of $val1 and $val2 in line 33. (Some programs that manipulate stacks also provide an operation which reverses the order of the top two elements of a stack.)

The pop function returns the undefined value if the stack is empty. Because the undefined value is equivalent to the null string, and the null string is treated as 0 in arithmetic operations, your program will not complain if you try to pop a number from an empty stack.
To get the result you want, always call defined after you call pop to confirm that a value has actually been popped from the stack.
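The following short sketch, which assumes Perl 5, shows the check in action. The stack starts with a single element, so the second call to pop has nothing to return.

```perl
use strict;
use warnings;

my @stack = (7);

my $first  = pop (@stack);     # 7: the stack held one element
my $second = pop (@stack);     # undefined: the stack is now empty

foreach my $value ($first, $second) {
    if (defined ($value)) {
        print ("Popped $value\n");
    } else {
        print ("Stack was empty\n");
    }
}
```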

Creating a Queue

A queue is a data structure that processes data in the order in which it is entered; such data structures are known as first-in, first-out (or FIFO) structures. (A stack, on the other hand, is an example of a last-in, first-out, or LIFO, structure.)

To create a queue, use the function push to add items to the queue, and call shift to remove elements from it. Because push adds to the right of the list and shift removes from the left, elements are processed in the order in which they appear.
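Here is a minimal sketch of this idea, assuming Perl 5. The array name @served is used only for illustration.

```perl
use strict;
use warnings;

my @queue = ();

push (@queue, "first");                  # push adds at the right end
push (@queue, "second", "third");

my @served = ();
while (@queue) {
    push (@served, shift (@queue));      # shift removes from the left end
}

print ("Served in order: @served\n");
```

The elements leave the queue in the same order in which they entered it.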

Listing 14.16 is an example of a program that uses a queue to add a set of numbers retrieved via a pipe. Each input line can consist of more than one number, and the numbers are added in the order listed.

The input/output example shown for this listing assumes that the numbers retrieved via the pipe are 11, 12, and 13.


Listing 14.16. A program that illustrates the use of a queue.
1:  #!/usr/local/bin/perl

2:  

3:  open (PIPE, "numbers|") ||

4:          die ("Can't open pipe");

5:  $result = 0;

6:  while (defined ($value = &readnum)) {

7:          $result += $value;

8:  }

9:  print ("The result is $result.\n");

10: 

11: sub readnum {

12:         local ($line, @numbers, $retval);

13:         while ($queue[0] eq "") {

14:                 $line = <PIPE>;

15:                 last if ($line eq "");

16:                 $line =~ s/^\s+//;

17:                 @numbers = split (/\s+/, $line);

18:                 push (@queue, @numbers);

19:         }

20:         $retval = shift(@queue);

21: }


$ program14_16

The result is 36.

$

This program assumes that a program named numbers exists, and that its output is a stream of numbers. Multiple numbers can appear on a single line of this output. Lines 3 and 4 associate the file variable PIPE with the output from the numbers command.

Lines 6-8 call the subroutine readnum to obtain a number and then add it to the result stored in $result. This subroutine reads input from the pipe, breaks it into individual numbers, and then calls push to add the numbers to the queue stored in @queue. Line 20 then calls shift to retrieve the first element in the queue, which is returned to the main program.

If an input line is blank, the call to split in line 17 produces the empty list, which means that nothing is added to @queue. This ensures that input is read from the pipe until a non-blank line is read or until the input is exhausted.

The split Function

The split function was first discussed on Day 5, "Lists and Array Variables." It splits a character string into a list of elements.

The usual syntax for the split function is

list = split (pattern, value);

Here, value is the character string to be split. pattern is a pattern to be searched for. A new element is started every time pattern is matched. (pattern is not included as part of any element.) The resulting list of elements is returned in list.

For example, the following statement breaks the character string stored in $line into elements, which are stored in @list:

@list = split (/:/, $line);

A new element is started every time the pattern /:/ is matched. If $line contains This:is:a:string, the resulting list is ("This", "is", "a", "string").

If you like, you can specify the maximum number of elements of the list produced by split by specifying the maximum as the third argument. For example:

$line = "This:is:a:string";

@list = split (/:/, $line, 3);

As before, this breaks the string stored in $line into elements. After three elements have been created, no more new elements are created. Any subsequent matches of the pattern are ignored. In this case, the list assigned to @list is ("This", "is", "a:string").

TIP
If you use split with a limit, you can assign to several scalar variables at once:
$line = "11 12 13 14 15";
($var1, $var2, $line) = split (/\s+/, $line, 3);
This splits $line into the list ("11", "12", "13 14 15"). $var1 is assigned 11, $var2 is assigned 12, and $line is assigned "13 14 15". This enables you to assign the "leftovers" to a single variable, which can then be split again at a later time.
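As an illustration of splitting the leftovers again later, the following sketch (assuming Perl 5) peels one field at a time off the front of $line by using a limit of 2. The names $field and @taken are illustrative only.

```perl
use strict;
use warnings;

my $line = "11 12 13 14 15";
my @taken = ();

# Each pass takes one field; the limit of 2 leaves the rest in $line.
while (defined ($line) && $line ne "") {
    my $field;
    ($field, $line) = split (/\s+/, $line, 2);
    push (@taken, $field);
}

print ("Fields: @taken\n");
```

When only one field remains, split returns a one-element list, $line becomes undefined, and the loop ends.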

The sort and reverse Functions

The sort function sorts a list in alphabetical order (comparing the elements as character strings), as follows:

@sorted = sort (@list);

The sorted list is returned.

The reverse function reverses the order of a list:

@reversed = reverse (@list);

For more information on the sort and reverse functions, see Day 5. For information on how you can specify the sort order that sort is to use, see Day 9, "Using Subroutines."
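The following sketch, assuming Perl 5, shows why the sort order matters when a list contains numbers. The numeric version uses a comparison block with the <=> operator, which is one way of specifying a sort order.

```perl
use strict;
use warnings;

my @list = (10, 9, 100, 1);

my @by_string  = sort (@list);               # compares elements as strings
my @by_number  = sort { $a <=> $b } @list;   # compares elements as numbers
my @descending = reverse (@by_number);

print ("String order:  @by_string\n");
print ("Numeric order: @by_number\n");
print ("Descending:    @descending\n");
```

In string order, 100 sorts before 9 because the character 1 precedes the character 9.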

The map Function

The map function, defined only in Perl 5, enables you to use each of the elements of a list, in turn, as an operand in an expression.

The syntax for the map function is

resultlist = map(expr, list);

list is the list of elements to be used as operands or arguments; map does not change this list unless expr itself assigns a new value to $_. expr is the expression to be repeated. The results of the repeated evaluation of the expression are stored in a list, which is returned in resultlist.

expr assumes that the system variable $_ contains the element of the list currently being used as an operand. For example:

@list = (100, 200, 300);

@results = map($_+1, @list);

This evaluates the expression $_+1 for each of 100, 200, and 300 in turn. The results, 101, 201, and 301, respectively, are formed into the list (101, 201, 301). This list is then assigned to @results.

To use map with a subroutine, just pass $_ to the subroutine, as in the following:

@results = map(&mysub($_), @list);

This calls the subroutine mysub once for each element of the list stored in @list. The values returned by mysub are stored in a list, which is assigned to @results.

This also works with built-in functions:

@results = map(chr($_), @list);

@results = map(chr, @list);  # same as above, since $_ is the default argument for chr

This converts each element of the list in @list to its ASCII character equivalent. The resulting list of characters is stored in @results.
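The three forms of map described in this section can be tried together in a short sketch such as the following, which assumes Perl 5. The subroutine double is hypothetical and exists only for this illustration.

```perl
use strict;
use warnings;

# double is a hypothetical subroutine used only for illustration.
sub double {
    my ($number) = @_;
    return $number * 2;
}

my @list = (65, 66, 67);

my @plus_one = map($_ + 1, @list);       # an expression
my @doubled  = map(double($_), @list);   # a subroutine
my @letters  = map(chr($_), @list);      # a built-in function

print ("@plus_one | @doubled | @letters\n");
print ("Original list: @list\n");
```

None of these expressions assigns to $_, so @list is unchanged afterward.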

NOTE
For more information on the $_ system variable, refer to Day 17.

The wantarray Function

In Perl, the behavior of some built-in functions depends on whether they are dealing with scalar values or lists. For example, the chop function either chops the last character of a single string or chops the last character of every element of a list:

chop($scalar);    # chop a single string

chop(@array);     # chop every element of an array

Perl 5 enables you to define similar two-way behavior for your subroutines using the wantarray function. (This function is not defined in Perl 4.)

The syntax for the wantarray function is

result = wantarray();

result is a non-zero value if the subroutine is expected to return a list, and is zero if the subroutine is expected to return a scalar value.

Listing 14.17 illustrates how wantarray works.


Listing 14.17. A program that uses the wantarray function.
1:  #!/usr/local/bin/perl

2: 

3:  @array = &mysub();

4:  $scalar = &mysub();

5:

6:  sub mysub {

7:          if (wantarray()) {

8:                  print ("true\n");

9:          } else {

10:                 print ("false\n");

11:         }

12: }  


$ program14_17

true

false

$

When mysub is first called in line 3, the return value is expected to be a list, which means that wantarray returns a non-zero (true) value in line 7. The second call to mysub in line 4 expects a scalar return value, which means that wantarray returns zero (false).
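A subroutine can use this information to decide what to return. The following is a sketch, assuming Perl 5; the subroutine word_list is hypothetical and returns its words in a list context and a count of them in a scalar context.

```perl
use strict;
use warnings;

# word_list is a hypothetical subroutine used only for illustration.
sub word_list {
    my @words = ("alpha", "beta", "gamma");
    return wantarray() ? @words : scalar(@words);
}

my @all   = word_list();      # list context: receives the words
my $count = word_list();      # scalar context: receives the count

print ("List: @all\n");
print ("Count: $count\n");
```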

Associative Array Functions

Perl provides a variety of functions that operate on associative arrays. Most of these functions are described in detail on Day 10, "Associative Arrays"; a brief description of each function is presented here.

The keys Function

The keys function returns a list of the subscripts of the elements of an associative array.

The syntax for keys is straightforward:

list = keys (assoc_array);

assoc_array is the associative array from which subscripts are to be extracted, and list is the returned list of subscripts.

For example:

%array = ("foo", 26, "bar", 17);

@list = keys(%array);

This call to keys assigns ("foo", "bar") to @list. (The elements of the list might be in a different order. To specify a particular order, sort the list using the sort function.)

keys often is used with foreach, as in the following example:

foreach $subscript (keys (%array)) {

        # stuff goes here

}

This loops once for each subscript of the array.
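If the order matters, you can sort the subscripts before looping. Here is a minimal sketch, assuming Perl 5; the array @report is used only to collect the output.

```perl
use strict;
use warnings;

my %array = ("foo", 26, "bar", 17);
my @report = ();

# Sorting the subscripts gives the loop a predictable order.
foreach my $subscript (sort(keys(%array))) {
    push (@report, "$subscript=$array{$subscript}");
}

print ("@report\n");
```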

The values Function

The values function returns a list consisting of all the values in an associative array.

The syntax for the values function is

list = values (assoc_array);

assoc_array is the associative array from which values are to be extracted, and list is the returned list of values.

The following is an example that uses values:

%array = ("foo", 26, "bar", 17);

@list = values(%array);

This assigns the list (26, 17) to @list (not necessarily in this order).
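Because values returns only the values, it is convenient when the subscripts are not needed. The following sketch, assuming Perl 5, totals the values of the array; the result does not depend on the order in which they are returned.

```perl
use strict;
use warnings;

my %array = ("foo", 26, "bar", 17);
my $total = 0;

foreach my $value (values(%array)) {
    $total += $value;
}

print ("Total: $total\n");
```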

The each Function

The each function returns an associative array element as a two-element list. The list consists of the associative array subscript and its associated value. Each successive call to each returns another associative array element.

The syntax for the each function is

pair = each (assoc_array);

assoc_array is the associative array from which pairs are to be returned, and pair is the subscript-element pair returned.

The following is an example:

%array = ("foo", 26, "bar", 17);

@list = each(%array);

The first call to each assigns either ("foo", 26) or ("bar", 17) to @list. A subsequent call returns the other element, and a third call returns an empty list. (The order in which the elements are returned depends on how the list is stored; no particular order is guaranteed.)
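Because each returns an empty list when no elements remain, it is commonly used as the condition of a while loop. Here is a minimal sketch, assuming Perl 5; %seen and $pairs are illustrative names used to record what the loop visited.

```perl
use strict;
use warnings;

my %array = ("foo", 26, "bar", 17);
my %seen  = ();
my $pairs = 0;

# The loop ends when each returns the empty list.
while (my ($subscript, $value) = each(%array)) {
    $seen{$subscript} = $value;
    $pairs++;
    print ("$subscript => $value\n");
}
```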

The delete Function

The delete function deletes an associative array element.

The syntax for the delete function is

element = delete (assoc_array_item);

assoc_array_item is the associative array element to be deleted, and element is the value of the deleted element.

The following is an example:

%array = ("foo", 26, "bar", 17);

$retval = delete ($array{"foo"});

After delete is called, the associative array %array contains only one element: the element with the subscript bar. $retval is assigned the value of the deleted element foo, which in this case is 26.
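The following sketch, assuming Perl 5, repeats this example and also deletes a subscript that was never stored. The subscript nosuch and the names $missing and @left are illustrative.

```perl
use strict;
use warnings;

my %array = ("foo", 26, "bar", 17);

my $retval  = delete($array{"foo"});      # 26, the value that was removed
my $missing = delete($array{"nosuch"});   # undefined: no such element
my @left    = keys(%array);               # only "bar" remains

print ("Deleted value: $retval\n");
print ("Remaining subscripts: @left\n");
```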

The exists Function

The exists function, defined only in Perl 5, enables you to determine whether a particular element of an associative array exists.

The syntax for the exists function is

result = exists(element);

element is the element of the associative array that is being tested for existence. result is non-zero if the element exists, and zero if it does not.

The following is an example:

$result = exists($myarray{$mykey});

$result is nonzero if $myarray{$mykey} exists.
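Note that exists tests for the element itself, not for its value. The following sketch, assuming Perl 5, stores an element whose value is undefined; the subscripts set, empty, and missing are illustrative.

```perl
use strict;
use warnings;

my %myarray = ("set", 1, "empty", undef);

my $has_set     = exists($myarray{"set"})     ? "yes" : "no";
my $has_empty   = exists($myarray{"empty"})   ? "yes" : "no";   # element is present
my $def_empty   = defined($myarray{"empty"})  ? "yes" : "no";   # but its value is undefined
my $has_missing = exists($myarray{"missing"}) ? "yes" : "no";

print ("set: $has_set, empty: $has_empty (defined: $def_empty), missing: $has_missing\n");
```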

Summary

Today, you learned about functions that manipulate scalar values and convert them from one form to another, and about functions that manipulate lists.

The chop function removes the last character from a scalar value or from each element of a list.

The crypt function encrypts a scalar value, using the same method that the UNIX password encryptor uses.

The int function takes a floating-point number and gets rid of everything after the decimal point.

The defined function checks whether a scalar variable, array element, or array has been assigned to. The undef function enables you to treat a previously defined scalar variable, array element, or array as if it is undefined. scalar enables you to treat an array or list as if it is a scalar value.

The other functions described in today's lesson convert values from one form into another. The hex and oct functions read hexadecimal and octal constants and convert them into decimal form. The ord function converts a character into its ASCII decimal equivalent. pack and unpack convert a scalar value into a format that can be stored in machine memory, and vice versa. vec enables you to treat a value as an array of numeric values, each of which is a certain number of bits long.

The grep function enables you to extract the elements of a list that match a particular pattern. This function can be used in conjunction with the file-test operators.

The splice function enables you to extract a portion of a list or insert a sublist into a list. The shift and pop functions remove an element from the left and right ends of a list, and the unshift and push functions add one or more elements to the left and right ends of a list. You can use push, pop, and shift to create stacks and queues.

The split function enables you to break a character string into list elements. You can impose an upper limit on the number of list elements to be created.

The sort function sorts a list in a specified order. The reverse function reverses the order of the elements in a list.

The map function evaluates an expression once for each element of a list and returns the list of results.

The wantarray function enables you to determine whether the statement that called a subroutine is expecting a scalar return value or a list.

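Five functions are defined that manipulate associative arrays:

  • The keys function returns a list of the subscripts of an associative array.
  • The values function returns a list of the values in an associative array.
  • The each function returns one subscript-value pair at a time.
  • The delete function deletes an associative array element.
  • The exists function tests whether an associative array element exists.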

Q&A

Q:Why is the undefined value equivalent to the null string?
A:Basically, to keep Perl programs from blowing up if they try to access a variable that has not yet been assigned to.
Q:Why does oct handle hexadecimal constants that start with 0x or 0X?
A:There is no particular reason, except that it's a little more convenient. If you find that it bothers you to use oct to convert a hexadecimal constant, get rid of the leading 0x or 0X (using the substitute operator) and call hex instead.
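Both approaches can be compared in a short sketch such as the following, which assumes Perl 5. The constant 0x1F and the variable names are illustrative.

```perl
use strict;
use warnings;

my $constant = "0x1F";

my $via_oct = oct($constant);              # oct recognizes the leading 0x

(my $digits = $constant) =~ s/^0[xX]//;    # strip the prefix from a copy
my $via_hex = hex($digits);                # hex converts the bare digits

print ("oct gives $via_oct, hex gives $via_hex\n");
```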
Q:I want to put a password check in my program. How can I ensure that it is secure?
A:Do two things:
  • Don't include the unencrypted text of your password in your program source. People can then find out the password just by reading the file.
  • Use a password that is not a real English-language word or proper name. Include at least one digit. This makes your password harder to "crack."
Q:Why does int truncate instead of rounding?
A:Some programs might find it useful to just retrieve the integer part of a floating-point number. (For example, in earlier chapters, you have seen int used in conjunction with rand to return a random integer.)
You can always add 0.5 to a non-negative number before calling int, which effectively rounds it up when necessary. (Because int truncates toward zero, subtract 0.5 instead if the number is negative.)
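The following sketch, assuming Perl 5, shows the effect. The subroutine round_nearest is hypothetical; it simply chooses whether to add or subtract 0.5 based on the sign of its argument.

```perl
use strict;
use warnings;

my $rounded_down = int(2.4 + 0.5);     # 2
my $rounded_up   = int(2.6 + 0.5);     # 3
my $negative     = int(-2.6 + 0.5);    # -2, because int truncates toward zero

# round_nearest is a hypothetical helper that also handles negative numbers.
sub round_nearest {
    my ($number) = @_;
    return $number < 0 ? int($number - 0.5) : int($number + 0.5);
}

my $fixed = round_nearest(-2.6);       # -3

print ("$rounded_down $rounded_up $negative $fixed\n");
```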
Q:When I pack integers using the s or i pack-format characters, the bits don't appear in the order I was expecting. What is happening?
A:Most machines enable you to store integers that are more than one byte long (two- and four-byte integers usually are supported). However, not every machine stores a multibyte integer in the same way. Some machines store the most significant byte of a word at a lower address; these machines are called big-endian machines because the big end of a word is first. Other machines, called little-endian machines, store the least significant byte of a word at a lower byte address.
If you are not getting the result you expect, you might be expecting big-endian and getting little-endian, or vice versa.
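One way to see the difference is to pack the same value with the n and v pack-format characters, which always produce big-endian and little-endian 16-bit integers, respectively, and compare the bytes with those produced by the machine-dependent S format. This is only a sketch, assuming Perl 5; the value 258 is chosen because its two bytes differ.

```perl
use strict;
use warnings;

my $value = 258;                                  # 0x0102 in hexadecimal

my $big    = unpack("H*", pack("n", $value));     # always big-endian
my $little = unpack("H*", pack("v", $value));     # always little-endian
my $native = unpack("H*", pack("S", $value));     # depends on your machine

print ("big-endian:    $big\n");
print ("little-endian: $little\n");
print ("this machine:  $native\n");
```

If the last line matches the first, your machine stores 16-bit integers in big-endian order; if it matches the second, your machine is little-endian.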
Q:The splice function works by shifting elements to the right or left to make room or fill gaps. Is this inefficient?
A:No. The Perl interpreter actually stores a list as a sequence of pointers (memory addresses). All splice has to do is rearrange the pointers. This holds true also for sort and reverse.
Q:Can I use each to work through an associative array in a specified order?
A:No. If you need to access the elements of an associative array in a specified order, use keys and sort to sort the subscripts, and then retrieve the value associated with each element.
Q:If I am using values with foreach, can I retrieve the subscript associated with a particular value if I need it?
A:No. If you are likely to need the subscripts as well as their values, use each or keys.

Workshop

The Workshop provides quiz questions to help you solidify your understanding of the material covered and exercises to give you experience in using what you've learned. Try to understand the quiz and exercise answers before you go on to tomorrow's lesson.

Quiz

  1. What format does each of the following pack-format characters specify?
    a.    A
    b.    A
    c.    d
    d.    p
    e.    @
  2. What do these unpack-format specifiers do?
    a.    "a"
    b.    "@4A10i*"
    c.    "@*X4C*"
    d.    "ix4iX8i"
    e.    "b*X*B*"
  3. What value is stored in $value by the following?
    a.    The statements
    $vector = pack ("b*", "10110110");
    $value = vec ($vector, 3, 1);
    b.    The statements
    $vector = pack ("b*", "10110110");
    $value = vec ($vector, 1, 2);
  4. What's the difference between defined and undef?
  5. Assume @list contains ("1", "2", "3", "4", "5"). What are the contents of @list after each of the following statements?
    a.    splice (@list, 0, 1, "new");
    b.    splice (@list, 2, 0, "test1", "test2");
    c.    splice (@list, -1, 1, "test1", "test2");
    d.    splice (@list, 2, 1);
    e.    splice (@list, 3);
  6. What do the following statements return?
    a.    grep (!/^!/, @array);
    b.    grep (/\b\d+\b/, @array);
    c.    grep (/./, @array);
    d.    grep (//, @array);
  7. What is the difference between shift and unshift?
  8. What arguments to splice are equivalent to the following function calls?
    a.    shift (@array);
    b.    pop (@array);
    c.    push (@array, @sublist);
    d.    unshift (@array, @sublist);
  9. How can you create a stack using shift, pop, push, or unshift?
  10. How can you create a queue using shift, pop, push, or unshift?

Exercises

  1. Write a program that reads two binary strings of any length, adds them together, and writes out the binary output. (Hint: This is a really nasty problem. To get this to work, you will need to ensure that your bit strings are a multiple of eight bits by adding zeros at the front.)
  2. Write a program that reads two hexadecimal strings of any length, adds them together, and writes out the hexadecimal output. (Hint: This is a straightforward modification of Exercise 1.)
  3. Write a program that uses int to round a value to two decimal places. (Hint: This is trickier than it seems.)
  4. Write a program that encrypts a password and then asks the user to guess it. Give the user three chances to get it right.
  5. BUG BUSTER: What is wrong with the following program?
    #!/usr/local/bin/perl
    $bitstring = "00000011";
    $packed = pack("b*", $bitstring);
    $highbit = vec($packed, 0, 1);
    print ("The high-order bit is $highbit\n");
  6. Write a program that uses splice to sort a list in numeric order.
  7. Write a program that "flips" an associative array; that is, the subscripts of the old array become the values of the new, and vice versa. Print an error message if the old array has two subscripts with identical values.
  8. Write a program that reads a file from standard input, breaks each line into words, uses grep to get rid of all words longer than five characters, and prints the file.
  9. Write a program that reads an input line and uses split to read and print one word of the line at a time.
  10. BUG BUSTER: What is wrong with the following subroutine?
    sub retrieve_first_element {
    local ($retval);

    $retval = unshift(@array);
    }