Today's lesson teaches you how to manipulate your machine's file system using some of Perl's built-in library functions. Today, you learn about the following:
Many of the functions described in today's lesson use features of the UNIX operating system. If you are using Perl on a machine that is not running UNIX, some of these functions might not be defined or might behave differently. Check the documentation supplied with your version of Perl for details on which functions are supported or emulated on your machine.
The following sections describe the built-in library functions that read information from files and write information to files. These library functions perform the following tasks:
Some of the input and output functions supplied by Perl have been discussed in earlier chapters. These are open, close, print, printf, write, select, and eof.
The following sections briefly describe these functions again, along with some features of these functions that have not been discussed previously.
The open function enables a Perl program to access a file. It associates a special file variable with each accessed file. The following is an example:
open (MYVAR, "/u/jqpublic/file");
Here, open requests access to the file /u/jqpublic/file, and it associates the file variable MYVAR with this file after it is open. open returns a nonzero value if the open succeeds, and zero if the open fails.
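Because open returns zero when it fails, it is worth testing the return value before using the file variable. The following is a minimal sketch of such a test. The filename no_such_file.tmp is hypothetical and is assumed not to exist, and $! is the Perl system variable that holds the text of the most recent system error.

```perl
# Sketch: test the return value of open before using the file variable.
# no_such_file.tmp is a hypothetical name that is assumed not to exist.
if (open (MYVAR, "no_such_file.tmp")) {
    $status = "opened";
    close (MYVAR);
} else {
    # $! describes why the most recent system request failed
    $status = "open failed: $!";
}
print ("$status\n");
```

The same test is often written more compactly as open (...) || die (...), which is the form used in the listings later in today's lesson.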
By default, open opens a file for reading only. To open a file for writing, put a > character in front of the filename, as follows:
open (MYVAR, ">/u/jqpublic/file");
To append information to an existing file, put two > characters in front of the filename, as follows:
open (MYVAR, ">>/u/jqpublic/file");
To treat the open file as a command to which to pipe data, put a pipe (|) character in front of the filename, as follows:
open (MAIL, "|mail dave");
(For more information, refer to Day 6, "Reading from and Writing to Files.")
The open function enables you to open files in several other ways not previously discussed. For example, to treat the open file as a command that is piping data to this program, put a | character after the command, as follows:
open (CAT, "cat file*|");
This call to open executes the command cat file*. The output of this command consists of the contents of all files whose names start with file, joined (concatenated) together. This output is treated as an input file that is accessible using the file variable CAT. For example, the following statement reads the first line of this input:
$input = <CAT>;
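The following is a small sketch of reading everything a piped command produces, rather than just one line. The two echo commands are stand-ins (an assumption made so that the example does not depend on any particular files) for cat file* or any other command that writes to the standard output file.

```perl
# Sketch: read every line produced by a command through a pipe.
# The echo commands are illustrative stand-ins for a real command.
open (CMD, "echo first; echo second|") || die ("can't start command");
@lines = <CMD>;       # in array context, <CMD> reads all remaining lines
close (CMD);          # waits for the command to finish
print ("read ", scalar (@lines), " lines\n");
print (@lines);
```

Reading into an array variable is convenient when the command's output is short; for long output, reading one line at a time into a scalar variable uses less memory.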
Listing 12.1 is another example of a program that uses piped input.
This program uses the output from the w command to list
the users who are currently logged on to the machine.
Listing 12.1. A program that receives input from a piped command.
1:  #!/usr/local/bin/perl
2:
3:  open (WOUT, "w|");
4:  $time = <WOUT>;
5:  $time =~ s/^ *//;
6:  $time =~ s/ .*//;
7:  <WOUT>;   # skip headings line
8:  @users = <WOUT>;
9:  close (WOUT);
10: foreach $user (@users) {
11:     $user =~ s/ .*//;
12: }
13: print ("Current time: $time");
14: print ("Users logged on:\n");
15: $prevuser = "";
16: foreach $user (sort @users) {
17:     if ($user ne $prevuser) {
18:         print ("\t$user");
19:         $prevuser = $user;
20:     }
21: }

$ program12_1
Current time: 4:25pm
Users logged on:
        dave
        kilroy
        root
        zarquon
$
The w command lists the current time, the machine load, and the users logged onto the machine. It also lists the job time and the currently executing command for each user.
Here is sample output for the w command:
4:25pm up 1 day, 6:37, 6 users, load average: 0.79, 0.36, 0.28
User tty login@ idle JCPU PCPU what
dave ttyp0 2:26pm 27 3 w
kilroy ttyp1 9:01am 2:27 1:04 11 -csh
kilroy ttyp2 9:02am 43 1:46 27 rn
root ttyp3 4:22pm 2 -csh
zarquon ttyp4 1:26pm 4 43 16 cc myprog.c
kilroy ttyp5 9:03am 2:14 48 /usr/games/hack
This Perl program takes the output from the w command and massages it to retrieve only the information needed: the current time and the users who are currently logged on.
Line 3 starts the w command. The call to open specifies that the output from w is to be treated as input to this program, and that the file variable WOUT is to be used to access this input.
Line 4 reads the first line of the input piped from WOUT. This is the line read:
4:25pm up 1 day, 6:37, 6 users, load average: 0.79, 0.36, 0.28
The following two lines extract the current time from this line. First, line 5 removes the leading spaces. Then, line 6 removes everything after the first word, except for the trailing newline character. This leaves the time, 4:25pm, along with the trailing newline, stored in $time.
Line 7 reads the second line from WOUT. Because this line contains no useful information, there is no need to assign it to any scalar variable.
Line 8 reads the rest of the output from w to the array variable @users. After this output has been read, line 9 closes WOUT, which terminates the process that is running the w command.
Each element of the list stored in @users contains one line of user information. Because this program needs only the first word of each line, lines 10-12 get rid of everything else (except, again, for the trailing newline character). After this loop is complete, the array in @users contains a list of users logged on.
Line 13 prints the current time, as stored in $time. Note that print does not need to specify a trailing newline character, because $time contains one.
Lines 16-21 sort the list of users in @users and print them. Because a user can be logged on more than once, $prevuser stores the last user name printed. The value stored in $user is printed only if it differs from the value stored in $prevuser.
Many UNIX shells enable you to direct both the standard output file and the standard error file to the same output file. For example, in the Bourne shell sh, the command
$ foo >file1 2>&1
runs the command foo and stores the output from the standard output file and the standard error file in file1.
Listing 12.2 shows how you can do this in Perl.
Listing 12.2. A program that redirects the standard output and standard error files.
1:  #!/usr/local/bin/perl
2:
3:  open (STDOUT, ">file1") || die ("open STDOUT failed");
4:  open (STDERR, ">&STDOUT") || die ("open STDERR failed");
5:  print STDOUT ("line 1\n");
6:  print STDERR ("line 2\n");
7:  close (STDOUT);
8:  close (STDERR);
This program produces no output.
The following are the contents of the output file file1:
line 2
line 1
As you can see, these lines aren't in the order intended. To understand what is happening, let's examine this program in more detail.
Line 3 redirects the standard output file. To do this, it opens the output file file1 and associates it with the file variable STDOUT; this closes the standard output file.
Line 4 redirects the standard error file. The argument >&STDOUT tells the Perl interpreter to use the file already opened and associated with STDOUT. This means that the file variable STDERR refers to the same file as STDOUT.
Lines 5 and 6 write to STDOUT and STDERR, respectively. Because these file variables refer to the same file, both lines are written to file1. Unfortunately, they are written in the wrong order. What has happened?
The problem arises because of how output is written on UNIX systems. When you use print (or any other function) to write to a file such as the standard output file, the output is not written right away; instead, it is copied to a special internal storage area called a buffer. (You can think of a buffer as a giant character string or as an array of characters.) Subsequent output operations continue writing to the buffer until it is full; when the buffer is full, the entire buffer is written out. Copying to a buffer and then writing out the entire buffer takes much less time than writing individual lines of text. (This is because, on most machines, input-output operations are slower than memory-access operations.)
When a program ends, any non-empty buffers are written out. However, the system maintains separate buffers for STDERR and STDOUT, and it writes out the buffer for STDERR first. This means that line 2, which is stored in the STDERR buffer, appears before line 1, which is stored in the STDOUT buffer.
To get around this problem, you can tell the Perl interpreter not to use a buffer for a particular file. To do this, call select to make that file the current default file, and then assign a nonzero value to the system variable $|.
The system variable $| indicates whether a particular file is to be buffered (in other words, whether it should use a buffer or not). If $| is assigned a nonzero value, no buffer is used. As with $~ and $^, assigning to $| affects the current default file, which is the file last specified in a call to select (or STDOUT, if select has not been called).
Listing 12.3 shows how you can use $| to ensure that your output lines appear in the correct order.
Listing 12.3. A program that redirects the standard output and standard error files and turns off buffering.
1:  #!/usr/local/bin/perl
2:
3:  open (STDOUT, ">file1") || die ("open STDOUT failed");
4:  open (STDERR, ">&STDOUT") || die ("open STDERR failed");
5:  $| = 1;
6:  select (STDERR);
7:  $| = 1;
8:  print STDOUT ("line 1\n");
9:  print STDERR ("line 2\n");
10: close (STDOUT);
11: close (STDERR);
This program produces no output.
The contents of the output file file1 are now the following:
line 1
line 2
Line 5 sets $| to 1, which tells the Perl interpreter that the current default file does not need to be buffered. Because select has not yet been called, the current default file is STDOUT, which means that line 5 turns off buffering for the standard output file (which has been redirected to file1).
Line 6 sets the current default file to STDERR, and line 7 once again sets $| to 1. This turns off buffering for the standard error file (which has also been redirected to file1).
Because buffering has been turned off for both STDERR and STDOUT, lines 8 and 9 write to file1 right away. This means that the output lines appear in file1 in the order in which they are printed.
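Listing 12.3 leaves STDERR as the current default file after line 6. If you would rather leave the default file unchanged, one possible approach is sketched here. It relies on the fact that select returns the name of the previous default file; the subroutine name unbuffer is hypothetical and is not a Perl built-in.

```perl
# Sketch: turn off buffering for one file without changing the default file.
# unbuffer is a hypothetical helper name, not a Perl built-in function.
sub unbuffer {
    local ($filevar) = @_;
    local ($oldfile);

    $oldfile = select ($filevar);   # select returns the previous default file
    $| = 1;                         # applies to the file just selected
    select ($oldfile);              # restore the previous default file
}

&unbuffer ("STDERR");
&unbuffer ("STDOUT");
print STDOUT ("line 1\n");
print STDERR ("line 2\n");
```

Here the file variable names are passed as character strings, so select treats the value stored in $filevar as the name of the file variable.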
To open a file for both read and write access, specify +> before the filename, as follows:
open (READWRITE, "+>file1");
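This opens the file named file1 for both reading and writing, which enables you to write data and then go back and reread or overwrite it. Be aware that, like >, the +> prefix empties an existing file when it opens it.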
Opening a file for reading and writing works best in conjunction
with the library functions seek and tell, which
enable you to skip to the middle of a file. (For more information
on seek and tell, refer to the section called "Skipping and
Rereading Data," later in today's lesson.)
NOTE: You also can use +< as the prefix to specify both reading and writing, as follows:
open (READWRITE, "+<file1");
The prefix <, by itself, specifies that the file is to be opened for reading. This means that the following statement opens the file named read exactly as it would be opened if the < were omitted:
open (READONLY, "<read");
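The following is a minimal sketch of overwriting part of a file in place. It assumes a scratch file named rwdemo.tmp (a hypothetical name) and uses the +< prefix so that the existing contents are kept; the +> prefix would empty the file when opening it. The seek function, which is described later in today's lesson, positions the file before the write and again before the read.

```perl
# Sketch: overwrite part of an existing file in place.
# rwdemo.tmp is a hypothetical scratch file created just for this example.
open (SCRATCH, ">rwdemo.tmp") || die ("can't create rwdemo.tmp");
print SCRATCH ("hello world\n");
close (SCRATCH);

open (READWRITE, "+<rwdemo.tmp") || die ("can't open rwdemo.tmp");
seek (READWRITE, 6, 0);            # skip past "hello "
print READWRITE ("WORLD");         # replaces the five bytes "world"
seek (READWRITE, 0, 0);            # return to the start before reading
$line = <READWRITE>;
close (READWRITE);
unlink ("rwdemo.tmp");             # remove the scratch file
print ($line);
```

Calling seek between the write and the read matters: it ensures that the data just written has left the buffer before the file is read again.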
The library function close was discussed on Day 6, "Reading from and Writing to Files." It closes a file opened by open, as follows:
close (MYFILE);
Here, MYFILE is the file variable (passed to open)
that is associated with the open file.
NOTE: If you use close to close a pipe, the program will wait for the piped program to terminate. For example:
open (MYPIPE, "cat file*|");
When close is called for MYPIPE, the program suspends execution until the command cat file* has terminated.
The print, printf, and write functions have also been covered in previous chapters, but I'll briefly recap them here.
The print function is the simplest function. It writes to the file specified, or to the current default file if no file is specified. For example:
print ("Hello, there!\n");
print OUTFILE ("Hello, there!\n");
The first statement writes to the current default file (which is STDOUT unless select has been called). The second statement writes to the file specified by OUTFILE.
The printf function formats a string and sends it to either the file specified or the current default file. For example, the statement
printf OUTFILE ("You owe me %8.2f", $owing);
takes the value stored in $owing and substitutes it for %8.2f in the specified string. %8.2f is an example of a field specifier and indicates that the value stored in $owing is to be treated as a floating-point number.
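To see what the field specifier %8.2f actually produces, consider the following sketch. It uses the built-in function sprintf, which formats a string in the same way as printf but returns the result instead of writing it. The amount 21.5 is an arbitrary value chosen for illustration.

```perl
# Sketch: how the field specifier %8.2f lays out a value.
# The amount 21.5 is an arbitrary illustrative value.
$owing = 21.5;
$text = sprintf ("You owe me %8.2f", $owing);
# %8.2f means: at least 8 characters wide, 2 digits after the decimal point,
# so 21.5 becomes "21.50" padded on the left with three spaces.
print ("[$text]\n");
```

The square brackets in the print statement are there only to make the padding visible.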
The write function uses a print format to send formatted output to the file that is specified or to the current default file. For example:
select (OUTFILE);
$~ = "MYFORMAT";
write;
This call to write uses the print format MYFORMAT to send output to the file OUTFILE.
For more information on printf or write, refer to Day 11, "Formatting Your Output."
The select function also is covered on Day 11. This function is passed a file variable, which becomes the new current default file. For example:
select (MYFILE);
In this case, MYFILE is now the current default file, which means that calls to print, write, and printf write to MYFILE unless a file variable is explicitly specified.
The library function eof checks whether the last input file read has been exhausted. If all of the input has been read, eof returns a nonzero value. If there is input remaining, eof returns zero.
The eof function was first introduced on Day 6. You might have noticed that, on that day, the examples that use eof use it without parentheses. This is because the behavior of eof is a little tricky if you are using it in conjunction with the <> operator; in this case, eof and eof() behave differently.
Listing 12.4 shows how eof interacts with <>. It prints the contents of one or more input files whose names are supplied on the command line. A line of dashes is printed after each input file is completed.
To run this program yourself, create two files named file1 and file2. Put the following in file1:
This is a line from the first file.
Here is the last line of the first file.
Then, put the following in file2:
This is a line from the second and last file.
Here is the last line of the last file.
Finally, specify file1 and file2 on the command line when you run this program. For example, if you have called this program program12_4, run it as follows:
$ program12_4 file1 file2
This will give you the output shown in the input-output example.
Listing 12.4. A program that uses eof and <> together.
1:  #!/usr/local/bin/perl
2:
3:  while ($line = <>) {
4:      print ($line);
5:      if (eof) {
6:          print ("-- end of current file --\n");
7:      }
8:  }

$ program12_4 file1 file2
This is a line from the first file.
Here is the last line of the first file.
-- end of current file --
This is a line from the second and last file.
Here is the last line of the last file.
-- end of current file --
$
The <> operator in line 3 tells the program to read the next line of input from the input files supplied on the command line. Line 4 then prints the line.
Line 5 calls eof without parentheses. This is the form
of eof that you are familiar with. It returns true if
the current input file has been completely read.
When you test for end-of-file, use either eof or eof() but not both.
Compare the program in Listing 12.4 with Listing 12.5, which uses
eof() instead of eof.
Listing 12.5. A program that uses eof() and <> together.
1:  #!/usr/local/bin/perl
2:
3:  while ($line = <>) {
4:      print ($line);
5:      if (eof()) {
6:          print ("-- end of output --\n");
7:      }
8:  }

$ program12_5 file1 file2
This is a line from the first file.
Here is the last line of the first file.
This is a line from the second and last file.
Here is the last line of the last file.
-- end of output --
$
Line 5 of this program calls eof with
parentheses. Calls to eof with parentheses only return
true when all of the files have been read. If the program is at
the end of the first input file, eof() returns false
because there is still input to be read.
NOTE: If you like, you can use eof with a particular file. For example:
if (eof(MYFILE)) {
Here, the conditional expression returns true if all of MYFILE has been read. Also, note that the distinction between eof and eof() is only meaningful when you are using the <> operator. If you are just reading from a single file, it doesn't matter whether you supply parentheses or not. For example:
while ($line = <STDIN>) {
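The following sketch shows one way eof can be used with a particular file variable: to recognize the final line of a file while it is being read. The scratch file eofdemo.tmp and its three lines of contents are assumptions made for the example.

```perl
# Sketch: use eof with an explicit file variable to spot the final line.
# eofdemo.tmp is a hypothetical scratch file created just for this example.
open (MYFILE, ">eofdemo.tmp") || die ("can't create eofdemo.tmp");
print MYFILE ("first\nsecond\nthird\n");
close (MYFILE);

open (MYFILE, "eofdemo.tmp") || die ("can't open eofdemo.tmp");
$count = 0;
while ($line = <MYFILE>) {
    if (eof (MYFILE)) {
        $lastline = $line;         # nothing is left to read after this line
    }
    $count++;
}
close (MYFILE);
unlink ("eofdemo.tmp");            # remove the scratch file
print ("$count lines; last line: $lastline");
```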
When you call any of the functions described so far in today's lesson, you can indicate which file to use by specifying a file variable. However, these functions also enable you to supply a scalar variable in place of a file variable; when you do, the Perl interpreter treats the value stored in the scalar variable as the name of the file variable. For example, consider the following:
$filename = "MYFILENAME";
open ($filename, ">file1");
This call to open takes the value stored in $filename, which is MYFILENAME, and uses it as the file-variable name. This means that the file variable MYFILENAME is now associated with the output file file1.
Listing 12.6 is an example of a program that stores a file-variable
name in a scalar variable and passes the scalar variable to Perl
input and output functions.
Listing 12.6. A program that uses a scalar variable to store a file variable name.
1:  #!/usr/local/bin/perl
2:
3:  &open_file("INFILE", "", "file1");
4:  &open_file("OUTFILE", ">", "file2");
5:  while ($line = &read_from_file("INFILE")) {
6:      &print_to_file("OUTFILE", $line);
7:  }
8:
9:  sub open_file {
10:     local ($filevar, $filemode, $filename) = @_;
11:
12:     open ($filevar, $filemode . $filename) ||
13:         die ("Can't open $filename");
14: }
15: sub read_from_file {
16:     local ($filevar) = @_;
17:
18:     <$filevar>;
19: }
20: sub print_to_file {
21:     local ($filevar, $line) = @_;
22:
23:     print $filevar ($line);
24: }
This program produces no output.
This program is just a fancy way of copying the contents of file1 to file2. Line 3 opens the input file, file1, for reading by calling the subroutine open_file. This subroutine is passed the name of the file variable to use, which is INFILE.
Line 4 uses the same subroutine, open_file, to open the output file, file2, for writing. The file variable OUTFILE is used in this open operation.
Line 5 calls read_from_file to read a line of input and passes it the file variable name INFILE. Line 18 substitutes the value of $filevar, INFILE, into <$filevar>, yielding the result <INFILE>; then, it reads a line from this input file. Because this line-reading operation is the last expression evaluated in the subroutine, the line read is returned by the subroutine and assigned to $line.
Line 6 then passes OUTFILE and the input line just read
to the subroutine print_to_file.
NOTE: All of the functions you've seen so far in this chapter (open, close, print, printf, write, select, and eof) enable you to use a scalar variable in place of a file variable. The functions open, close, write, select, and eof also enable you to use an expression in place of a file variable. The value of the expression must be a character string that can be used as a file variable.
In the programs you've seen so far, input files have always been read in order, starting with the first line of input and continuing on to the end. Perl provides two special functions, seek and tell, which enable you to skip forward or backward in a file so that you can skip or re-read data.
The seek function moves backward or forward in a file.
The syntax for the seek function is
seek (filevar, distance, relative_to);
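As you can see, seek requires three arguments: filevar, which is the file variable representing the file in which to skip; distance, which is an integer representing the number of bytes (characters) to skip; and relative_to, which is either 0, 1, or 2.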
If relative_to is 0, the number of bytes to skip is relative to the beginning of the file. If relative_to is 1, the skip is relative to the current position in the file (the current position is the location of the next line to be read). If relative_to is 2, the skip is relative to the end of the file.
For example, to skip back to the beginning of the file MYFILE, use the following:
seek(MYFILE, 0, 0);
The following statement skips forward 80 bytes:
seek(MYFILE, 80, 1);
The following statement skips backward 80 bytes:
seek(MYFILE, -80, 1);
And the following statement skips to the end of the file (which is useful when the file has been opened for reading and writing):
seek(MYFILE, 0, 2);
The seek function returns true (nonzero) if the skip was successful, and 0 if it failed. It is often used in conjunction with the tell function, described in the next section.
The tell function returns the distance, in bytes, between the beginning of the file and the current position of the file (the location of the next line to be read).
The syntax for the tell function is
tell (filevar);
filevar, which is required, represents the file whose current position is needed.
For example, the following statement retrieves the current position of the file MYFILE:
$offset = tell (MYFILE);
NOTE: tell and seek accept an expression in place of a file variable, provided the value of the expression is the name of a file variable.
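As a small illustration of seek and tell working together, the following sketch finds the length of a file by skipping to the end and asking for the current position. The names file_length, SIZEFILE, and lendemo.tmp are hypothetical and exist only for this example.

```perl
# Sketch: find the length of a file with seek and tell.
# file_length, SIZEFILE, and lendemo.tmp are hypothetical names.
sub file_length {
    local ($name) = @_;
    local ($length);

    open (SIZEFILE, $name) || return (-1);
    seek (SIZEFILE, 0, 2);          # skip to the end of the file
    $length = tell (SIZEFILE);      # distance in bytes from the beginning
    close (SIZEFILE);
    $length;
}

open (TEMPFILE, ">lendemo.tmp") || die ("can't create lendemo.tmp");
print TEMPFILE ("This\nis\na\ntest\n");
close (TEMPFILE);
$size = &file_length ("lendemo.tmp");
unlink ("lendemo.tmp");             # remove the scratch file
print ("lendemo.tmp held $size bytes\n");
```

On a UNIX machine, where each newline is stored as a single byte, the four lines written here occupy 15 bytes. The subroutine returns -1 if the file cannot be opened.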
You can use tell and seek to skip to a particular
position in a file. For example, Listing 12.7 uses these functions
to print pairs of lines twice each. (This is, of course, not the
fastest way to do this.)
Listing 12.7. A program that demonstrates seek and tell.
1:  #!/usr/local/bin/perl
2:
3:  @array = ("This", "is", "a", "test");
4:  open (TEMPFILE, ">file1");
5:  foreach $element (@array) {
6:      print TEMPFILE ("$element\n");
7:  }
8:  close (TEMPFILE);
9:  open (TEMPFILE, "file1");
10: while (1) {
11:     $skipback = tell(TEMPFILE);
12:     $line = <TEMPFILE>;
13:     last if ($line eq "");
14:     print ($line);
15:     $line = <TEMPFILE>;  # assume the second line exists
16:     print ($line);
17:     seek (TEMPFILE, $skipback, 0);
18:     $line = <TEMPFILE>;
19:     print ($line);
20:     $line = <TEMPFILE>;
21:     print ($line);
22: }

$ program12_7
This
is
This
is
a
test
a
test
$
Lines 3-8 of this program create a temporary file named file1 consisting of four lines: This, is, a, and test. Line 9 opens this temporary file for reading.
Lines 10-22 loop through the test file. Line 11 calls tell to obtain the current position of the file before reading the pair of lines. Lines 12-16 read the lines and print them (first testing whether the end of the file has been reached).
Line 17 then calls seek, which positions the file at
the point returned by tell in line 11. This means that
the pair of lines read by lines 12 and 15 are read again by lines
18 and 20. Therefore, lines 19 and 21 print a second copy of the
input lines.
NOTE: You cannot use seek and tell if the file variable actually refers to a pipe. For example, if you open a pipe using the statement
open (MYPIPE, "cat file*|");
then the following statement makes no sense:
$illegal = tell (MYPIPE);
In Perl, the easiest way to read input from a file is to use the <filevar> operator, where filevar is the file variable representing the file to read. Perl also provides two other functions that read from an input file: read and sysread.
Perl also enables you to write output using the built-in function syswrite, which calls the UNIX write function.
These functions are described in the following sections.
The read function is designed to be equivalent to the UNIX function fread. It enables you to read an arbitrary number of characters (bytes) into a scalar variable.
The syntax for the read function is
read (filevar, result, length, skipval);
Here, filevar is the file variable representing the file to read, result is the scalar variable (or array variable element) into which the bytes are to be stored, and length is the number of bytes to read.
skipval is an optional argument that specifies the number of bytes to skip in result before storing the bytes that are read.
For example:
read (MYFILE, $scalar, 80);
This call to read tries to read 80 bytes from the file represented by the file variable MYFILE, storing the resulting character string in $scalar. It returns the number of bytes actually read; if MYFILE is at end-of-file, it returns 0. If an error occurs, read returns the undefined value, which behaves like the null string.
You can use read to append to an existing scalar variable by specifying a fourth argument, which indicates the number of bytes to skip in the scalar variable.
read (MYFILE, $scalar, 40, 80);
This call to read reads another 40 bytes from MYFILE. When copying these bytes into $scalar, read first skips the first 80 bytes already stored there.
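The following sketch shows the fourth argument in action on a small scale. It assumes a hypothetical scratch file, readdemo.tmp, that holds the ten characters abcdefghij, and it reads the file in three pieces, each appended to what is already stored in $scalar.

```perl
# Sketch: read a file in pieces, appending each piece to the same variable.
# readdemo.tmp is a hypothetical scratch file created just for this example.
open (MYFILE, ">readdemo.tmp") || die ("can't create readdemo.tmp");
print MYFILE ("abcdefghij");
close (MYFILE);

open (MYFILE, "readdemo.tmp") || die ("can't open readdemo.tmp");
$first = read (MYFILE, $scalar, 4);       # $scalar now holds "abcd"
$second = read (MYFILE, $scalar, 3, 4);   # skips 4 bytes of $scalar, stores "efg"
$third = read (MYFILE, $scalar, 80, 7);   # asks for 80 bytes; only 3 remain
close (MYFILE);
unlink ("readdemo.tmp");                  # remove the scratch file
print ("$scalar ($first + $second + $third bytes)\n");
```

The third call asks for more bytes than the file contains, so the return value (3) is smaller than the length requested. Checking the return value is the only way to know how much data actually arrived.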
If you want to read data as quickly as possible, you can call sysread instead of read.
The syntax for the sysread function is
sysread (filevar, result, length, skipval);
These arguments are the same as for read.
For example:
sysread (MYFILE, $scalar, 80);
sysread (MYFILE, $scalar, 40, 80);
sysread is equivalent to the UNIX function read.
To write as quickly as possible, call the syswrite function, which is equivalent to the UNIX function write.
The syntax of the syswrite function is
syswrite (filevar, data, length, skipval);
Here, filevar is the file to write to, data is the place where the data is located, length is the number of bytes to write, and skipval is the number of bytes to skip in data before writing.
For instance, the following call writes the first 80 bytes of $scalar to the file specified by MYFILE:
syswrite (MYFILE, $scalar, 80);
Similarly, the following statement skips the first 80 bytes stored in $scalar, and then writes the next 40 bytes to the file specified by MYFILE:
syswrite (MYFILE, $scalar, 40, 80);
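One common use of sysread and syswrite is copying a file a block at a time. The following is a sketch only: the subroutine name copy_raw, the 4096-byte block size, and the scratch filenames are all assumptions, and the loop does not try to recover if syswrite writes fewer bytes than requested.

```perl
# Sketch: copy a file in fixed-size blocks with sysread and syswrite.
# copy_raw, the block size, and the filenames are hypothetical choices.
sub copy_raw {
    local ($from, $to) = @_;
    local ($block, $count, $total);

    open (RAWIN, $from) || return (-1);
    open (RAWOUT, ">$to") || return (-1);
    $total = 0;
    # sysread returns the number of bytes read, or 0 at end-of-file
    while (($count = sysread (RAWIN, $block, 4096)) > 0) {
        syswrite (RAWOUT, $block, $count);   # write exactly the bytes just read
        $total += $count;
    }
    close (RAWIN);
    close (RAWOUT);
    $total;
}

open (SCRATCH, ">rawdemo1.tmp") || die ("can't create rawdemo1.tmp");
print SCRATCH ("x" x 10000);
close (SCRATCH);
$copied = &copy_raw ("rawdemo1.tmp", "rawdemo2.tmp");
$newsize = -s "rawdemo2.tmp";                # -s yields the size of a file
unlink ("rawdemo1.tmp", "rawdemo2.tmp");     # remove the scratch files
print ("copied $copied bytes\n");
```

Note that the subroutine never mixes sysread or syswrite with print or <filevar> on the same file variable; mixing the buffered and unbuffered functions on one file can produce data in an unexpected order.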
Don't use sysread and syswrite unless you know what you are doing. For more information on these functions, refer to the UNIX system manual pages for the read and write functions.
Perl provides one other built-in function, getc, which reads a single character of input from a file.
The syntax for calls to the getc function is
char = getc (infile);
infile is the file from which to read, and char is the character returned.
For example:
$singlechar = getc(INFILE);
This statement reads a character from the file represented by INFILE and stores it (as a character string) in the scalar variable $singlechar.
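getc works on ordinary files as well as on the keyboard. The following sketch counts the characters and letters in a small scratch file; the filename getcdemo.tmp and its contents are assumptions made for the example.

```perl
# Sketch: read a file one character at a time with getc.
# getcdemo.tmp is a hypothetical scratch file created just for this example.
open (INFILE, ">getcdemo.tmp") || die ("can't create getcdemo.tmp");
print INFILE ("Perl 5\n");
close (INFILE);

open (INFILE, "getcdemo.tmp") || die ("can't open getcdemo.tmp");
$total = 0;
$letters = 0;
# at end-of-file getc returns an empty (or undefined) value, ending the loop
while (($singlechar = getc (INFILE)) ne "") {
    $letters++ if ($singlechar =~ /[a-zA-Z]/);
    $total++;
}
close (INFILE);
unlink ("getcdemo.tmp");            # remove the scratch file
print ("$total characters, $letters letters\n");
```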
The getc function is useful for "hot key" applications.
These applications accept and process input one character at a
time rather than one line at a time. Listing 12.8 is an example
of such a program. It reads one character at a time and checks
whether the character is alphanumeric. If it is, it writes out
the next higher letter or number. For example, when you enter
a, the program prints out b, and so on. In this
example, the alphabetic letters a through z
and the digits 0 through 9 are typed in.
Listing 12.8. A program that demonstrates the use of getc.
1:  #!/usr/local/bin/perl
2:
3:  &start_hot_keys;
4:  while (1) {
5:      $char = getc(STDIN);
6:      last if ($char eq "\\");
7:      $char =~ tr/a-zA-Z0-9/b-zaB-ZA1-90/;
8:      print ($char);
9:  }
10: &end_hot_keys;
11: print ("\n");
12:
13: sub start_hot_keys {
14:     system ("stty cbreak");
15:     system ("stty -echo");
16: }
17:
18: sub end_hot_keys {
19:     system ("stty -cbreak");
20:     system ("stty echo");
21: }

$ program12_8
bcdefghijklmnopqrstuvwxyza1234567890
$
The subroutine start_hot_keys modifies
the runtime environment to support hot-key input. To do this,
it uses two calls to the built-in function system, which
simply takes its argument and executes it. The command stty
cbreak tells the system to process input one character at
a time, and the command stty -echo tells the system not
to display characters typed at the keyboard.
NOTE: Some machines might not support hot keys or might use different commands to establish the hot-key environment. If you are on a machine that uses different commands to establish the environment, you still can run this program; just change the stty commands to whatever works on your machine.
The loop in lines 4-9 reads and writes one character per loop iteration. Line 5 starts off by reading a character from the standard input file using getc.
Line 6 tests whether the character read is a backslash. If it is, the loop terminates. If the character is not a backslash, the program continues with line 7. This line translates all alphanumeric characters to the next-highest letter or number; for example, it translates g to h, E to F, and 7 to 8. The characters z, Z, and 9 are translated to a, A, and 0, respectively.
Line 8 prints out the translated character. Because the characters you type at the keyboard are not displayed, the program makes it look like your keyboard is malfunctioning. (It's quite disorienting!)
The subroutine end_hot_keys restores the normal working
environment by undoing the system calls that are performed by
start_hot_keys.
If you are using hot keys, when you clean up make sure you call stty -cbreak before calling stty echo. If you call stty echo first, your terminal might wind up not printing newline characters properly.
If your machine distinguishes between text files and binary files (files that contain unprintable characters), your Perl program can tell the system that a particular file is a binary file. To do this, call the built-in function binmode.
The syntax for calling the binmode function is
binmode (filevar);
filevar is a file variable.
binmode expects a file variable (or an expression whose value is the name of a file variable). It must be called after the file is opened, but before the file is read.
The following is an example of a call to binmode:
binmode (MYFILE);
NOTE: Normally, you won't need to use this function unless you are running in a DOS-like environment.
The input and output functions that you have seen earlier read and write data to files. Perl also provides a group of functions that enable you to manipulate UNIX directories. Functions exist that enable you to create, read, open, close, delete, and skip around in directories. The following sections describe these functions.
To create a new directory, call the function mkdir.
The syntax for the mkdir function is
mkdir (dirname, permissions);
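mkdir requires two arguments: dirname, which is the name of the directory to be created (this can be a character string or an expression whose value is a directory name); and permissions, which is an octal (base-8) number specifying the access permissions for the new directory.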
For example, to create a directory named /u/jqpublic/newdir, you can use the following statement:
mkdir ("/u/jqpublic/newdir", 0777);
To create a subdirectory of the current working directory, just specify the new directory name, as follows:
mkdir ("newdir", 0777);
If the current working directory is /u/janedoe/mydir,
this creates a subdirectory named /u/janedoe/mydir/newdir.
The permissions value of 0777 in both these examples
grants read, write, and execute permissions to everybody. Table
12.1 lists each possible access permission and the octal number
associated with it.
Value | Permission
04000 | Set user ID on execution
02000 | Set group ID on execution
01000 | Sticky bit (see the UNIX chmod manual page)
0400 | Read permission for file owner
0200 | Write permission for file owner
0100 | Execute permission for file owner
0040 | Read permission for owner's group
0020 | Write permission for owner's group
0010 | Execute permission for owner's group
0004 | Read permission for world
0002 | Write permission for world
0001 | Execute permission for world
You can combine access permissions by adding (or doing a logical
OR operation on) the appropriate octal values in the
table. For example, to grant read, write, and execute permission
to the owner but only read permission to everybody else, specify
0744 as the permission value.
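The arithmetic behind 0744 can be sketched as follows. The octal values used here are the standard UNIX permission values (0400, 0200, and 0100 for the owner's read, write, and execute permissions, 0040 for group read, and 0004 for world read); the variable names are hypothetical.

```perl
# Sketch: build a permission value by ORing individual permission values.
# The variable names are hypothetical; the octal values are the standard
# UNIX permission values.
$owner_all  = 0400 | 0200 | 0100;   # read, write, and execute for the owner
$group_read = 0040;                 # read permission for the owner's group
$world_read = 0004;                 # read permission for everybody else
$permissions = $owner_all | $group_read | $world_read;
printf ("%04o\n", $permissions);    # prints 0744
```

The %o field specifier prints the value in octal, which makes it easy to compare with the values used in calls to mkdir.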
NOTE: All of the permission values shown here are in octal notation, because a leading zero is specified. If you like, you can use decimal or hexadecimal here, but it won't be as easy to read. Also note that the permission value set here is affected by the current value of umask. See the description of the umask function later today for more information.
mkdir returns true (nonzero) if the directory is successfully created. It returns false (0) if the directory is not.
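Because mkdir reports failure through its return value, a program normally tests the result. The following is a minimal sketch; the directory name mkdirdemo.tmp is hypothetical and is assumed not to exist already, -d is the file-test operator that checks for a directory, and rmdir removes the directory again so that the example cleans up after itself.

```perl
# Sketch: create a directory and check the result.
# mkdirdemo.tmp is a hypothetical name assumed not to exist already.
$dirname = "mkdirdemo.tmp";
if (mkdir ($dirname, 0755)) {
    $created = (-d $dirname) ? "yes" : "no";   # -d tests for a directory
    rmdir ($dirname);                          # remove it again
} else {
    $created = "no: $!";                       # $! explains the failure
}
print ("created: $created\n");
```

The permission value 0755 used here grants all permissions to the owner and read and execute permissions to everybody else.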
To set a directory to be the current working directory, use the function chdir.
The syntax for the chdir function is
chdir (dirname);
dirname is the name of the new current working directory.
chdir returns true if the current directory is set properly, false if an error occurs.
For example, to set the current working directory to /u/jqpublic/newdir, use the following statement:
chdir ("/u/jqpublic/newdir");
NOTE: As with mkdir, the directory name passed to chdir can be either a character string or an expression whose value is a directory name. For example, the following sets the current directory to be /u/jqpublic/newdir:
$dir = "/u/jqpublic/";
chdir ($dir . "newdir");
You can have your program examine a list of the files contained in a directory. To do this, the first step is to call the built-in function opendir.
The syntax for the opendir function is
opendir (dirvar, dirname);
dirvar is the name the program is to use to represent the directory, also known as a directory variable, and dirname is the name of the directory to open (which can be a character string or the value of an expression).
opendir returns true if the open operation is successful, and it returns false otherwise.
For example, to open the directory named /u/janedoe/mydir, you can use the following statement:
opendir (DIR, "/u/janedoe/mydir");
This associates the directory variable DIR with the opened
directory.
NOTE |
If you like, you can use the same name as both a directory variable and a file variable. For example, a program can call opendir (MYNAME, "/u/jqpublic/dir"); even if MYNAME is already in use as a file variable. The Perl interpreter can always tell from context whether a name is being used as a directory variable or as a file variable. (However, there is no real reason to do so. Your programs will be easier to read if you use different names to represent files and directories.) |
To close an opened directory, call the closedir function.
The syntax for the closedir function is
closedir (mydir);
closedir expects one argument: the directory variable associated with the directory to be closed.
After opendir has opened a directory, you can access the name of each file or subdirectory stored in the directory by calling the function readdir.
The syntax for the readdir function is
readdir (mydir);
Like closedir, readdir is passed the directory variable that is associated with the open directory.
If the value returned from readdir is assigned to a scalar variable, readdir returns the name of the first file or subdirectory stored in the directory. For example:
$filename = readdir(MYDIR);
The first name is returned also if the return value from readdir is assigned to an element of an array variable. For example:
$filearray[3] = readdir(MYDIR);
$filearray{"foo"} = readdir(MYDIR);
If readdir is called again, it returns the next name
in the directory; subsequent calls return other names, continuing
until the directory is exhausted. Listing 12.9 uses readdir
to list the files and subdirectories in a directory.
Listing 12.9. A program that lists the files and subdirectories in a directory.
1: #!/usr/local/bin/perl
2:
3: opendir(HOMEDIR, "/u/jqpublic") ||
4:         die ("Unable to open directory");
5: while ($filename = readdir(HOMEDIR)) {
6:         print ("$filename\n");
7: }
8: closedir(HOMEDIR);

$ program12_9
.
..
.cshrc
.Xresources
.xsession
test
bin
letter
file1
$
Line 3 opens the directory /u/jqpublic, which is the home directory for user jqpublic. The opendir function associates the directory variable HOMEDIR with /u/jqpublic.
Lines 5-7 read the name of each file in the directory in turn. Line 6 prints each filename as it is read in.
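Note that, on a UNIX system, the list of names includes two special files: . (a single period), which refers to the directory itself, and .. (two periods), which refers to its parent directory.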
As you can see, readdir reads the names in the order in which they appear in the directory.
Listing 12.10 shows how you can display the names in alphabetical
order.
Listing 12.10. A program that lists the files and subdirectories in a directory in alphabetical order.
1: #!/usr/local/bin/perl
2:
3: opendir(HOMEDIR, "/u/jqpublic") ||
4:         die ("Unable to open directory");
5: @files = readdir(HOMEDIR);
6: closedir(HOMEDIR);
7: foreach $file (sort @files) {
8:         print ("$file\n");
9: }

$ program12_10
.
..
.Xresources
.cshrc
.xsession
bin
file1
letter
test
$
The readdir function behaves differently when its return value is assigned to an array; in this case, the entire list of files and subdirectories in the directory is assigned to the array variable @files by line 5.
After the entire list is stored, sort can be called to sort the list into alphabetical order. The foreach loop in lines 7-9 then prints the sorted list one name at a time.
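The list-context behavior described above combines naturally with grep when the special entries . and .. are not wanted. The following is a minimal sketch under some assumptions: it builds its own scratch directory with File::Temp, and it uses lexical handles and three-argument open, which need a reasonably recent Perl 5. The file names are illustrative.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Build a small scratch directory so the example is self-contained.
my $dir = tempdir(CLEANUP => 1);
foreach my $name ("letter", "bin", "file1") {
    open(my $fh, ">", "$dir/$name") || die("Unable to create $name: $!");
    close($fh);
}

opendir(my $dh, $dir) || die("Unable to open directory $dir: $!");
# In list context readdir returns every remaining name at once;
# grep drops the special entries . and .. before sorting.
my @files = sort grep { $_ ne "." && $_ ne ".." } readdir($dh);
closedir($dh);

print("$_\n") foreach @files;
```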
As you've seen, the library functions tell and seek enable you to skip backward and forward in a file. Similarly, the library functions telldir and seekdir enable you to skip backward and forward in the list of files in a directory.
To use telldir, pass it the directory variable defined by opendir. telldir returns the current directory location (where you are in the list of files).
The syntax for the telldir function is
location = telldir (mydir);
Here, mydir is the directory variable corresponding to the directory whose file list you are examining, and location is assigned the current directory location.
To skip to the directory location returned by telldir, call seekdir.
The syntax for the seekdir function is
seekdir(mydir, location);
This call to seekdir sets the current directory location
to the location specified by location.
seekdir works only with directory locations returned by telldir |
Although being able to skip anywhere you like in a directory list is useful, the most common skipping operation in directory lists is rewinding the directory list, or starting over again. Because of this, Perl provides a special function, rewinddir, that handles the rewind operation.
The syntax for the rewinddir function is
rewinddir (mydir);
rewinddir sets the current directory location to the beginning of the list of files, which lets you read the entire list of files again. As with the other directory functions, mydir is the directory variable defined by opendir.
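To see telldir, seekdir, and rewinddir working together, consider the following sketch. It reads a directory three times: once from the start, once after returning to a location saved by telldir, and once after rewinddir. The scratch directory (from File::Temp) and the variable names are assumptions made for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
foreach my $name ("a", "b", "c") {
    open(my $fh, ">", "$dir/$name") || die("Unable to create $name: $!");
    close($fh);
}

opendir(my $dh, $dir) || die("Unable to open directory $dir: $!");

# Remember the starting location, then read the whole list once.
my $start = telldir($dh);
my @first_pass = readdir($dh);

# Return to the remembered location and read the list again.
seekdir($dh, $start);
my @second_pass = readdir($dh);

# rewinddir is the shorthand for going back to the beginning.
rewinddir($dh);
my @third_pass = readdir($dh);
closedir($dh);

print(scalar(@first_pass), " names on each pass\n");
```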
The final directory function supplied by Perl is rmdir, which deletes an empty directory.
The syntax for calling the rmdir function is
rmdir (dirname);
rmdir returns true (nonzero) if the directory dirname is deleted successfully, and false if the directory is not empty or cannot be deleted.
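Perl provides several library functions that modify the attributes or behavior of files. These functions can be divided into the following groups: file-relocation functions, link and symbolic link functions, file-permission functions, and miscellaneous file-attribute functions.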
These groups of functions are described in the following sections.
Perl provides the following file-relocation functions: rename, which renames or moves a file, and unlink, which deletes a file.
The built-in function rename changes the name of a file.
The syntax for the rename function is
rename (oldname, newname);
oldname is the old filename, and newname is the new filename.
The rename function returns true if the rename succeeds, and false if an error occurs.
For example, to change a file named name1 to name2, use the following:
rename ("name1", "name2");
You can use the value stored in a scalar variable as an argument to rename, or any variable or expression whose value is a character string, as follows:
rename ($oldname, &get_new_name);
You can also use rename to move a file from one directory to another (provided both directories are in the same file system). For example:
rename ("/u/jqpublic/name1", "/u/janedoe/name2");
NOTE |
When rename moves a file, as in rename ("name1", "name2"); it does not check whether a file named name2 already exists. Any existing name2 is destroyed by the rename operation. To get around this problem, use the -e file-test operator, which checks whether a named file exists, as follows: -e "name2" || rename ("name1", "name2"); Here, the || operator ensures that rename is called only when no file named name2 already exists. |
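The existence check can be wrapped in a small subroutine. The sketch below is one way to do it; rename_if_absent is a hypothetical helper name, and the scratch directory from File::Temp is likewise an assumption of the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: rename only when the target name is unused.
# Returns true if the rename happened, false otherwise.
sub rename_if_absent {
    my ($oldname, $newname) = @_;
    return 0 if -e $newname;          # do not clobber an existing file
    return rename($oldname, $newname);
}

my $dir = tempdir(CLEANUP => 1);
foreach my $name ("name1", "name2") {
    open(my $fh, ">", "$dir/$name") || die("Unable to create $name: $!");
    close($fh);
}

# name2 already exists, so this call leaves both files alone.
my $clobbered = rename_if_absent("$dir/name1", "$dir/name2");
# name3 does not exist, so this call performs the rename.
my $moved = rename_if_absent("$dir/name1", "$dir/name3");
print("first call: ", ($clobbered ? "renamed" : "skipped"), "\n");
print("second call: ", ($moved ? "renamed" : "skipped"), "\n");
```

Keep in mind that the test and the rename are two separate steps, so another program could still create the target file in between.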
To delete a file, use the unlink function.
The syntax for the unlink function is
num = unlink (filelist);
This function takes a list as its argument and deletes all the files named in that list.
unlink returns the number of files actually deleted.
The following is an example of a call to unlink:
@deletelist = ("file1", "file2");
unlink (@deletelist);
The function is called unlink, instead of delete, because what it is actually doing is removing a reference, or link, to the particular file. See the following section for more details on links in Perl.
In the UNIX environment, files can be "contained" in more than one directory at a time. Each directory contains a reference, or link, to the file.
The following sections describe how to create and access links.
NOTE |
If a file is referenced by multiple links, unlink removes only one of the links, and the file can still be referenced |
To create a link to an existing file, use the built-in function link.
The syntax for the link function is
link (file, newlink);
file is the existing file being linked to, and newlink is the link being created.
link returns true if the link is created, and false if an error occurs.
For example:
link ("/u/jqpublic/file", "/u/janedoe/newfile");
After link has been called, the file /u/jqpublic/file also can be thought of as the file /u/janedoe/newfile. If unlink is called using /u/jqpublic/file, as in
unlink ("/u/jqpublic/file");
you can still reference the file by specifying the name /u/janedoe/newfile.
The link created by the link function is called a hard link, which means that it actually references the file itself. Many operating systems also support symbolic links, which are references to the filename, not to the file itself.
To create a symbolic link, use the function symlink.
The syntax for the symlink function is
symlink (file, newlink);
file is the file being linked to, and newlink is the link being created.
symlink, like link, returns true if the link is created, and false if an error occurs.
The following is an example of symlink:
symlink("/u/jqpublic/file", "/u/janedoe/newfile");
Here, /u/janedoe/newfile is symbolically linked to /u/jqpublic/file. Now, when the following statement is executed, the file is actually deleted:
unlink ("/u/jqpublic/file");
/u/janedoe/newfile now references nothing at all. (In this case, /u/janedoe/newfile is an example of an unresolved symbolic link.) When /u/jqpublic/file is created again, you will be able to access the new file using /u/janedoe/newfile.
If a filename, such as /u/janedoe/newfile, is actually a symbolic link to another filename, the function readlink returns the filename to which it is linked.
The syntax for the readlink function is
filename = readlink (linkname);
linkname is the symbolic link, and filename is the equivalent filename.
readlink returns the undefined value (which behaves like an empty string) if the filename is not a symbolic link. (In particular, readlink fails if the filename is actually a hard link.)
For example:
$linkname = readlink("/u/janedoe/newfile"); # $linkname now contains "/u/jqpublic/file"
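The difference between hard and symbolic links can be observed directly. The following sketch creates one of each for the same file and then deletes the original name. It assumes a UNIX-like system that supports both kinds of link; the scratch directory and the names hardlink and symlink are illustrative.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
my $file = "$dir/file";
open(my $fh, ">", $file) || die("Unable to create $file: $!");
print $fh ("some data\n");
close($fh);

# Existing file first, new name second, for both functions.
link($file, "$dir/hardlink")   || die("link failed: $!");
symlink($file, "$dir/symlink") || die("symlink failed: $!");

# Element 3 of the stat list is the number of hard links.
my $nlink = (stat($file))[3];
my $target = readlink("$dir/symlink");
print("hard links: $nlink\n");
print("symlink points to $target\n");

# Removing the original name leaves the hard link usable,
# but the symbolic link no longer resolves to anything.
unlink($file);
my $hard_ok = -e "$dir/hardlink" ? 1 : 0;
my $sym_ok  = -e "$dir/symlink"  ? 1 : 0;
print("hard link usable: $hard_ok, symbolic link usable: $sym_ok\n");
```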
Listing 12.11 is an example of a program that prints all the symbolic
links in a particular directory.
Listing 12.11. A program that prints symbolic links.
1: #!/usr/local/bin/perl
2:
3: $dir = "/u/janedoe";
4: opendir(MYDIR, $dir);
5: while ($name = readdir(MYDIR)) {
6:         if (-l $dir . "/" . $name) {
7:                 print ("$name is linked to ");
8:                 print (readlink($dir . "/". $name) . "\n");
9:         }
10: }
11: closedir(MYDIR);

$ program12_11
newfile is linked to /u/jqpublic/file
$
This program uses opendir and readdir to examine each file in the directory in turn. Line 6 uses the -l file-test operator to determine whether the filename is actually a symbolic link. If the filename is a symbolic link, the following expression becomes true, and the program executes the calls to print in lines 7 and 8:
-l $dir . "/" . $name
Line 8 calls readlink, passing it the directory name and the filename stored in $name. Because readlink is called only if the expression in line 6 is true, $name is always a symbolic link.
As you've seen, the built-in function mkdir requires you to specify the access permissions for the directory you are creating. These permissions indicate, for example, whether particular users are allowed to read files from the directory or write into the directory.
In the UNIX environment, each individual file has its own set of access permissions. The set of possible permissions is the same as for directories. (Refer to Table 12.1 in the section titled "The mkdir Function" earlier in today's lesson for a complete list of the possible permissions.)
In Perl, three functions are defined that deal with access permissions.
To change the access permissions for a list of files, call the chmod function.
The syntax for the chmod function is
chmod (permissions, filelist);
permissions is the set of access permissions you want to give, and is a standard UNIX file permissions mask. (For example, setting permissions to 0777 gives read, write, and execute permission to everybody. See the section called "The mkdir Function" for a description of the set of permissions.) filelist is the list of files whose permissions you want to change.
The chmod function returns the number of files whose permissions were successfully set.
The following is an example of a call to chmod:
@filelist = ("file1", "file2");
chmod (0777, @filelist);
In this example, the files file1 and file2 are
assigned global read, write, and execute permissions.
NOTE |
You cannot change access permissions using chmod unless you have permission to do so. On UNIX systems, you normally must be the owner of a file (or the superuser) before you can change its permissions. |
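A call to chmod can be checked by reading the permissions back with stat. The sketch below does this for two scratch files; the file locations (from File::Temp) and the choice of 0644 are assumptions made for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
my @filelist = ("$dir/file1", "$dir/file2");
foreach my $name (@filelist) {
    open(my $fh, ">", $name) || die("Unable to create $name: $!");
    close($fh);
}

# Owner may read and write; everybody else may only read.
my $changed = chmod(0644, @filelist);
print("$changed of ", scalar(@filelist), " files changed\n");

# The low 12 bits of the mode field hold the permission bits.
my $mode = (stat($filelist[0]))[2] & 07777;
printf("%s now has mode %04o\n", $filelist[0], $mode);
```

Comparing the return value with the number of files in the list is a simple way to detect that some of the changes failed.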
Normally, the owner of a file is the person who created it. To change the owner of a file, use the function chown.
The syntax for the chown function is
chown (userid, groupid, filelist);
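The chown function requires three arguments: userid, the (numeric) user ID of the new owner of the files; groupid, the (numeric) ID of the group to be assigned to the files, or -1 if (as on most systems) the existing group is to be kept; and filelist, the list of files whose ownership is to be changed.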
The chown function returns the number of files changed.
The following is an example of a call to chown:
@filelist = ("file1", "file2");
chown (17, -1, @filelist);
NOTE |
On most UNIX systems, you can retrieve a user ID or group ID from the /etc/passwd file. You can use the Perl function getpwnam to retrieve information from this file. For more information on getpwnam, refer to Day 15, "System Functions." Also, the superuser (system administrator) is usually the only user allowed to change the owner of a file |
As you've seen, you can change the access permissions for a file using chmod. To specify the access permissions that are to be withheld by default when a file is created, use the umask function.
The syntax for calls to umask is
oldmaskval = umask (maskval);
maskval is the new umask value, and umask returns the previous (superseded) umask value in oldmaskval. Each umask value is a file creation mask, and is used to set the default permissions for files and directories: any permission that is set in the mask is turned off when a file or directory is created. (See the umask manual page for more details on file creation masks.)
For example, the following statement disables group and world write permission for newly created files:
$oldperms = umask(0022);
NOTE |
You can determine the current umask value by passing no arguments to umask, as follows: $currperms = umask(); This statement assigns the current umask value to $currperms. |
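The effect of a file creation mask is easiest to see by creating a file and inspecting the result. In the following sketch, sysopen requests mode 0666 while the umask is 0022. The use of sysopen with constants from the Fcntl module, and the scratch directory, are assumptions chosen to make the requested mode explicit.

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT);
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);

# Withhold write permission from group and world for new files.
my $oldmask = umask(0022);

# Ask for mode 0666; the system removes the bits set in the umask.
sysopen(my $fh, "$dir/newfile", O_WRONLY | O_CREAT, 0666)
    || die("Unable to create file: $!");
close($fh);

my $mode = (stat("$dir/newfile"))[2] & 07777;
printf("requested 0666 with umask 0022, got %04o\n", $mode);

# Restore the previous mask so later code is unaffected.
umask($oldmask);
```

Saving the value returned by umask and restoring it afterward keeps the change from leaking into the rest of the program.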
Some file-test operators in Perl are designed to test for various
permissions. Table 12.2 lists these file-test operators; in each
case, filename is the name of the file being tested.
Operator | Description |
-g | Does filename have its set group ID bit set? |
-k | Does filename have its "sticky bit" set? |
-r | Is filename a readable file? |
-u | Does filename have its set user ID bit set? |
-w | Is filename a writable file? |
-x | Is filename an executable file? |
-R | Is filename readable only if the real user ID can read it? |
-W | Is filename writable only if the real user ID can write? |
-X | Is filename executable only if the real user ID can execute it? |
In this case, the real user ID is the user ID specified at login, as opposed to the effective user ID, which is the user ID under which you are currently running. (On some machines, a command such as /usr/local/etc/suid enables you to change your effective user ID.)
(See Day 6 for more information on how to use file-test operators.)
The following sections describe other Perl functions that manipulate files.
The truncate function enables you to reduce the size of a specified file to a particular length.
The syntax for the truncate function is
truncate (filename, length);
filename is the name of the file to reduce, and length is the new length of the file.
For example, the statement
truncate ("/u/jqpublic/longfile", 5000);
reduces the size of /u/jqpublic/longfile to 5000 bytes in length. (If the file is already smaller than 5000 bytes, the result depends on your system, so do not rely on truncate to leave the file unchanged.)
NOTE |
You can use a file variable in place of the filename, as in truncate (MYFILE, 5000); The file variable must refer to a file opened for writing by the open function. |
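As a small illustration of truncate, the following sketch writes a 100-byte file and then shortens it to 40 bytes, using the -s file-test operator to measure the size before and after. The scratch file and the sizes are illustrative assumptions.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
my $file = "$dir/longfile";

# Create a 100-byte file to shorten.
open(my $fh, ">", $file) || die("Unable to create $file: $!");
print $fh ("x" x 100);
close($fh);

my $before = -s $file;
truncate($file, 40) || die("Unable to truncate $file: $!");
my $after = -s $file;
print("size went from $before to $after bytes\n");
```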
The stat function retrieves information about a particular file when given its name or a file variable representing its name.
The syntax for the stat function is
stat (file);
Here, file is either a filename or a file variable.
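stat returns a list containing the following elements, in this order:
The device on which the file resides
The internal reference number (inode) for the file
The file type and permissions (the mode)
The number of hard links to the file
The numeric user ID of the file owner
The numeric group ID of the file owner
The device type, if the file is a device
The size of the file, in bytes
The time the file was last accessed
The time the file was last modified
The time the file status (inode) was last changed
The preferred block size for I/O operations on the file system
The number of blocks allocated to the file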
Some of the items returned by stat can be obtained using
file test operators. Table 12.3 lists these items.
Operator | Description |
-b | Is filename a mountable disk (block device)? |
-c | Is filename an I/O device (character device)? |
-s | Is filename a non-empty file? |
-t | Does filename represent a terminal? |
-A | How long since filename accessed? |
-C | How long since filename's inode changed? |
-M | How long since filename modified? |
-S | Is filename a socket? |
For more information on stat or the information it returns, see the UNIX manual page for the stat command on your machine.
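Because stat returns a list, a list slice is a convenient way to pick out just the fields you need. The sketch below relies on the element order documented for Perl's stat function (element 2 is the mode, 7 the size, and 9 the modification time); the scratch file is an assumption of the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
my $file = "$dir/data";
open(my $fh, ">", $file) || die("Unable to create $file: $!");
print $fh ("hello\n");
close($fh);

# A list slice picks out the fields of interest by position:
# 2 is the mode, 7 the size in bytes, 9 the modification time.
my ($mode, $size, $mtime) = (stat($file))[2, 7, 9];
printf("permissions: %04o\n", $mode & 07777);
print("size: $size bytes\n");
print("modified: ", scalar(localtime($mtime)), "\n");

# The -s file-test operator reports the same size.
my $size_from_test = -s $file;
```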
The lstat function returns the same information as stat, but if the name being passed as an argument is a symbolic link, lstat returns information about the link itself rather than about the file to which it refers.
The syntax for lstat is the same as that for stat.
lstat (file);
file is either a filename or a file variable.
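The access and modification times returned by stat are integers representing the number of elapsed seconds from January 1, 1970, to the time the file was accessed or modified. (The -A and -M file-test operators report this information differently: each returns the age of the file in days, measured from the time the program started running.)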
To obtain the number of elapsed seconds from January 1, 1970, to the present time, call the built-in function time.
The syntax for calls to the time function is
currtime = time();
currtime is the returned elapsed-seconds value.
The value returned by time can be converted to either Greenwich Mean Time or your computer's local time.
To convert to Greenwich Mean Time, call the gmtime function. To convert to local time, call the localtime function.
The syntax for the gmtime and localtime functions is identical:
timelist = gmtime (timeval);
timelist = localtime (timeval);
Both functions accept an elapsed-seconds time value, such as the value returned by time or one of the time values returned by stat.
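Both functions return a list consisting of the following nine elements:
The number of seconds
The number of minutes
The hour of the day (0 through 23)
The day of the month
The month (0 through 11, where 0 is January)
The year, counted from 1900
The day of the week (0 through 6, where 0 is Sunday)
The day of the year (0 through 365, where 0 is January 1)
A flag that is true if daylight saving time is in effect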
For more information on the list returned by gmtime or localtime, refer to the UNIX manual pages for the system functions with the same names.
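The list returned by gmtime needs two adjustments before it can be printed as a date: the month is numbered from 0, and the year is counted from 1900. The following sketch shows one way to handle both; format_gmt is a hypothetical helper name invented for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: format an elapsed-seconds value as
# YYYY-MM-DD HH:MM:SS in Greenwich Mean Time.
sub format_gmt {
    my ($timeval) = @_;
    my ($sec, $min, $hour, $mday, $mon, $year) = gmtime($timeval);
    # The month is numbered from 0 and the year counts from 1900.
    return sprintf("%04d-%02d-%02d %02d:%02d:%02d",
                   $year + 1900, $mon + 1, $mday, $hour, $min, $sec);
}

my $epoch = format_gmt(0);
my $now = format_gmt(time());
print("time value 0 is $epoch GMT\n");
print("the current time is $now GMT\n");
```

Replacing gmtime with localtime in the helper would give the same format in your computer's local time.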
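The time values returned by stat and time can be used to set the access and modification times of other files. (The values returned by the -A and -M file-test operators cannot be used for this, because they are ages in days, not elapsed seconds.) To do this, use the utime function.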
The syntax for the utime function is
utime (acctime, modtime, filelist);
acctime is the new access time, modtime is the new modification time, and filelist is the list of files.
utime returns the number of files whose access and modification times have been successfully changed.
The following is an example of a call to utime:
($acctime, $modtime) = (stat("file1"))[8,9];
@filelist = ("file2", "file3");
utime ($acctime, $modtime, @filelist);
Here, the files file2 and file3 have their access and modification times changed to those of file1.
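The following sketch copies the access and modification times from one file to another and then reads them back to confirm the change. It takes the times from elements 8 and 9 of the list returned by stat. The scratch files and the particular time values are illustrative assumptions.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
foreach my $name ("file1", "file2") {
    open(my $fh, ">", "$dir/$name") || die("Unable to create $name: $!");
    close($fh);
}

# Give file1 known times (illustrative values, in elapsed seconds).
utime(1000000000, 1000000600, "$dir/file1") || die("utime failed: $!");

# Elements 8 and 9 of the stat list are the access and
# modification times, already in the form utime expects.
my ($acctime, $modtime) = (stat("$dir/file1"))[8, 9];
my $count = utime($acctime, $modtime, "$dir/file2");

my ($acc2, $mod2) = (stat("$dir/file2"))[8, 9];
print("changed $count file(s); file2 times are now $acc2 and $mod2\n");
```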
The fileno function returns the internal UNIX file descriptor associated with a particular file variable.
The syntax for the fileno function is
filedesc = fileno (filevar);
Here, filevar is the file variable whose descriptor is to be retrieved.
The file descriptor returned by fileno is used in various UNIX system calls; these calls can be accessed using the syscall function (as described on Day 15).
The flock and fcntl functions call the UNIX system calls of the same name.
The syntax for the flock and fcntl functions is
fcntl (filevar, fcntlrtn, value);
flock (filevar, flockop);
Here, filevar is a file variable representing an open file. fcntlrtn is a fcntl function as defined in the UNIX fcntl manual page, and value is the value passed to the function, if appropriate. Similarly, flockop is a file-locking operation, as defined in the UNIX flock manual page.
For more information on these functions, refer to the manual pages or to a book about UNIX. (You won't really be able to use these functions effectively unless you know a fair bit about how your operating system works.)
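For a flavor of what file locking looks like in practice, the following sketch takes an exclusive lock on a file before appending to it. It assumes your system supports flock and uses the LOCK_EX, LOCK_NB, and LOCK_UN constants exported by the standard Fcntl module; the file name is illustrative.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
my $file = "$dir/counter";

open(my $fh, ">>", $file) || die("Unable to open $file: $!");

# Ask for an exclusive lock without waiting; LOCK_NB makes
# flock return false at once if another process holds the lock.
my $locked = flock($fh, LOCK_EX | LOCK_NB);
if ($locked) {
    print $fh ("one more line\n");
    flock($fh, LOCK_UN);
}
close($fh);

print($locked ? "lock acquired\n" : "file is locked elsewhere\n");
```

Locks of this kind are advisory: they coordinate only programs that also call flock on the same file.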
Many systems on which Perl is available support files that are created using the Data Base Management (DBM) library. Perl enables you to use an associative array to access a particular DBM file.
The following sections describe how to access DBM files from Perl programs using the dbmopen and dbmclose functions. If you are running Perl 5, these functions have been superseded by the tie and untie functions; see Day 19, "Object-Oriented Programming in Perl," for more details.
For more information on DBM, refer to your system's appropriate manual pages.
To associate an associative array with a DBM file, use the dbmopen function.
The syntax for the dbmopen function is
dbmopen (array, dbmfilename, permissions);
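This function requires three arguments: array, the associative array to use; dbmfilename, the name of the DBM file to open (without any .dir or .pag suffix); and permissions, the access permissions to give the DBM file if it has to be created.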
After the DBM file has been opened, the subscripts for the associative
array represent the DBM file keys, and the values of the array
represent the values associated with the keys.
Calling dbmopen destroys any existing values in the associative array |
To close a DBM file opened by dbmopen, use dbmclose.
The syntax for the dbmclose function is
dbmclose (array);
Here, array is the associative array specified in the call to dbmopen.
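The following sketch stores two entries in a DBM file, closes it, and then reopens it with a different associative array to show that the entries persist. It assumes that your Perl installation includes at least one DBM library; the file name, keys, and values are invented for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
my $dbmfile = "$dir/phonebook";   # illustrative name, no extension

# Store two entries, then close the DBM file.
my %phone;
dbmopen(%phone, $dbmfile, 0644) || die("Unable to open DBM file: $!");
$phone{"dave"} = "555-1234";
$phone{"jane"} = "555-9876";
dbmclose(%phone);

# Reopen the file with a different array; the entries persist.
my %lookup;
dbmopen(%lookup, $dbmfile, 0644) || die("Unable to reopen DBM file: $!");
my $number = $lookup{"jane"};
my $entries = scalar(keys(%lookup));
dbmclose(%lookup);

print("jane is at $number ($entries entries stored)\n");
```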
Today, you learned how to open a pipe that directs input to the program, how to open a file for both reading and writing, and how to associate multiple file variables with a single file. You also learned how to test for the end of a particular input file or for the end of the last input file.
You also learned how to skip backward and forward in files and how to read single characters from a file using getc. You can use getc to build hot-key applications, which act as soon as they read a single character from the keyboard.
Perl provides several functions for manipulating directories. They enable you to create, open, read, close, delete, and skip around in directories. Other Perl functions enable you to move a file from one directory to another, create hard and symbolic links from one location to another, and delete a hard link (or a file).
You learned about the Perl functions that enable you to change the file owner or file permissions, truncate a file, retrieve file information, set file access and modification times, retrieve the file descriptor, and call the flock and fcntl system commands.
Finally, Perl provides an interface to the DBM library that enables you to associate DBM files with associative arrays.
Q: | How can I determine whether a particular Perl function that manipulates the UNIX file system is defined on my machine? |
A: | A Perl function that manipulates the UNIX file system normally has the same name as the UNIX command or C library function that performs the same task. If the UNIX command or C library function is defined, the Perl function is usually defined as well. To check whether a UNIX command or C library function is defined, enter the command man name, where name is the name of the Perl library function for which you are checking. |
Q: | Why does a list of files in a directory appear in unsorted order? |
A: | The list appears in the order in which the files are stored in the directory. This varies, depending on the machine; usually, however, newer files appear at the end of the list. |
Q: | Which is better to use: the file-test operators or the built-in function stat? |
A: | Whenever possible, use the file-test operators. They are easier to use and are often more efficient. |
Q: | Why are both read and sysread defined, when they are so similar? |
A: | read, like the UNIX function fread, uses the standard UNIX input-output (I/O) environment. sysread and syswrite, on the other hand, bypass the standard I/O environment and perform low-level system calls. |
Q: | Why are eof and eof() different? |
A: | The short answer is: Just because. The long answer is that an empty list as an argument (as in eof()) refers to the list of files on the command line, as does the <> in
while ($line = <>) ...
eof, on the other hand, refers only to the file currently being read. |
The Workshop provides quiz questions to help you solidify your understanding of the material covered and exercises to give you experience in using what you've learned. Try to understand the quiz and exercise answers before you go on to tomorrow's lesson.