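Today's lesson describes everything you need to know about scalar values in Perl. Today, you learn about integer and floating-point scalar values, octal and hexadecimal notation, character strings and their escape sequences, single-quoted strings, the conversion between strings and numbers, and the initial values of scalar variables.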
Basically, a scalar value is one unit of data. This unit of data can be either a number or a chunk of text.
There are several types of scalar values that Perl understands. Today's lesson describes each of them in turn and shows you how you can use them.
The most common scalar values in Perl programs are integer scalar values, also known as integer constants or integer literals.
An integer scalar value consists of one or more digits, optionally preceded by a plus or minus sign and optionally containing underscores.
Here are a few examples:
14 10000000000 -27 1_000_000
You can use integer scalar values in expressions or assign them to scalar variables, as follows:
$x = 12345;
if (1217 + 116 == 1333) {
    # statement block goes here
}
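As a quick check of the rules above, here is a minimal sketch (the variable names are illustrative and not part of the lesson's listings). It shows that underscores inside an integer scalar value are ignored and that integer values can be used directly in an expression.

```perl
use strict;
use warnings;

# Underscores only make a long integer easier to read.
my $plain      = 1000000;
my $underscore = 1_000_000;

if ($plain == $underscore) {
    print("1_000_000 and 1000000 are the same value\n");
}

# Integer scalar values can appear directly in expressions.
my $sum = 1217 + 116;
print("1217 + 116 is ", $sum, "\n");
```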
In Perl, there is a limit on the size of integers included in
a program. To see what this limit is and how it works, take a
look at Listing 3.1, which prints out integers of various sizes.
Listing 3.1. A program that displays integers and illustrates their size limitations.
1: #!/usr/local/bin/perl
2:
3: $value = 1234567890;
4: print ("first value is ", $value, "\n");
5: $value = 1234567890123456;
6: print ("second value is ", $value, "\n");
7: $value = 12345678901234567890;
8: print ("third value is ", $value, "\n");
$ program3_1
first value is 1234567890
second value is 1234567890123456
third value is 12345678901234567168
$
This program assigns integer scalar values to the variable $value, and then prints $value.
Lines 3 and 4 store and print the value 1234567890 without any difficulty. Similarly, lines 5 and 6 successfully store and print the value 1234567890123456.
Line 7 attempts to assign the value 12345678901234567890 to $value. Unfortunately, this number is too big for Perl to understand. When line 8 prints out the value assigned to $value, it prints out
12345678901234567168
As you can see, the last three digits have been replaced with different values.
Here's what has happened: Perl actually stores integers in the floating-point registers on your machine. In other words, integers are treated as if they are floating-point numbers (numbers containing decimal points).
On most machines, floating-point registers can store approximately 16 digits before running out of space. As the output from line 8 shows, the first 17 digits of the number 12345678901234567890 are remembered and stored by the Perl interpreter, and the rest are thrown away. This means that the value printed by line 8 is not the same as the value assigned in line 7.
This somewhat annoying limitation on the number of digits in an integer can be found in almost all programming languages. In fact, many programming languages have an upper integer limit of 4294967295 (which is equal to 2 to the power of 32, minus 1).
The number of digits that can be stored varies from machine to
machine. For a more detailed explanation, refer to the discussion
of precision in the following section, "Floating-Point Scalar
Values."
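The following is a small sketch of the same loss of digits, assuming a floating-point precision of 16 or 17 digits as described above. Some interpreters store integers in a native 64-bit integer type, in which case the third value in Listing 3.1 prints exactly; for that reason this example uses 1e21, which is too large for such a type. The variable names are illustrative.

```perl
use strict;
use warnings;

# 1e21 is stored as a floating-point value with limited precision.
my $large  = 1e21;
my $bumped = $large + 1;

# With 16 or 17 digits of precision, the added 1 cannot be stored.
if ($bumped == $large) {
    print("1e21 + 1 is stored as the same value as 1e21\n");
}
else {
    print("this machine keeps more digits than the example assumes\n");
}
```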
An integer constant that starts with a 0 is a special case:
$x = 012345;
The 0 at the beginning of the constant (also known as a leading zero) tells the Perl interpreter to treat this as an octal integer constant. To find out about octal integer constants, refer to the section called "Using Octal and Hexadecimal Notation" later today.
As you have just seen, integers in Perl actually are represented as floating-point numbers. This means that an integer scalar value is actually a special kind of floating-point scalar value.
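In Perl, a floating-point scalar value consists of all of the following: an optional minus sign (-), a sequence of digits that optionally contains a decimal point, and an optional exponent.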
Here are some simple examples of floating-point scalar values:
11.4 -275 -0.3 .3 3.
The optional exponent tells the Perl interpreter to multiply or divide the scalar value by a power of ten. An exponent consists of all of the following: the letter e (or E), an optional + or - sign, and one or more digits.
The number in the exponent represents the value by which to multiply or divide, represented as a power of 10. For example, the exponent e+01 tells the Perl interpreter to multiply the scalar value by 10 to the power of 1, or 10. This means that the scalar value 8e+01 is equivalent to 8 multiplied by 10, or 80.
Similarly, the exponent e+02 is equivalent to multiplying by 100, e+03 is equivalent to multiplying by 1,000, and so on. The following scalar values are all equal:
541e+01 54.1e+02 5.41e+03
A negative exponent tells the Perl interpreter to divide by a power of 10. For example, the value 54e-01 is equivalent to 54 divided by 10, or 5.4. Similarly, e-02 tells the Perl interpreter to divide by 100, e-03 to divide by 1,000, and so on.
The exponent e+00 is equivalent to multiplying by 1, which does nothing. Therefore, the following values are equal:
5.12e+00 5.12
If you want, you can omit the + when you multiply by a power of ten.
5.47e+03 5.47e03
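To see these equivalences for yourself, you can try a short sketch like the one below. It only restates the values discussed above; the variable names are illustrative.

```perl
use strict;
use warnings;

# Three ways of writing the same value, 5410.
my $first  = 541e+01;
my $second = 54.1e+02;
my $third  = 5.41e+03;
print($first, " ", $second, " ", $third, "\n");

# A negative exponent divides by a power of ten.
my $divided = 54e-01;
print($divided, "\n");

# The + in the exponent is optional.
my $with_plus = 5.47e+03;
my $no_plus   = 5.47e03;
if ($with_plus == $no_plus) {
    print("5.47e+03 and 5.47e03 are equal\n");
}
```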
Listing 3.2 shows how Perl works with and prints out floating-point
scalar values.
Listing 3.2. A program that displays various floating-point scalar values.
1: #!/usr/local/bin/perl
2:
3: $value = 34.0;
4: print ("first value is ", $value, "\n");
5: $value = 114.6e-01;
6: print ("second value is ", $value, "\n");
7: $value = 178.263e+19;
8: print ("third value is ", $value, "\n");
9: $value = 123456789000000000000000000000;
10: print ("fourth value is ", $value, "\n");
11: $value = 1.23e+999;
12: print ("fifth value is ", $value, "\n");
13: $value = 1.23e-999;
14: print ("sixth value is ", $value, "\n");
$ program3_2
first value is 34
second value is 11.460000000000001
third value is 1.7826300000000001e+21
fourth value is 1.2345678899999999e+29
fifth value is Infinity
sixth value is 0
$
As in Listing 3.1, this program stores and prints various scalar values. Line 3 assigns the floating-point value 34.0 to $value. Line 4 then prints this value. Note that because there are no significant digits after the decimal point, the Perl interpreter treats 34.0 as if it is an integer.
Line 5 assigns 114.6e-01 to $value, and line 6 prints this value. Whenever possible, the Perl interpreter removes any exponents, shifting the decimal point appropriately. As a result, line 6 prints out
11.460000000000001
which is 114.6e-01 with the exponent e-01 removed and the decimal point shifted one place to the left (which is equivalent to dividing by 10).
Note that the number printed by line 6 is not exactly equal to
the value assigned in line 5. This is a result of round-off
error. The floating-point register cannot contain the exact
value 11.46, so it comes as close as it can. It comes
pretty close; in fact, the first 16 digits are correct. This number
of correct digits is known as the precision, and it is
a property of the machine on which you are working; the precision
of a floating-point number varies from machine to machine. (The
machine on which I ran these test examples supports a floating-point
precision of 16 or 17 digits. This is about normal.)
NOTE
The size of an integer is roughly equivalent to the supported floating-point precision. If a machine supports a floating-point precision of 16 digits, an integer can be approximately 16 digits long.
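If you want to look at round-off error on your own machine, the following sketch may help. It assumes a precision of 16 or 17 digits. Many current interpreters round the value that print displays, which hides the error, so the sketch also uses printf with the %.17g format to request 17 significant digits.

```perl
use strict;
use warnings;

my $value = 114.6e-01;

# print may show a rounded form of the stored value.
print("default output: ", $value, "\n");

# %.17g asks for 17 significant digits, which can expose the
# round-off error in the final digit or two.
printf("17 digits:      %.17g\n", $value);
```

The exact digits shown by the second line depend on the machine's floating-point precision.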
Line 6 shows that a floating-point value has its exponent removed whenever possible. Lines 7 and 8 show what happens when a number is too large to be conveniently displayed without the exponent. In this case, the number is displayed in scientific notation.
In scientific notation, one digit appears before the decimal point, and all the other significant digits (the rest of the machine's precision) follow the decimal point. The exponent is adjusted to reflect this. In this example, the number
178.263e+19
is converted into scientific notation and becomes
1.7826300000000001e+21
As you can see, the decimal point has been shifted two places to the left, and the exponent has, as a consequence, been adjusted from 19 to 21. As before, the 1 at the end is an example of round-off error.
If an integer is too large to be displayed conveniently, the Perl interpreter converts it to scientific notation. Lines 9 and 10 show this. The number
123456789000000000000000000000
is converted to
1.2345678899999999e+29
Here, scientific notation becomes useful. At a glance, you can tell approximately how large the number is. (In conventional notation, you can't do this without counting the zeros.)
Lines 11 and 12 show what happens when the Perl interpreter is given a number that is too large to fit into the machine's floating-point register. In this case, Perl just prints the word Infinity.
The maximum size of a floating-point number varies from machine to machine. Generally, the largest possible exponent that can be stored is about e+308.
Lines 13 and 14 illustrate the case of a number having a negative exponent that is too large (that is, it's too small to store). In such cases, Perl either gets as close as it can or just prints 0.
The largest negative exponent that produces reliable values is about e-309. Below that, accuracy diminishes.
The arithmetic operations you saw on Day 2, "Basic Operators and Control Flow," also work on floating-point values. On that day, you saw an example of a miles-to-kilometers conversion program that uses floating-point arithmetic.
When you perform floating-point arithmetic, you must remember
the problems with precision and round-off error. Listing 3.3 illustrates
what can go wrong and shows you how to attack this problem.
Listing 3.3. A program that illustrates round-off error problems in floating-point arithmetic.
1: #!/usr/local/bin/perl
2:
3: $value = 9.01e+21 + 0.01 - 9.01e+21;
4: print ("first value is ", $value, "\n");
5: $value = 9.01e+21 - 9.01e+21 + 0.01;
6: print ("second value is ", $value, "\n");
$ program3_3
first value is 0
second value is 0.01
$
Line 3 and line 5 both subtract 9.01e+21 from itself and add 0.01. However, as you can see when you examine the output produced by line 4 and line 6, the order in which you perform the addition and subtraction has a significant effect.
In line 3, a very small number, 0.01, is added to a very large number, 9.01e+21. If you work it out yourself, you see that the result is 9.01000000000000000000001e+21.
The final 1 in the preceding number can be retained only on machines that support 24 digits of precision in their floating-point numbers. Most machines, as you've seen, handle only 16 or 17 digits. As a result, the final 1, along with some of the zeros, is lost, and the number instead is stored as 9.0100000000000000e+21.
This is the same as 9.01e+21, which means that subtracting 9.01e+21 yields zero. The 0.01 is lost along the way.
Line 5, however, doesn't have this problem. The two large numbers are operated on first, yielding 0, and then 0.01 is added. The result is what you expect: 0.01.
The moral of the story: Floating-point arithmetic is accurate only when you bunch together operations on large numbers. If the arithmetic operations are on values stored in variables, it might not be as easy to spot this problem.
$result = $number1 + $number2 - $number3;
If $number1 and $number3 contain large numbers and $number2 is small, $result is likely to contain an incorrect value because of the problem demonstrated in Listing 3.3.
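One way to attack the problem when the values are in variables is to group the operations on the large numbers with parentheses. The following is a minimal sketch, assuming 16 or 17 digits of precision and reusing the values from Listing 3.3; the names $naive and $regrouped are illustrative.

```perl
use strict;
use warnings;

my $number1 = 9.01e+21;
my $number2 = 0.01;
my $number3 = 9.01e+21;

# Evaluated left to right: the small value is lost in the first addition.
my $naive = $number1 + $number2 - $number3;

# The large values are combined first, so the small value survives.
my $regrouped = ($number1 - $number3) + $number2;

print("left to right:      ", $naive, "\n");
print("large values first: ", $regrouped, "\n");
```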
So far, all the integer scalar values you've seen have been in what normally is called base 10 or decimal notation. Perl also enables you to use two other notations to represent integer scalar values: base 8 notation, or octal, and base 16 notation, or hexadecimal.
To use octal notation, put a zero in front of your integer scalar value:
$result = 047;
This assigns 47 octal, or 39 decimal, to $result.
To use hexadecimal notation, put 0x in front of your integer scalar value, as follows:
$result = 0x1f;
This assigns 1f hexadecimal, or 31 decimal, to $result.
Perl accepts either uppercase letters or lowercase letters as representations of the digits a through f:
$result = 0xe;
$result = 0xE;
Both of the preceding statements assign 14 (decimal) to $result.
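The short sketch below prints the decimal equivalents of the octal and hexadecimal values used in this section, including the 012345 example from earlier today. The variable names are illustrative.

```perl
use strict;
use warnings;

my $octal   = 047;      # leading 0: octal
my $hex     = 0x1f;     # leading 0x: hexadecimal
my $leading = 012345;   # also octal, because of the leading zero

print("047 is ", $octal, " in decimal\n");
print("0x1f is ", $hex, " in decimal\n");
print("012345 is ", $leading, " in decimal\n");

# Uppercase and lowercase hexadecimal digits are interchangeable.
if (0xe == 0xE) {
    print("0xe and 0xE are both ", 0xe, "\n");
}
```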
If you are not familiar with octal and hexadecimal notations and would like to learn more, read the following sections. These sections explain how to convert numbers to different bases. If you are familiar with this concept, you can skip to the section called "Character Strings."
To understand how the octal and hexadecimal notations work, take a closer look at what the standard decimal notation actually represents.
In decimal notation, each digit in a number has one of 10 values: the standard numbers 0 through 9. Each digit in a number in decimal notation corresponds to a power of 10. Mathematically, the value of a digit x in a number is
x * 10 to the exponent n,
where n is the number of digits you have to skip before reaching x.
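This might sound complicated, but it's really straightforward. For example, the number 243 can be expressed as follows:
2 * 10 to the exponent 2, which is 200
4 * 10 to the exponent 1, which is 40
3 * 10 to the exponent 0, which is 3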
Adding the three numbers together yields 243.
Working through these steps might seem like a waste of time when you are dealing with decimal notation. However, once you understand this method, reading numbers in other notations becomes simple.
For example, in octal notation, each digit x in a number is
x * 8 to the exponent n
where x is the value of the digit, and n is the number of digits to skip before reaching x. This is the same formula as in decimal notation, but with the 10 replaced by 8.
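Using this method, here's how to determine the decimal equivalent of 243 octal:
2 * 8 to the exponent 2, which is 128
4 * 8 to the exponent 1, which is 32
3 * 8 to the exponent 0, which is 3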
Adding 128, 32 and 3 yields 163, which is the decimal notation equivalent of 243 octal.
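Hexadecimal notation works the same way, but with 16 as the base instead of 10 or 8. For example, here's how to convert 243 hexadecimal to decimal notation:
2 * 16 to the exponent 2, which is 512
4 * 16 to the exponent 1, which is 64
3 * 16 to the exponent 0, which is 3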
Adding these three numbers together yields 579.
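Note that the letters a through f represent the numbers 10 through 15, respectively. For example, here's the hexadecimal number fe in decimal notation:
f * 16 to the exponent 1, which is 15 * 16, or 240
e * 16 to the exponent 0, which is 14 * 1, or 14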
Adding 240 and 14 yields 254, which is the decimal equivalent of fe.
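The conversion method described above can be sketched as a small subroutine. Note that to_decimal is a hypothetical helper written for this example, not something supplied by Perl, and it uses features (subroutines and loops) that are covered on later days. Perl's built-in oct and hex functions perform the same conversions for octal and hexadecimal strings.

```perl
use strict;
use warnings;

# to_decimal is a hypothetical helper for this example only.
sub to_decimal {
    my ($digits, $base) = @_;
    my $total = 0;
    foreach my $digit (split(//, lc($digits))) {
        # index() maps '0'..'9' to 0..9 and 'a'..'f' to 10..15.
        my $value = index("0123456789abcdef", $digit);
        die("bad digit '$digit' for base $base\n")
            if $value < 0 || $value >= $base;
        # Multiplying the running total by the base moves every earlier
        # digit up one exponent, which gives the same sum as adding
        # digit * base to the exponent n for each digit.
        $total = $total * $base + $value;
    }
    return $total;
}

print("243 octal is ", to_decimal("243", 8), "\n");
print("243 hexadecimal is ", to_decimal("243", 16), "\n");
print("fe hexadecimal is ", to_decimal("fe", 16), "\n");

# The built-in functions give the same answers.
print("oct and hex agree: ", oct("243"), " ", hex("243"), " ", hex("fe"), "\n");
```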
You might be wondering why Perl bothers supporting octal and hexadecimal
notation. Here's the answer: Computers store numbers in memory
in binary (base 2) notation, not decimal (base 10) notation. Because
8 and 16 are powers of 2, it is easier to represent stored
computer memory in base 8 or base 16 than in base 10. (You could
use base 2, of course; however, base 2 numbers are clumsy because
they are very long.)
NOTE
Perl supports base-2 operations on integer scalar values. These operations, called bit-manipulation operations, are discussed on Day 4, "More Operators."
On previous days, you've seen that Perl enables you to assign text to scalar variables. In the following statement, for instance,
$var = "This is some text";
the text This is some text is an example of what is called a character string (frequently shortened to just string). A character string is a sequence of one or more letters, digits, spaces, or special characters.
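The following subsections show you how to substitute the values of scalar variables in character strings and how to use escape sequences to represent special characters.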
NOTE
C programmers should be advised that character strings in Perl do not contain a hidden null character at the end of the string. In Perl, null characters can appear anywhere in a string. (See the discussion of escape sequences later today for more details.)
Perl supports scalar variable substitution in character strings enclosed by double quotation-mark characters. For example, consider the following assignments:
$number = 11;
$text = "This text contains the number $number.";
When the Perl interpreter sees $number inside the string in the second statement, it replaces $number with its current value. This means that the string assigned to $text is actually
This text contains the number 11.
The most immediate practical application of this is in the print statement. So far, many of the print statements you have seen contain several arguments, as in the following:
print ("The final result is ", $result, "\n");
Because Perl supports scalar variable substitution, you can combine the three arguments to print into a single argument, as in the following:
print ("The final result is $result\n");
NOTE
From now on, examples and listings that call print use scalar variable substitution because it is easier to read.
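One detail worth checking for yourself is that the substitution happens at the moment the string is built. The following sketch, which reuses the variable names from the examples above, changes $number after the assignment to $text and shows that $text keeps the earlier value.

```perl
use strict;
use warnings;

my $number = 11;
my $text   = "This text contains the number $number.";

# The value was substituted when the string was built, so a later
# change to $number does not alter $text.
$number = 12;
print("$text\n");

# A single substituted string prints the same thing as separate arguments.
my $result = 14;
print("The final result is ", $result, "\n");
print("The final result is $result\n");
```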
Character strings that are enclosed in double quotes accept escape sequences for special characters. These escape sequences consist of a backslash (\) followed by one or more characters. The most common escape sequence is \n, which represents the newline character as shown in this example:
$text = "This is a string terminated by a newline\n";
Table 3.1 lists the escape sequences recognized in double-quoted
strings.
Escape sequence | Description
\a | Bell (beep)
\b | Backspace
\cn | The Ctrl+n character
\e | Escape
\E | Ends the effect of \L, \U or \Q
\f | Form feed
\l | Forces the next letter into lowercase
\L | All following letters are lowercase
\n | Newline
\r | Carriage return
\Q | Do not look for special pattern characters
\t | Tab
\u | Forces the next letter into uppercase
\U | All following letters are uppercase
\v | Vertical tab (Perl 5 does not recognize \v in strings; \x0b produces the same character)
The \Q escape sequence is useful only when the string is used as a pattern. Patterns are described on Day 7, "Pattern Matching."
The escape sequences \L, \U, and \Q can be turned off by \E, as follows:
$a = "T\LHIS IS A \ESTRING"; # same as "This is a STRING"
To include a backslash or double quote in a double-quoted string, precede the backslash or quote with another backslash:
$result = "A quote \" in a string";
$result = "A backslash \\ in a string";
A backslash also enables you to include a $ character in a string. For example, the statements
$result = 14;
print("The value of \$result is $result.\n");
print the following on your screen:
The value of $result is 14.
You can specify the ASCII value for a character in base 8 or octal notation using \nnn, where each n is an octal digit; for example:
$result = "\377"; # this is the character with the value 255
You can also use hexadecimal notation to specify the ASCII value for a character. To do this, use the sequence \xnn, where each n is a hexadecimal digit.
$result = "\xff"; # this is also 255
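As a quick check, the sketch below compares characters written in octal and hexadecimal notation. It uses the built-in ord function, which returns the numeric value of a character; the variable names are illustrative.

```perl
use strict;
use warnings;

my $octal_char = "\377";   # octal 377
my $hex_char   = "\xff";   # hexadecimal ff

if ($octal_char eq $hex_char) {
    print("\\377 and \\xff are the same character\n");
}
print("its numeric value is ", ord($octal_char), "\n");

# The same notations work for ordinary characters such as A.
my $letter_octal = "\101";
my $letter_hex   = "\x41";
print("\\101 is $letter_octal and \\x41 is $letter_hex\n");
```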
Listing 3.4 is an example of a program that uses escape sequences.
This program takes a line of input and converts it to a variety
of cases.
Listing 3.4. A case-conversion program.
1: #!/usr/local/bin/perl
2:
3: print ("Enter a line of input:\n");
4: $inputline = <STDIN>;
5: print ("uppercase: \U$inputline\E\n");
6: print ("lowercase: \L$inputline\E\n");
7: print ("as a sentence: \L\u$inputline\E\n");
$ program3_4
Enter a line of input:
tHis Is My INpUT LiNE.
uppercase: THIS IS MY INPUT LINE.
lowercase: this is my input line.
as a sentence: This is my input line.
$
Line 4 of this program reads a line of input and stores it in the scalar variable $inputline.
Line 5 replaces the string $inputline with the current value of the scalar variable $inputline. The escape character \U tells the Perl interpreter to convert everything in the string into uppercase until it sees a \E character; as a result, line 5 writes the contents of $inputline in uppercase.
Similarly, line 6 writes the input line in all lowercase characters by specifying the escape character \L in the string.
Line 7 combines the escape characters \L and \u. The \L specifies that everything in the string is to be in lowercase; however, the \u special character temporarily overrides this and tells the Perl interpreter that the next character is to be in uppercase. When this character (the first character in the line) is printed, the \L escape character remains in force, and the rest of the line is printed in lowercase. The result is as if the input line is a single sentence in English. The first character is capitalized, and the remainder is in lowercase.
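If you prefer to experiment without typing input each time, the following sketch applies the same escape sequences to a fixed string. It also calls the built-in functions uc, lc, and ucfirst, which are not used in Listing 3.4 but produce the same results.

```perl
use strict;
use warnings;

my $inputline = "tHis Is My INpUT LiNE.";

my $upper    = "\U$inputline\E";
my $lower    = "\L$inputline\E";
my $sentence = "\L\u$inputline\E";

print("uppercase: $upper\n");
print("lowercase: $lower\n");
print("as a sentence: $sentence\n");

# The built-in functions do the same conversions.
if ($upper eq uc($inputline) && $sentence eq ucfirst(lc($inputline))) {
    print("uc, lc and ucfirst give the same results\n");
}
```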
Perl also enables you to enclose strings using the ' (single quotation mark) character:
$text = 'This is a string in single quotes';
There are two differences between double-quoted strings and single-quoted strings. The first difference is that scalar variables are replaced by their values in double-quoted strings but not in single-quoted strings. The following is an example:
$string = "a string";
$text = "This is $string"; # becomes "This is a string"
$text = 'This is $string'; # remains 'This is $string'
The second difference is that the backslash character, \, does not have a special meaning in single-quoted strings. This means that the statement
$text = 'This is a string.\n';
assigns the following string to $text:
This is a string.\n
The \ character is special in only two instances for single-quoted strings. The first is when you want to include a single-quote character ' in a string.
$text = 'This string contains \', a quote character';
The preceding line of code assigns the following string to $text:
This string contains ', a quote character
The second instance is to escape the backslash itself.
$text = 'This string ends with a backslash \\';
The preceding code line assigns the following string to $text:
This string ends with a backslash \
As you can see, the double backslash makes it possible for the
backslash character (\) to be the last character in a
string.
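The differences between the two kinds of quotes are easy to confirm with the built-in length function, as in this minimal sketch (the variable names are illustrative).

```perl
use strict;
use warnings;

my $double = "line\n";   # four letters plus one newline character
my $single = 'line\n';   # four letters, a backslash, and the letter n

print("double-quoted length: ", length($double), "\n");
print("single-quoted length: ", length($single), "\n");

# The only two escapes recognized inside single quotes.
my $quote = 'This string contains \', a quote character';
my $slash = 'This string ends with a backslash \\';
print("$quote\n");
print("$slash\n");
```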
Single-quoted strings can be spread over multiple lines. The statement
$text = 'This is two
lines of text
';
is equivalent to the statement
$text = "This is two\nlines of text\n";
This means that if you forget the closing ' for a string, the Perl interpreter is likely to get quite confused because it won't detect an error until after it starts processing the next line.
As you've seen, you can use a scalar variable to store a character string, an integer, or a floating-point value. In scalar variables, a value that was assigned as a string can be used as an integer whenever it makes sense to do so, and vice versa. In the following example:
$string = "43";
$number = 28;
$result = $string + $number;
the value of $string is converted to an integer and added to the value of $number. The result of the addition, 71, is assigned to $result.
Another instance in which strings are converted to integers is when you are reading a number from the standard input file. The following is some code similar to code you've seen before:
$number = <STDIN>;
chop ($number);
$result = $number + 1;
This is what is happening: When $number is assigned a
line of standard input, it really is being assigned a string.
For instance, if you enter 22, $number is assigned the
string 22\n (the \n represents the newline character).
The chop function removes the \n, leaving the
string 22, and this string is converted to the number
22 in the arithmetic expression.
If a string does not begin with a number, the string is converted to 0 when used in an integer context. For example:
$result = "hello" * 5; # this assigns 0 to $result, since "hello" becomes 0
This is true even if the string would be a valid hexadecimal integer with the quotes removed, as in the following:
$result = "0xff" + 1;
In cases like this, Perl does not tell you that anything has gone wrong, and your results might not be what you expect.
Also, strings containing misprints might not contain what you expect. For example:
$result = "12O34"; # the letter O, not the number 0
When converting from a string to an integer, Perl starts at the left and continues until it sees a character that is not a digit. In the preceding instance, 12O34 is converted to the integer 12, not 12034.
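The conversions described in this section can be collected into one short sketch. The variable names are illustrative. The line no warnings 'numeric' is there only because the sketch turns warnings on; with warnings enabled, Perl does report strings that are not entirely numeric. The built-in hex function is shown as one way to convert a hexadecimal string on purpose.

```perl
use strict;
use warnings;
no warnings 'numeric';   # keep this demonstration quiet

my $sum       = "43" + 28;         # both values are treated as numbers
my $from_line = "22\n" + 1;        # the trailing newline is ignored
my $word      = "hello" * 5;       # no leading digits, so "hello" is 0
my $hex_text  = "0xff" + 1;        # only the leading 0 is used
my $misprint  = "12O34" + 0;       # conversion stops at the letter O
my $converted = hex("0xff") + 1;   # hex() understands the 0x prefix

print("\"43\" + 28 is $sum\n");
print("\"22\\n\" + 1 is $from_line\n");
print("\"hello\" * 5 is $word\n");
print("\"0xff\" + 1 is $hex_text\n");
print("\"12O34\" + 0 is $misprint\n");
print("hex(\"0xff\") + 1 is $converted\n");
```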
In Perl, all scalar variables have an initial value of the null string, "". This means that you do not need to define a value for a scalar variable.
#!/usr/local/bin/perl
$result = $undefined + 2; # $undefined is not defined
print ("The value of \$result is $result.\n");
This short program is perfectly legal Perl. The output is
The value of $result is 2.
Because $undefined is not defined, the Perl interpreter
assumes that its value is the null string. This null string is
then converted to 0, because it is being used in an addition operation.
The result of the addition, 2, is assigned to $result.
TIP
Although you can use uninitialized variables in your Perl programs, you shouldn't. If your Perl program gets to be large (as many complicated programs do), it might be difficult to determine whether a particular variable is supposed to be appearing for the first time or whether it is a spelling mistake that should be fixed. To avoid ambiguity and to make life easier for yourself, initialize every scalar variable before using it.
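One way to follow this advice is sketched below. It relies on Perl 5 features that this lesson does not cover: use strict rejects undeclared (for example, misspelled) variable names when the program is compiled, use warnings reports the use of an uninitialized value at run time, and the built-in defined function tests whether a variable has been given a value. The variable $count is illustrative.

```perl
use strict;     # undeclared or misspelled variable names are rejected
use warnings;   # using an uninitialized value is reported at run time

my $count;      # declared, but not yet given a value

if (!defined($count)) {
    print("\$count has no value yet; starting from 0\n");
    $count = 0;   # explicit initialization
}

my $result = $count + 2;
print("The value of \$result is $result.\n");
```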
Perl supports three kinds of scalar values: integers, floating-point numbers, and character strings.
Integers can be in three notations: standard (decimal) notation, octal notation, and hexadecimal notation. Octal notation is indicated by a leading 0, and hexadecimal notation is indicated by a leading 0x. Integers are stored as floating-point values and can be as long as the machine's floating-point precision (usually 16 digits or so).
Floating-point numbers can consist of a string of digits that contain a decimal point and an optional exponent. The exponent's range can be anywhere from about e-309 to e+308. (This value might be different on some machines.) When possible, floating-point numbers are displayed without the exponent; failing that, they are displayed in scientific notation (one digit before the decimal point).
When you use floating-point arithmetic, be alert for round-off errors. Performing arithmetic operations in the proper order (operating on large numbers first) might yield better results.
You can enclose character strings in either double quotes (") or single quotes ('). If a scalar variable name appears in a character string enclosed in double quotes, the value of the variable is substituted for its name. Escape characters are recognized in strings enclosed in double quotes; these characters are indicated by a backslash (\).
Character strings in single quotes do not support escape characters, with the exception of \\ and \'. Scalar variable names are not replaced by their values.
Strings and integers are freely interchangeable in Perl whenever it is logically possible to do so.
Q: If Perl character strings are not terminated by null characters, how does the Perl interpreter know the length of a string?
A: The Perl interpreter keeps track of the length of a string as well as its contents. In Perl, you do not need to use a null character to indicate "end of string."
Q: Why does Perl use floating-point registers for floating-point arithmetic even though they cause round-off errors?
A: Basically, it's a performance issue. It's possible to write routines that store floating-point numbers as strings and convert parts of these strings to numbers as necessary; however, you often don't need more than 16 or so digits of precision anyway. Applications that need to do high-speed arithmetic calculations of great precision usually run on special computers designed for that purpose.
Q: What happens if I forget to call chop when reading a number from the standard input file?
A: As it happens, nothing. Perl is smart enough to ignore white space at the end of a line that consists only of a number. However, it's a good idea to get into the habit of using chop to get rid of a trailing newline at all times, because the trailing newline becomes significant when you start doing string comparisons. (You'll learn about string comparisons on Day 4, "More Operators.")
The Workshop provides quiz questions to help you solidify your understanding of the material covered and exercises to give you experience in using what you've learned. Try to understand the quiz and exercise answers before you go on to tomorrow's lesson.