home *** CD-ROM | disk | FTP | other *** search
- =head1 NAME
-
- perlsyn - Perl syntax
-
- =head1 DESCRIPTION
-
- A Perl script consists of a sequence of declarations and statements.
- The sequence of statements is executed just once, unlike in B<sed>
- and B<awk> scripts, where the sequence of statements is executed
- for each input line. While this means that you must explicitly
- loop over the lines of your input file (or files), it also means
- you have much more control over which files and which lines you look at.
- (Actually, I'm lying--it is possible to do an implicit loop with
- either the B<-n> or B<-p> switch. It's just not the mandatory
- default like it is in B<sed> and B<awk>.)
-
- Perl is, for the most part, a free-form language. (The only exception
- to this is format declarations, for obvious reasons.) Text from a
- C<"#"> character until the end of the line is a comment, and is
- ignored. If you attempt to use C</* */> C-style comments, it will be
- interpreted either as division or pattern matching, depending on the
- context, and C++ C<//> comments just look like a null regular
- expression, so don't do that.
-
- =head2 Declarations
-
- The only things you need to declare in Perl are report formats
- and subroutines--and even undefined subroutines can be handled
- through AUTOLOAD. A variable holds the undefined value (C<undef>)
- until it has been assigned a defined value, which is anything
- other than C<undef>. When used as a number, C<undef> is treated
- as C<0>; when used as a string, it is treated the empty string,
- C<"">; and when used as a reference that isn't being assigned
- to, it is treated as an error. If you enable warnings, you'll
- be notified of an uninitialized value whenever you treat C<undef>
- as a string or a number. Well, usually. Boolean contexts, such as:
-
- my $a;
- if ($a) {}
-
- are exempt from warnings (because they care about truth rather than
- definedness). Operators such as C<++>, C<-->, C<+=>,
- C<-=>, and C<.=>, that operate on undefined left values such as:
-
- my $a;
- $a++;
-
- are also always exempt from such warnings.
-
- A declaration can be put anywhere a statement can, but has no effect on
- the execution of the primary sequence of statements--declarations all
- take effect at compile time. Typically all the declarations are put at
- the beginning or the end of the script. However, if you're using
- lexically-scoped private variables created with C<my()>, you'll
- have to make sure
- your format or subroutine definition is within the same block scope
- as the my if you expect to be able to access those private variables.
-
- Declaring a subroutine allows a subroutine name to be used as if it were a
- list operator from that point forward in the program. You can declare a
- subroutine without defining it by saying C<sub name>, thus:
-
- sub myname;
- $me = myname $0 or die "can't get myname";
-
- Note that myname() functions as a list operator, not as a unary operator;
- so be careful to use C<or> instead of C<||> in this case. However, if
- you were to declare the subroutine as C<sub myname ($)>, then
- C<myname> would function as a unary operator, so either C<or> or
- C<||> would work.
-
- Subroutines declarations can also be loaded up with the C<require> statement
- or both loaded and imported into your namespace with a C<use> statement.
- See L<perlmod> for details on this.
-
- A statement sequence may contain declarations of lexically-scoped
- variables, but apart from declaring a variable name, the declaration acts
- like an ordinary statement, and is elaborated within the sequence of
- statements as if it were an ordinary statement. That means it actually
- has both compile-time and run-time effects.
-
- =head2 Simple statements
-
- The only kind of simple statement is an expression evaluated for its
- side effects. Every simple statement must be terminated with a
- semicolon, unless it is the final statement in a block, in which case
- the semicolon is optional. (A semicolon is still encouraged there if the
- block takes up more than one line, because you may eventually add another line.)
- Note that there are some operators like C<eval {}> and C<do {}> that look
- like compound statements, but aren't (they're just TERMs in an expression),
- and thus need an explicit termination if used as the last item in a statement.
-
- Any simple statement may optionally be followed by a I<SINGLE> modifier,
- just before the terminating semicolon (or block ending). The possible
- modifiers are:
-
- if EXPR
- unless EXPR
- while EXPR
- until EXPR
- foreach EXPR
-
- The C<if> and C<unless> modifiers have the expected semantics,
- presuming you're a speaker of English. The C<foreach> modifier is an
- iterator: For each value in EXPR, it aliases C<$_> to the value and
- executes the statement. The C<while> and C<until> modifiers have the
- usual "C<while> loop" semantics (conditional evaluated first), except
- when applied to a C<do>-BLOCK (or to the deprecated C<do>-SUBROUTINE
- statement), in which case the block executes once before the
- conditional is evaluated. This is so that you can write loops like:
-
- do {
- $line = <STDIN>;
- ...
- } until $line eq ".\n";
-
- See L<perlfunc/do>. Note also that the loop control statements described
- later will I<NOT> work in this construct, because modifiers don't take
- loop labels. Sorry. You can always put another block inside of it
- (for C<next>) or around it (for C<last>) to do that sort of thing.
- For C<next>, just double the braces:
-
- do {{
- next if $x == $y;
- # do something here
- }} until $x++ > $z;
-
- For C<last>, you have to be more elaborate:
-
- LOOP: {
- do {
- last if $x = $y**2;
- # do something here
- } while $x++ <= $z;
- }
-
- =head2 Compound statements
-
- In Perl, a sequence of statements that defines a scope is called a block.
- Sometimes a block is delimited by the file containing it (in the case
- of a required file, or the program as a whole), and sometimes a block
- is delimited by the extent of a string (in the case of an eval).
-
- But generally, a block is delimited by curly brackets, also known as braces.
- We will call this syntactic construct a BLOCK.
-
- The following compound statements may be used to control flow:
-
- if (EXPR) BLOCK
- if (EXPR) BLOCK else BLOCK
- if (EXPR) BLOCK elsif (EXPR) BLOCK ... else BLOCK
- LABEL while (EXPR) BLOCK
- LABEL while (EXPR) BLOCK continue BLOCK
- LABEL for (EXPR; EXPR; EXPR) BLOCK
- LABEL foreach VAR (LIST) BLOCK
- LABEL foreach VAR (LIST) BLOCK continue BLOCK
- LABEL BLOCK continue BLOCK
-
- Note that, unlike C and Pascal, these are defined in terms of BLOCKs,
- not statements. This means that the curly brackets are I<required>--no
- dangling statements allowed. If you want to write conditionals without
- curly brackets there are several other ways to do it. The following
- all do the same thing:
-
- if (!open(FOO)) { die "Can't open $FOO: $!"; }
- die "Can't open $FOO: $!" unless open(FOO);
- open(FOO) or die "Can't open $FOO: $!"; # FOO or bust!
- open(FOO) ? 'hi mom' : die "Can't open $FOO: $!";
- # a bit exotic, that last one
-
- The C<if> statement is straightforward. Because BLOCKs are always
- bounded by curly brackets, there is never any ambiguity about which
- C<if> an C<else> goes with. If you use C<unless> in place of C<if>,
- the sense of the test is reversed.
-
- The C<while> statement executes the block as long as the expression is
- true (does not evaluate to the null string C<""> or C<0> or C<"0">).
- The LABEL is optional, and if present, consists of an identifier followed
- by a colon. The LABEL identifies the loop for the loop control
- statements C<next>, C<last>, and C<redo>.
- If the LABEL is omitted, the loop control statement
- refers to the innermost enclosing loop. This may include dynamically
- looking back your call-stack at run time to find the LABEL. Such
- desperate behavior triggers a warning if you use the C<use warnings>
- pragma or the B<-w> flag.
- Unlike a C<foreach> statement, a C<while> statement never implicitly
- localises any variables.
-
- If there is a C<continue> BLOCK, it is always executed just before the
- conditional is about to be evaluated again, just like the third part of a
- C<for> loop in C. Thus it can be used to increment a loop variable, even
- when the loop has been continued via the C<next> statement (which is
- similar to the C C<continue> statement).
-
- =head2 Loop Control
-
- The C<next> command is like the C<continue> statement in C; it starts
- the next iteration of the loop:
-
- LINE: while (<STDIN>) {
- next LINE if /^#/; # discard comments
- ...
- }
-
- The C<last> command is like the C<break> statement in C (as used in
- loops); it immediately exits the loop in question. The
- C<continue> block, if any, is not executed:
-
- LINE: while (<STDIN>) {
- last LINE if /^$/; # exit when done with header
- ...
- }
-
- The C<redo> command restarts the loop block without evaluating the
- conditional again. The C<continue> block, if any, is I<not> executed.
- This command is normally used by programs that want to lie to themselves
- about what was just input.
-
- For example, when processing a file like F</etc/termcap>.
- If your input lines might end in backslashes to indicate continuation, you
- want to skip ahead and get the next record.
-
- while (<>) {
- chomp;
- if (s/\\$//) {
- $_ .= <>;
- redo unless eof();
- }
- # now process $_
- }
-
- which is Perl short-hand for the more explicitly written version:
-
- LINE: while (defined($line = <ARGV>)) {
- chomp($line);
- if ($line =~ s/\\$//) {
- $line .= <ARGV>;
- redo LINE unless eof(); # not eof(ARGV)!
- }
- # now process $line
- }
-
- Note that if there were a C<continue> block on the above code, it would
- get executed only on lines discarded by the regex (since redo skips the
- continue block). A continue block is often used to reset line counters
- or C<?pat?> one-time matches:
-
- # inspired by :1,$g/fred/s//WILMA/
- while (<>) {
- ?(fred)? && s//WILMA $1 WILMA/;
- ?(barney)? && s//BETTY $1 BETTY/;
- ?(homer)? && s//MARGE $1 MARGE/;
- } continue {
- print "$ARGV $.: $_";
- close ARGV if eof(); # reset $.
- reset if eof(); # reset ?pat?
- }
-
- If the word C<while> is replaced by the word C<until>, the sense of the
- test is reversed, but the conditional is still tested before the first
- iteration.
-
- The loop control statements don't work in an C<if> or C<unless>, since
- they aren't loops. You can double the braces to make them such, though.
-
- if (/pattern/) {{
- last if /fred/;
- next if /barney/; # same effect as "last", but doesn't document as well
- # do something here
- }}
-
- This is caused by the fact that a block by itself acts as a loop that
- executes once, see L<"Basic BLOCKs and Switch Statements">.
-
- The form C<while/if BLOCK BLOCK>, available in Perl 4, is no longer
- available. Replace any occurrence of C<if BLOCK> by C<if (do BLOCK)>.
-
- =head2 For Loops
-
- Perl's C-style C<for> loop works like the corresponding C<while> loop;
- that means that this:
-
- for ($i = 1; $i < 10; $i++) {
- ...
- }
-
- is the same as this:
-
- $i = 1;
- while ($i < 10) {
- ...
- } continue {
- $i++;
- }
-
- There is one minor difference: if variables are declared with C<my>
- in the initialization section of the C<for>, the lexical scope of
- those variables is exactly the C<for> loop (the body of the loop
- and the control sections).
-
- Besides the normal array index looping, C<for> can lend itself
- to many other interesting applications. Here's one that avoids the
- problem you get into if you explicitly test for end-of-file on
- an interactive file descriptor causing your program to appear to
- hang.
-
- $on_a_tty = -t STDIN && -t STDOUT;
- sub prompt { print "yes? " if $on_a_tty }
- for ( prompt(); <STDIN>; prompt() ) {
- # do something
- }
-
- =head2 Foreach Loops
-
- The C<foreach> loop iterates over a normal list value and sets the
- variable VAR to be each element of the list in turn. If the variable
- is preceded with the keyword C<my>, then it is lexically scoped, and
- is therefore visible only within the loop. Otherwise, the variable is
- implicitly local to the loop and regains its former value upon exiting
- the loop. If the variable was previously declared with C<my>, it uses
- that variable instead of the global one, but it's still localized to
- the loop.
-
- The C<foreach> keyword is actually a synonym for the C<for> keyword, so
- you can use C<foreach> for readability or C<for> for brevity. (Or because
- the Bourne shell is more familiar to you than I<csh>, so writing C<for>
- comes more naturally.) If VAR is omitted, C<$_> is set to each value.
-
- If any element of LIST is an lvalue, you can modify it by modifying
- VAR inside the loop. Conversely, if any element of LIST is NOT an
- lvalue, any attempt to modify that element will fail. In other words,
- the C<foreach> loop index variable is an implicit alias for each item
- in the list that you're looping over.
-
- If any part of LIST is an array, C<foreach> will get very confused if
- you add or remove elements within the loop body, for example with
- C<splice>. So don't do that.
-
- C<foreach> probably won't do what you expect if VAR is a tied or other
- special variable. Don't do that either.
-
- Examples:
-
- for (@ary) { s/foo/bar/ }
-
- for my $elem (@elements) {
- $elem *= 2;
- }
-
- for $count (10,9,8,7,6,5,4,3,2,1,'BOOM') {
- print $count, "\n"; sleep(1);
- }
-
- for (1..15) { print "Merry Christmas\n"; }
-
- foreach $item (split(/:[\\\n:]*/, $ENV{TERMCAP})) {
- print "Item: $item\n";
- }
-
- Here's how a C programmer might code up a particular algorithm in Perl:
-
- for (my $i = 0; $i < @ary1; $i++) {
- for (my $j = 0; $j < @ary2; $j++) {
- if ($ary1[$i] > $ary2[$j]) {
- last; # can't go to outer :-(
- }
- $ary1[$i] += $ary2[$j];
- }
- # this is where that last takes me
- }
-
- Whereas here's how a Perl programmer more comfortable with the idiom might
- do it:
-
- OUTER: for my $wid (@ary1) {
- INNER: for my $jet (@ary2) {
- next OUTER if $wid > $jet;
- $wid += $jet;
- }
- }
-
- See how much easier this is? It's cleaner, safer, and faster. It's
- cleaner because it's less noisy. It's safer because if code gets added
- between the inner and outer loops later on, the new code won't be
- accidentally executed. The C<next> explicitly iterates the other loop
- rather than merely terminating the inner one. And it's faster because
- Perl executes a C<foreach> statement more rapidly than it would the
- equivalent C<for> loop.
-
- =head2 Basic BLOCKs and Switch Statements
-
- A BLOCK by itself (labeled or not) is semantically equivalent to a
- loop that executes once. Thus you can use any of the loop control
- statements in it to leave or restart the block. (Note that this is
- I<NOT> true in C<eval{}>, C<sub{}>, or contrary to popular belief
- C<do{}> blocks, which do I<NOT> count as loops.) The C<continue>
- block is optional.
-
- The BLOCK construct is particularly nice for doing case
- structures.
-
- SWITCH: {
- if (/^abc/) { $abc = 1; last SWITCH; }
- if (/^def/) { $def = 1; last SWITCH; }
- if (/^xyz/) { $xyz = 1; last SWITCH; }
- $nothing = 1;
- }
-
- There is no official C<switch> statement in Perl, because there are
- already several ways to write the equivalent.
-
- However, starting from Perl 5.8 to get switch and case one can use
- the Switch extension and say:
-
- use Switch;
-
- after which one has switch and case. It is not as fast as it could be
- because it's not really part of the language (it's done using source
- filters) but it is available, and it's very flexible.
-
- In addition to the above BLOCK construct, you could write
-
- SWITCH: {
- $abc = 1, last SWITCH if /^abc/;
- $def = 1, last SWITCH if /^def/;
- $xyz = 1, last SWITCH if /^xyz/;
- $nothing = 1;
- }
-
- (That's actually not as strange as it looks once you realize that you can
- use loop control "operators" within an expression, That's just the normal
- C comma operator.)
-
- or
-
- SWITCH: {
- /^abc/ && do { $abc = 1; last SWITCH; };
- /^def/ && do { $def = 1; last SWITCH; };
- /^xyz/ && do { $xyz = 1; last SWITCH; };
- $nothing = 1;
- }
-
- or formatted so it stands out more as a "proper" C<switch> statement:
-
- SWITCH: {
- /^abc/ && do {
- $abc = 1;
- last SWITCH;
- };
-
- /^def/ && do {
- $def = 1;
- last SWITCH;
- };
-
- /^xyz/ && do {
- $xyz = 1;
- last SWITCH;
- };
- $nothing = 1;
- }
-
- or
-
- SWITCH: {
- /^abc/ and $abc = 1, last SWITCH;
- /^def/ and $def = 1, last SWITCH;
- /^xyz/ and $xyz = 1, last SWITCH;
- $nothing = 1;
- }
-
- or even, horrors,
-
- if (/^abc/)
- { $abc = 1 }
- elsif (/^def/)
- { $def = 1 }
- elsif (/^xyz/)
- { $xyz = 1 }
- else
- { $nothing = 1 }
-
- A common idiom for a C<switch> statement is to use C<foreach>'s aliasing to make
- a temporary assignment to C<$_> for convenient matching:
-
- SWITCH: for ($where) {
- /In Card Names/ && do { push @flags, '-e'; last; };
- /Anywhere/ && do { push @flags, '-h'; last; };
- /In Rulings/ && do { last; };
- die "unknown value for form variable where: `$where'";
- }
-
- Another interesting approach to a switch statement is arrange
- for a C<do> block to return the proper value:
-
- $amode = do {
- if ($flag & O_RDONLY) { "r" } # XXX: isn't this 0?
- elsif ($flag & O_WRONLY) { ($flag & O_APPEND) ? "a" : "w" }
- elsif ($flag & O_RDWR) {
- if ($flag & O_CREAT) { "w+" }
- else { ($flag & O_APPEND) ? "a+" : "r+" }
- }
- };
-
- Or
-
- print do {
- ($flags & O_WRONLY) ? "write-only" :
- ($flags & O_RDWR) ? "read-write" :
- "read-only";
- };
-
- Or if you are certain that all the C<&&> clauses are true, you can use
- something like this, which "switches" on the value of the
- C<HTTP_USER_AGENT> environment variable.
-
- #!/usr/bin/perl
- # pick out jargon file page based on browser
- $dir = 'http://www.wins.uva.nl/~mes/jargon';
- for ($ENV{HTTP_USER_AGENT}) {
- $page = /Mac/ && 'm/Macintrash.html'
- || /Win(dows )?NT/ && 'e/evilandrude.html'
- || /Win|MSIE|WebTV/ && 'm/MicroslothWindows.html'
- || /Linux/ && 'l/Linux.html'
- || /HP-UX/ && 'h/HP-SUX.html'
- || /SunOS/ && 's/ScumOS.html'
- || 'a/AppendixB.html';
- }
- print "Location: $dir/$page\015\012\015\012";
-
- That kind of switch statement only works when you know the C<&&> clauses
- will be true. If you don't, the previous C<?:> example should be used.
-
- You might also consider writing a hash of subroutine references
- instead of synthesizing a C<switch> statement.
-
- =head2 Goto
-
- Although not for the faint of heart, Perl does support a C<goto>
- statement. There are three forms: C<goto>-LABEL, C<goto>-EXPR, and
- C<goto>-&NAME. A loop's LABEL is not actually a valid target for
- a C<goto>; it's just the name of the loop.
-
- The C<goto>-LABEL form finds the statement labeled with LABEL and resumes
- execution there. It may not be used to go into any construct that
- requires initialization, such as a subroutine or a C<foreach> loop. It
- also can't be used to go into a construct that is optimized away. It
- can be used to go almost anywhere else within the dynamic scope,
- including out of subroutines, but it's usually better to use some other
- construct such as C<last> or C<die>. The author of Perl has never felt the
- need to use this form of C<goto> (in Perl, that is--C is another matter).
-
- The C<goto>-EXPR form expects a label name, whose scope will be resolved
- dynamically. This allows for computed C<goto>s per FORTRAN, but isn't
- necessarily recommended if you're optimizing for maintainability:
-
- goto(("FOO", "BAR", "GLARCH")[$i]);
-
- The C<goto>-&NAME form is highly magical, and substitutes a call to the
- named subroutine for the currently running subroutine. This is used by
- C<AUTOLOAD()> subroutines that wish to load another subroutine and then
- pretend that the other subroutine had been called in the first place
- (except that any modifications to C<@_> in the current subroutine are
- propagated to the other subroutine.) After the C<goto>, not even C<caller()>
- will be able to tell that this routine was called first.
-
- In almost all cases like this, it's usually a far, far better idea to use the
- structured control flow mechanisms of C<next>, C<last>, or C<redo> instead of
- resorting to a C<goto>. For certain applications, the catch and throw pair of
- C<eval{}> and die() for exception processing can also be a prudent approach.
-
- =head2 PODs: Embedded Documentation
-
- Perl has a mechanism for intermixing documentation with source code.
- While it's expecting the beginning of a new statement, if the compiler
- encounters a line that begins with an equal sign and a word, like this
-
- =head1 Here There Be Pods!
-
- Then that text and all remaining text up through and including a line
- beginning with C<=cut> will be ignored. The format of the intervening
- text is described in L<perlpod>.
-
- This allows you to intermix your source code
- and your documentation text freely, as in
-
- =item snazzle($)
-
- The snazzle() function will behave in the most spectacular
- form that you can possibly imagine, not even excepting
- cybernetic pyrotechnics.
-
- =cut back to the compiler, nuff of this pod stuff!
-
- sub snazzle($) {
- my $thingie = shift;
- .........
- }
-
- Note that pod translators should look at only paragraphs beginning
- with a pod directive (it makes parsing easier), whereas the compiler
- actually knows to look for pod escapes even in the middle of a
- paragraph. This means that the following secret stuff will be
- ignored by both the compiler and the translators.
-
- $a=3;
- =secret stuff
- warn "Neither POD nor CODE!?"
- =cut back
- print "got $a\n";
-
- You probably shouldn't rely upon the C<warn()> being podded out forever.
- Not all pod translators are well-behaved in this regard, and perhaps
- the compiler will become pickier.
-
- One may also use pod directives to quickly comment out a section
- of code.
-
- =head2 Plain Old Comments (Not!)
-
- Much like the C preprocessor, Perl can process line directives. Using
- this, one can control Perl's idea of filenames and line numbers in
- error or warning messages (especially for strings that are processed
- with C<eval()>). The syntax for this mechanism is the same as for most
- C preprocessors: it matches the regular expression
- C</^#\s*line\s+(\d+)\s*(?:\s"([^"]+)")?\s*$/> with C<$1> being the line
- number for the next line, and C<$2> being the optional filename
- (specified within quotes).
-
- There is a fairly obvious gotcha included with the line directive:
- Debuggers and profilers will only show the last source line to appear
- at a particular line number in a given file. Care should be taken not
- to cause line number collisions in code you'd like to debug later.
-
- Here are some examples that you should be able to type into your command
- shell:
-
- % perl
- # line 200 "bzzzt"
- # the `#' on the previous line must be the first char on line
- die 'foo';
- __END__
- foo at bzzzt line 201.
-
- % perl
- # line 200 "bzzzt"
- eval qq[\n#line 2001 ""\ndie 'foo']; print $@;
- __END__
- foo at - line 2001.
-
- % perl
- eval qq[\n#line 200 "foo bar"\ndie 'foo']; print $@;
- __END__
- foo at foo bar line 200.
-
- % perl
- # line 345 "goop"
- eval "\n#line " . __LINE__ . ' "' . __FILE__ ."\"\ndie 'foo'";
- print $@;
- __END__
- foo at goop line 345.
-
- =cut
-