Text File | 1989-04-13 | 48.8 KB | 1,386 lines |
- Info file gawk-info, produced by Makeinfo, -*- Text -*- from input
- file gawk.texinfo.
-
- This file documents `awk', a program that you can use to select
- particular records in a file and perform operations upon them.
-
- Copyright (C) 1989 Free Software Foundation, Inc.
-
- Permission is granted to make and distribute verbatim copies of this
- manual provided the copyright notice and this permission notice are
- preserved on all copies.
-
- Permission is granted to copy and distribute modified versions of
- this manual under the conditions for verbatim copying, provided that
- the entire resulting derived work is distributed under the terms of a
- permission notice identical to this one.
-
- Permission is granted to copy and distribute translations of this
- manual into another language, under the above conditions for modified
- versions, except that this permission notice may be stated in a
- translation approved by the Foundation.
-
-
- File: gawk-info, Node: Patterns, Next: Actions, Prev: One-liners, Up: Top
-
- Patterns
- ********
-
- Patterns control the execution of rules: a rule is executed when its
- pattern matches the input record. The `awk' language provides
- several special patterns that are described in the sections that
- follow. Patterns include:
-
- NULL
- The empty pattern, which matches every input record. (*Note The
- Empty Pattern: Empty.)
-
- /REGULAR EXPRESSION/
- A regular expression as a pattern. It matches when the text of
- the input record fits the regular expression. (*Note Regular
- Expressions as Patterns: Regexp.)
-
- CONDEXP
- A single comparison expression. It matches when it is true.
- (*Note Comparison Expressions as Patterns: Comparison Patterns.)
-
- `BEGIN'
- `END'
- Special patterns to supply start-up or clean-up information to
- `awk'. (*Note `BEGIN' and `END' Special Patterns: BEGIN/END.)
-
- PAT1, PAT2
- A pair of patterns separated by a comma, specifying a range of
- records. (*Note Specifying Record Ranges With Patterns: Ranges.)
-
- CONDEXP1 BOOLEAN CONDEXP2
- A "compound" pattern, which combines expressions with the
- operators ``and'' (`&&') and ``or'' (`||'). (*Note Boolean
- Operators and Patterns: Boolean.)
-
- ! CONDEXP
- The pattern CONDEXP is evaluated. Then the `!' performs a
- boolean ``not'' or logical negation operation; if the input line
- matches the pattern in CONDEXP then the associated action is
- *not* executed. If the input line did not match that pattern,
- then the action *is* executed. (*Note Boolean Operators and
- Patterns: Boolean.)
-
- (EXPR)
- Parentheses may be used to control how operators nest.
-
- PAT1 ? PAT2 : PAT3
- The first pattern is evaluated. If it is true, the input line
- is tested against the second pattern, otherwise it is tested
- against the third. (*Note Conditional Patterns: Conditional
- Patterns.)
-
- * Menu:
-
- The following subsections describe these forms in detail:
-
- * Empty:: The empty pattern, which matches every record.
-
- * Regexp:: Regular expressions such as `/foo/'.
-
- * Comparison Patterns:: Comparison expressions such as `$1 > 10'.
-
- * Boolean:: Combining comparison expressions.
-
- * Ranges:: Using pairs of patterns to specify record ranges.
-
- * BEGIN/END:: Specifying initialization and cleanup rules.
-
- * Conditional Patterns:: Patterns such as `pat1 ? pat2 : pat3'.
-
-
- File: gawk-info, Node: Empty, Next: Regexp, Up: Patterns
-
- The Empty Pattern
- =================
-
- An empty pattern is considered to match *every* input record. For
- example, the program:
-
- awk '{ print $1 }' BBS-list
-
- prints just the first field of every record.
-
-
- File: gawk-info, Node: Regexp, Next: Comparison Patterns, Prev: Empty, Up: Patterns
-
- Regular Expressions as Patterns
- ===============================
-
- A "regular expression", or "regexp", is a way of describing classes
- of strings. When enclosed in slashes (`/'), it makes an `awk'
- pattern that matches every input record that contains a match for the
- regexp.
-
- The simplest regular expression is a sequence of letters, numbers, or
- both. Such a regexp matches any string that contains that sequence.
- Thus, the regexp `foo' matches any string containing `foo'. (More
- complicated regexps let you specify classes of similar strings.)
-
- * Menu:
-
- * Usage: Regexp Usage. How regexps are used in patterns.
- * Operators: Regexp Operators. How to write a regexp.
-
-
- File: gawk-info, Node: Regexp Usage, Next: Regexp Operators, Up: Regexp
-
- How to use Regular Expressions
- ------------------------------
-
- When you enclose `foo' in slashes, you get a pattern that matches a
- record that contains `foo'. For example, this prints the second
- field of each record that contains `foo' anywhere:
-
- awk '/foo/ { print $2 }' BBS-list
-
- Regular expressions can also be used in comparison expressions. Then
- you can specify the string to match against; it need not be the
- entire current input record. These comparison expressions can be
- used as patterns or in `if' and `while' statements.
-
- `EXP ~ /REGEXP/'
- This is true if the expression EXP (taken as a character string)
- is matched by REGEXP. The following example matches, or
- selects, all input records with the letter `J' in the first field:
-
- awk '$1 ~ /J/' inventory-shipped
-
- So does this:
-
- awk '{ if ($1 ~ /J/) print }' inventory-shipped
-
- `EXP !~ /REGEXP/'
- This is true if the expression EXP (taken as a character string)
- is *not* matched by REGEXP. The following example matches, or
- selects, all input records whose first field *does not* contain
- the letter `J':
-
- awk '$1 !~ /J/' inventory-shipped
-
- The right hand side of a `~' or `!~' operator need not be a constant
- regexp (i.e. a string of characters between `/'s). It can also be
- "computed", or "dynamic". For example:
-
- identifier = "[A-Za-z_][A-Za-z_0-9]*"
- $0 ~ identifier
-
- sets `identifier' to a regexp that describes `awk' variable names,
- and tests if the input record matches this regexp.
-
- A dynamic regexp may actually be any expression. The expression is
- evaluated, and the result is treated as a string that describes a
- regular expression.
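-
- As a hedged illustration of a dynamic regexp, the sketch below stores
- a regexp in a variable and matches the first field against it. The
- sample input is invented for this example. The regexp is anchored with
- `^' and `$' (an addition, not part of the text above) so that the whole
- field must be an identifier, and it uses `*' so that one-letter names
- also match. A POSIX-compatible `awk' is assumed.
-
```shell
# Keep only records whose first field is a valid awk variable name.
out=$(printf 'x1 = 5\n42 = y\n_tmp = 0\n' |
  awk 'BEGIN { identifier = "^[A-Za-z_][A-Za-z_0-9]*$" }
       $1 ~ identifier { print $1 }')
printf '%s\n' "$out"
```
-
- Here the second record is skipped because its first field, `42',
- begins with a digit.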
-
-
- File: gawk-info, Node: Regexp Operators, Prev: Regexp Usage, Up: Regexp
-
- Regular Expression Operators
- ----------------------------
-
- You can combine regular expressions with the following characters,
- called "regular expression operators", or "metacharacters", to
- increase the power and versatility of regular expressions. This is a
- table of metacharacters:
-
- `\'
- This is used to suppress the special meaning of a character when
- matching. For example:
-
- \$
-
- matches the character `$'.
-
- `^'
- This matches the beginning of the string or the beginning of a
- line within the string. For example:
-
- ^@chapter
-
- matches the `@chapter' at the beginning of a string, and can be
- used to identify chapter beginnings in Texinfo source files.
-
- `$'
- This is similar to `^', but it matches only at the end of a
- string or the end of a line within the string. For example:
-
- /p$/
-
- as a pattern matches a record that ends with a `p'.
-
- `.'
- This matches any single character except a newline. For example:
-
- .P
-
- matches any single character followed by a `P' in a string.
- Using concatenation we can make regular expressions like `U.A',
- which matches any three-character string that begins with `U'
- and ends with `A'.
-
- `[...]'
- This is called a "character set". It matches any one of a group
- of characters that are enclosed in the square brackets. For
- example:
-
- [MVX]
-
- matches any of the characters `M', `V', or `X' in a string.
-
- Ranges of characters are indicated by using a hyphen between the
- beginning and ending characters, and enclosing the whole thing
- in brackets. For example:
-
- [0-9]
-
- matches any string that contains a digit.
-
- Note that special rules must be followed to match the
- characters `]', `-', and `^' when they are enclosed in the
- square brackets. To match a `]', make it the first character in
- the set. For example:
-
- []d]
-
- matches either `]', or `d'.
-
- To match `-', write it as `---', which is a range containing only
- `-'. You may also make the `-' the first or last character
- in the set. To match `^', make it any character except the
- first one of a set.
-
- `[^ ...]'
- This is the "complemented character set". The first character
- after the `[' *must* be a `^'. This matches any character
- *except* those in the square brackets. For example:
-
- [^0-9]
-
- matches any character that is not a digit.
-
- `|'
- This is the "alternation operator" and it is used to specify
- alternatives. For example:
-
- ^P|[0-9]
-
- matches any string that matches either `^P' or `[0-9]'. This
- means it matches any string that contains a digit or starts with
- `P'.
-
- `(...)'
- Parentheses are used for grouping in regular expressions as in
- arithmetic. They can be used to concatenate regular expressions
- containing the alternation operator, `|'.
-
- `*'
- This symbol means that the preceding regular expression is to be
- repeated as many times as possible to find a match. For example:
-
- ph*
-
- applies the `*' symbol to the preceding `h' and looks for
- matches to one `p' followed by any number of `h''s. This will
- also match just `p' if no `h''s are present.
-
- The `*' means repeat the *smallest* possible preceding
- expression in order to find a match. The `awk' language
- processes a `*' by matching as many repetitions as can be found.
- For example:
-
- awk '/\(c[ad][ad]*r x\)/ { print }' sample
-
- matches every record in the input containing a string of the
- form `(car x)', `(cdr x)', `(cadr x)', and so on.
-
- `+'
- This symbol is similar to `*', but the preceding expression must
- be matched at least once. This means that:
-
- wh+y
-
- would match `why' and `whhy' but not `wy', whereas `wh*y' would
- match all three of these strings. And this is a simpler way of
- writing the last `*' example:
-
- awk '/\(c[ad]+r x\)/ { print }' sample
-
- `?'
- This symbol is similar to `*', but the preceding expression can
- be matched once or not at all. For example:
-
- fe?d
-
- will match `fed' or `fd', but nothing else.
-
- In regular expressions, the `*', `+', and `?' operators have the
- highest precedence, followed by concatenation, and finally by `|'.
- As in arithmetic, parentheses can change how operators are grouped.
-
- Any other character stands for itself. However, it is important to
- note that case in regular expressions *is* significant, both when
- matching ordinary (i.e. non-metacharacter) characters, and inside
- character sets. Thus a `w' in a regular expression matches only a
- lower case `w' and not an upper case `W'. When you want to do a
- case-independent match, you have to use a character set: `[Ww]'.
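-
- The following sketch, which assumes a POSIX-compatible `awk' and uses
- made-up input lines, contrasts the `+' and `*' operators and shows a
- character set used for a case-independent first letter. Both regexps
- are anchored with `^' and `$' so that the whole record must match.
-
```shell
# With +, at least one h is required, and the leading w must be lower case.
plus=$(printf 'why\nwhhy\nwy\nWhy\n' | awk '/^wh+y$/')
# With *, zero or more h are allowed; [Ww] accepts either case.
star=$(printf 'why\nwhhy\nwy\nWhy\n' | awk '/^[Ww]h*y$/')
printf '%s\n' "$plus" "--" "$star"
```
-
- The first command selects `why' and `whhy'; the second selects all
- four input lines.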
-
-
- File: gawk-info, Node: Comparison Patterns, Next: Ranges, Prev: Regexp, Up: Patterns
-
- Comparison Expressions as Patterns
- ==================================
-
- "Comparison patterns" use "relational operators" to compare strings
- or numbers. The relational operators are the same as in C. Here is
- a table of them:
-
- `X < Y'
- True if X is less than Y.
-
- `X <= Y'
- True if X is less than or equal to Y.
-
- `X > Y'
- True if X is greater than Y.
-
- `X >= Y'
- True if X is greater than or equal to Y.
-
- `X == Y'
- True if X is equal to Y.
-
- `X != Y'
- True if X is not equal to Y.
-
- Comparison expressions can be used as patterns to control whether a
- rule is executed. The expression is evaluated for each input record
- read, and the pattern is considered matched if the condition is "true".
-
- The operands of a relational operator are compared as numbers if they
- are both numbers. Otherwise they are converted to, and compared as,
- strings (*note Conversion::.). Strings are compared by comparing the
- first character of each, then the second character of each, and so on.
- Thus, `"10"' is less than `"9"'.
-
- The following example prints the second field of each input record
- whose first field is precisely `foo'.
-
- awk '$1 == "foo" { print $2 }' BBS-list
-
- Contrast this with the following regular expression match, which
- would accept any record with a first field that contains `foo':
-
- awk '$1 ~ "foo" { print $2 }' BBS-list
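-
- To see the difference between the two forms without the `BBS-list'
- file, here is a small sketch on invented input (a POSIX-compatible
- `awk' is assumed):
-
```shell
input='foo 1
foobar 2
bar 3'
# Exact comparison: only the record whose first field is precisely foo.
exact=$(printf '%s\n' "$input" | awk '$1 == "foo" { print $2 }')
# Regexp match: any record whose first field contains foo.
contains=$(printf '%s\n' "$input" | awk '$1 ~ "foo" { print $2 }')
printf '%s\n' "$exact" "--" "$contains"
```
-
- The exact comparison selects only the first record, while the regexp
- match also selects the `foobar' record.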
-
-
- File: gawk-info, Node: Ranges, Next: BEGIN/END, Prev: Comparison Patterns, Up: Patterns
-
- Specifying Record Ranges With Patterns
- ======================================
-
- A "range pattern" is made of two patterns separated by a comma:
- `BEGPAT, ENDPAT'. It matches ranges of consecutive input records.
- The first pattern BEGPAT controls where the range begins, and the
- second one ENDPAT controls where it ends.
-
- They work as follows: BEGPAT is matched against every input record;
- when a record matches BEGPAT, the range pattern becomes "turned on".
- The range pattern matches this record. As long as it stays turned
- on, it automatically matches every input record read. But meanwhile,
- ENDPAT is matched against every input record, and when it matches,
- the range pattern is turned off again for the following record. Now
- we go back to checking BEGPAT against each record. For example:
-
- awk '$1 == "on", $1 == "off"'
-
- prints every record between on/off pairs, inclusive.
-
- The record that turns on the range pattern and the one that turns it
- off both match the range pattern. If you don't want to operate on
- these records, you can write `if' statements in the rule's action to
- distinguish them.
-
- It is possible for a pattern to be turned both on and off by the same
- record, if both conditions are satisfied by that record. Then the
- action is executed for just that record.
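-
- The behavior described above can be sketched on invented input as
- follows (assuming a POSIX-compatible `awk'). The second command uses
- an `if' statement to skip the two boundary records, and the third
- shows a range that is turned on and off by the same record.
-
```shell
input='a
on
b
off
c'
# The range includes both the record that starts it and the one that ends it.
inclusive=$(printf '%s\n' "$input" | awk '$1 == "on", $1 == "off"')
# Skip the boundary records inside the action.
inner=$(printf '%s\n' "$input" |
  awk '$1 == "on", $1 == "off" { if ($1 != "on" && $1 != "off") print }')
# One record satisfies both patterns, so the range covers just that record.
single=$(printf 'x\nboth\ny\n' | awk '/both/, /both/')
printf '%s\n' "$inclusive" "--" "$inner" "--" "$single"
```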
-
-
- File: gawk-info, Node: BEGIN/END, Next: Boolean, Prev: Ranges, Up: Patterns
-
- `BEGIN' and `END' Special Patterns
- ==================================
-
- `BEGIN' and `END' are special patterns. They are not used to match
- input records. Rather, they are used for supplying start-up or
- clean-up information to your `awk' script. A `BEGIN' rule is
- executed, once, before the first input record has been read. An
- `END' rule is executed, once, after all the input has been read. For
- example:
-
- awk 'BEGIN { print "Analysis of \"foo\" program" }
- /foo/ { ++foobar }
- END { print "\"foo\" appears " foobar " times." }' BBS-list
-
- This program finds out how many records in the input file
- `BBS-list' contain the string `foo'. The `BEGIN' pattern prints out
- a title for the report. There is no need to use the `BEGIN' pattern to
- initialize the counter `foobar' to zero, as `awk' does this for us
- automatically (*note Variables::.). The second rule increments the
- variable `foobar' every time a record containing the pattern `foo' is
- read. The last rule prints out the value of `foobar' at the end of
- the run.
-
- The special patterns `BEGIN' and `END' do not combine with other
- kinds of patterns.
-
- An `awk' program may have multiple `BEGIN' and/or `END' rules. The
- contents of multiple `BEGIN' or `END' rules are treated as if they
- had been enclosed in a single rule, in the order that the rules are
- encountered in the `awk' program. (This feature was introduced with
- the new version of `awk'.)
-
- Multiple `BEGIN' and `END' sections are also useful for writing
- library functions that need to do initialization and/or cleanup of
- their own. Note that the order in which library functions are named
- on the command line will affect the order in which their `BEGIN' and
- `END' rules will be executed. Therefore you have to be careful how
- you write your library functions. (*Note Command Line::, for more
- information on using library functions.)
-
- If an `awk' program only has a `BEGIN' rule, and no other rules, then
- the program will exit after the `BEGIN' rule has been run. Older
- versions of `awk' used to read their input until end of file was
- seen. However, if an `END' rule exists as well, then the input will
- be read, even if there are no other rules in the program.
-
- `BEGIN' and `END' rules must have actions; there is no default action
- for these rules since there is no current record when they run.
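-
- As a small sketch of the ordering rule for multiple `BEGIN' rules
- (the input and the messages are invented, and a POSIX-compatible
- `awk' is assumed):
-
```shell
# Two BEGIN rules run in program order before any input is read;
# the END rule runs once after the last record.
out=$(printf 'foo\nbar\nfoo bar\n' |
  awk 'BEGIN { print "first BEGIN" }
       BEGIN { print "second BEGIN" }
       /foo/ { n++ }
       END   { print n " matching records" }')
printf '%s\n' "$out"
```
-
- The two titles appear in the order in which the rules are written,
- followed by the count of records that contain `foo'.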
-
-
- File: gawk-info, Node: Boolean, Next: Conditional Patterns, Prev: BEGIN/END, Up: Patterns
-
- Boolean Operators and Patterns
- ==============================
-
- A boolean pattern is a combination of other patterns using the
- boolean operators ``or'' (`||'), ``and'' (`&&'), and ``not'' (`!'),
- along with parentheses to control nesting. Whether the boolean
- pattern matches an input record is computed from whether its
- subpatterns match.
-
- The subpatterns of a boolean pattern can be regular expressions,
- matching expressions, comparisons, or other boolean combinations of
- such. Range patterns cannot appear inside boolean operators, since
- they don't make sense for classifying a single record, and neither
- can the special patterns `BEGIN' and `END', which never match any
- input record.
-
- Here are descriptions of the three boolean operators.
-
- `PAT1 && PAT2'
- Matches if both PAT1 and PAT2 match by themselves. For example,
- the following command prints all records in the input file
- `BBS-list' that contain both `2400' and `foo'.
-
- awk '/2400/ && /foo/' BBS-list
-
- Whether PAT2 matches is tested only if PAT1 succeeds. This can
- make a difference when PAT2 contains expressions that have side
- effects: in the case of `/foo/ && ($2 == bar++)', the variable
- `bar' is not incremented if there is no `foo' in the record.
-
- `PAT1 || PAT2'
- Matches if at least one of PAT1 and PAT2 matches the current
- input record. For example, the following command prints all
- records in the input file `BBS-list' that contain *either*
- `2400' or `foo', or both.
-
- awk '/2400/ || /foo/' BBS-list
-
- Whether PAT2 matches is tested only if PAT1 fails to match.
- This can make a difference when PAT2 contains expressions that
- have side effects.
-
- `!PAT'
- Matches if PAT does not match. For example, the following
- command prints all records in the input file `BBS-list' that do
- *not* contain the string `foo'.
-
- awk '! /foo/' BBS-list
-
- Note that boolean patterns are built from other patterns just as
- boolean expressions are built from other expressions (*note Boolean
- Ops::.). Any boolean expression is also a valid boolean pattern.
- But the converse is not true: simple regular expression patterns such
- as `/foo/' are not allowed in boolean expressions. Regular
- expressions can appear in boolean expressions only in conjunction
- with the matching operators, `~' and `!~'.
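-
- The side-effect behavior of `&&' can be observed with a sketch like
- the one below. The input is invented and a POSIX-compatible `awk' is
- assumed; the `END' rule is added only to display the final value of
- `bar'.
-
```shell
# bar is incremented only for records that contain foo, because the
# right-hand side of && is not evaluated when the left side fails.
out=$(printf 'foo 0\nbaz 1\nfoo 1\n' |
  awk '/foo/ && ($2 == bar++) { print }
       END { print "bar=" bar }')
printf '%s\n' "$out"
```
-
- Both `foo' records are printed, and `bar' ends at 2 rather than 3
- because the `baz' record never reaches the increment.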
-
-
- File: gawk-info, Node: Conditional Patterns, Prev: Boolean, Up: Patterns
-
- Conditional Patterns
- ====================
-
- Patterns may use a "conditional expression" much like the conditional
- expression of the C language. This takes the form:
-
- PAT1 ? PAT2 : PAT3
-
- The first pattern is evaluated. If it evaluates to TRUE, then the
- input record is tested against PAT2. Otherwise it is tested against
- PAT3. The conditional pattern matches if PAT2 or PAT3 (whichever one
- is selected) matches.
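-
- A minimal sketch of a conditional pattern, on invented input and
- assuming a POSIX-compatible `awk': odd-numbered records are tested
- against `/a/' and even-numbered records against `/b/'.
-
```shell
# NR % 2 is true (nonzero) for odd record numbers.
out=$(printf 'apple\nbanana\ncherry\nblueberry\n' | awk 'NR % 2 ? /a/ : /b/')
printf '%s\n' "$out"
```
-
- Only `cherry' is rejected: it is an odd-numbered record and contains
- no `a'.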
-
-
- File: gawk-info, Node: Actions, Next: Expressions, Prev: Patterns, Up: Top
-
- Actions: The Basics
- *******************
-
- The "action" part of an `awk' rule tells `awk' what to do once a
- match for the pattern is found. An action consists of one or more
- `awk' "statements", enclosed in curly braces (`{' and `}'). The
- curly braces must be used even if the action contains only one
- statement, or even if it contains no statements at all. Action
- statements are separated by newlines or semicolons.
-
- Besides the print statements already covered (*note Printing::.),
- there are four kinds of action statements: expressions, control
- statements, compound statements, and function definitions.
-
- * "Expressions" include assignments, arithmetic, function calls,
- and more (*note Expressions::.).
-
- * "Control statements" specify the control flow of `awk' programs.
- The `awk' language gives you C-like constructs (`if', `for',
- `while', and so on) as well as a few special ones (*note
- Statements::.).
-
- * A "compound statement" is just one or more `awk' statements
- enclosed in curly braces. This way you can group several
- statements to form the body of an `if' or similar statement.
-
- * You can define "user-defined functions" for use elsewhere in
- the `awk' program (*note User-defined::.).
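-
- The sketch below, on invented input and assuming a POSIX-compatible
- `awk', shows an action that uses a control statement whose body is a
- compound statement; the statements inside the inner braces are
- separated by a semicolon.
-
```shell
# The if body groups two statements in braces; the else body is a
# single statement and needs no braces of its own.
out=$(printf '3\n12\n' |
  awk '{ if ($1 > 10) { big++; print $1 " is big" } else print $1 " is small" }
       END { print "big records: " big }')
printf '%s\n' "$out"
```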
-
-
- File: gawk-info, Node: Expressions, Next: Statements, Prev: Actions, Up: Top
-
- Actions: Expressions
- ********************
-
- Expressions are the basic building block of `awk' actions. An
- expression evaluates to a value, which you can print, test, store in
- a variable or pass to a function.
-
- But, beyond that, an expression can assign a new value to a variable
- or a field, with an assignment operator.
-
- An expression can serve as a statement on its own. Most other action
- statements are made up of various combinations of expressions. As in
- other languages, expressions in `awk' include variables, array
- references, constants, and function calls, as well as combinations of
- these with various operators.
-
- * Menu:
-
- * Constants:: String and numeric constants.
- * Variables:: Variables give names to values for future use.
- * Fields:: Field references such as `$1' are also expressions.
- * Arrays:: Array element references are expressions.
-
- * Arithmetic Ops:: Arithmetic operations (`+', `-', etc.)
- * Concatenation:: Concatenating strings.
- * Comparison Ops:: Comparison of numbers and strings with `<', etc.
- * Boolean Ops:: Combining comparison expressions using boolean operators
- `||' (``or''), `&&' (``and'') and `!' (``not'').
-
- * Assignment Ops:: Changing the value of a variable or a field.
- * Increment Ops:: Incrementing the numeric value of a variable.
-
- * Conversion:: The conversion of strings to numbers and vice versa.
- * Conditional Exp:: Conditional expressions select between two subexpressions
- under control of a third subexpression.
- * Function Calls:: A function call is an expression.
-
-
- File: gawk-info, Node: Constants, Next: Variables, Up: Expressions
-
- Constant Expressions
- ====================
-
- There are two types of constants: numeric constants and string
- constants.
-
- The "numeric constant" is a number. This number can be an integer, a
- decimal fraction, or a number in scientific (exponential) notation.
- Note that all numeric values are represented within `awk' in
- double-precision floating point. Here are some examples of numeric
- constants, which all have the same value:
-
- 105
- 1.05e+2
- 1050e-1
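-
- One way to check that these three constants denote the same value is
- the following sketch, which assumes a POSIX-compatible `awk':
-
```shell
# Print the three constants, then compare them for equality.
out=$(awk 'BEGIN {
  print 105, 1.05e+2, 1050e-1
  if (105 == 1.05e+2 && 105 == 1050e-1) print "equal"
}')
printf '%s\n' "$out"
```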
-
- A string constant consists of a sequence of characters enclosed in
- double-quote marks. For example:
-
- "parrot"
-
- represents the string constant `parrot'. Strings in `gawk' can be of
- any length and they can contain all the possible 8-bit
- characters including ASCII NUL. Other `awk' implementations may have
- difficulty with some character codes.
-
- Some characters cannot be included literally in a string. You
- represent them instead with "escape sequences", which are character
- sequences beginning with a backslash (`\').
-
- One use of the backslash is to include double-quote characters in a
- string. Since a plain double-quote would end the string, you must
- use `\"'. Backslash itself is another character that can't be
- included normally; you write `\\' to put one backslash in the string.
-
- Another use of backslash is to represent unprintable characters such
- as newline. While there is nothing to stop you from writing these
- characters directly in an `awk' program, they may look ugly.
-
- `\b'
- Represents a backspace, `^H'.
-
- `\f'
- Represents a formfeed, `^L'.
-
- `\n'
- Represents a newline, `^J'.
-
- `\r'
- Represents a carriage return, `^M'.
-
- `\t'
- Represents a horizontal tab, `^I'.
-
- `\v'
- Represents a vertical tab, `^K'.
-
- `\NNN'
- Represents the octal value NNN, where NNN is one to three digits
- between 0 and 7. For example, the code for the ASCII ESC
- (escape) character is `\033'.
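-
- A short sketch of several escape sequences in one string constant,
- assuming a POSIX-compatible `awk'. The octal code `\101' (the letter
- `A') is chosen here only because it is printable.
-
```shell
# \t is a tab, \" a double quote, \\ a single backslash, \101 the letter A.
out=$(awk 'BEGIN { print "tab:\there quote:\" backslash:\\ octal:\101" }')
printf '%s\n' "$out"
```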
-
-
- File: gawk-info, Node: Variables, Next: Arithmetic Ops, Prev: Constants, Up: Expressions
-
- Variables
- =========
-
- Variables let you give names to values and refer to them later. You
- have already seen variables in many of the examples. The name of a
- variable must be a sequence of letters, digits and underscores, but
- it may not begin with a digit. Case is significant in variable
- names; `a' and `A' are distinct variables.
-
- A variable name is a valid expression by itself; it represents the
- variable's current value. Variables are given new values with
- "assignment operators" and "increment operators". *Note Assignment
- Ops::.
-
- A few variables have special built-in meanings, such as `FS', the
- field separator, and `NF', the number of fields in the current input
- record. *Note Special::, for a list of them. Special variables can
- be used and assigned just like all other variables, but their values
- are also used or changed automatically by `awk'. Each special
- variable's name is made entirely of upper case letters.
-
- Variables in `awk' can be assigned either numeric values or string
- values. By default, variables are initialized to the null string,
- which has the numeric value zero. So there is no need to
- ``initialize'' each variable explicitly in `awk', the way you would
- need to do in C or most other traditional programming languages.
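-
- The default initialization can be seen in a sketch like this one
- (the variable names `s' and `n' are arbitrary, and a POSIX-compatible
- `awk' is assumed):
-
```shell
# An unset variable acts as the null string in a string context
# and as zero in a numeric context.
out=$(awk 'BEGIN { print "[" s "]"; print n + 0; n++; print n }')
printf '%s\n' "$out"
```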
-
-
- File: gawk-info, Node: Arithmetic Ops, Next: Concatenation, Prev: Variables, Up: Expressions
-
- Arithmetic Operators
- ====================
-
- The `awk' language uses the common arithmetic operators when
- evaluating expressions. All of these arithmetic operators follow
- normal precedence rules, and work as you would expect them to. This
- example divides field 3 by field 4, adds field 2, stores the result
- into field 1, and prints the results:
-
- awk '{ $1 = $2 + $3 / $4; print }' inventory-shipped
-
- The arithmetic operators in `awk' are:
-
- `X + Y'
- Addition.
-
- `X - Y'
- Subtraction.
-
- `- X'
- Negation.
-
- `X / Y'
- Division. Since all numbers in `awk' are double-precision
- floating point, the result is not rounded to an integer: `3 / 4'
- has the value 0.75.
-
- `X * Y'
- Multiplication.
-
- `X % Y'
- Remainder. The quotient is rounded toward zero to an integer,
- multiplied by Y and this result is subtracted from X. This
- operation is sometimes known as ``trunc-mod''. The following
- relation always holds:
-
- `b * int(a / b) + (a % b) == a'
-
- One undesirable effect of this definition of remainder is that X
- % Y is negative if X is negative. Thus,
-
- -17 % 8 = -1
-
- `X ^ Y'
- `X ** Y'
- Exponentiation: X raised to the Y power. `2 ^ 3' has the value
- 8. The character sequence `**' is equivalent to `^'.
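-
- The following sketch exercises several of these operators. It
- assumes a POSIX-compatible `awk'; the one-line input stands in for a
- record of `inventory-shipped' and is invented for the example.
-
```shell
# Division binds tighter than addition: 1 + 6 / 3 is 3, stored in field 1.
rec=$(echo 'x 1 6 3' | awk '{ $1 = $2 + $3 / $4; print }')
# Division is not truncated; remainder takes the sign of the left operand;
# the last value checks the relation b * int(a / b) + (a % b) == a.
vals=$(awk 'BEGIN { print 3 / 4, -17 % 8, 2 ^ 3, 8 * int(-17 / 8) + (-17 % 8) }')
printf '%s\n' "$rec" "$vals"
```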
-
-
- File: gawk-info, Node: Concatenation, Next: Comparison Ops, Prev: Arithmetic Ops, Up: Expressions
-
- String Concatenation
- ====================
-
- There is only one string operation: concatenation. It does not have
- a specific operator to represent it. Instead, concatenation is
- performed by writing expressions next to one another, with no
- operator. For example:
-
- awk '{ print "Field number one: " $1 }' BBS-list
-
- produces, for the first record in `BBS-list':
-
- Field number one: aardvark
-
- If you hadn't put the space after the `:', the line would have run
- together. For example:
-
- awk '{ print "Field number one:" $1 }' BBS-list
-
- produces, for the first record in `BBS-list':
-
- Field number one:aardvark
-
-
- File: gawk-info, Node: Comparison Ops, Next: Boolean Ops, Prev: Concatenation, Up: Expressions
-
- Comparison Expressions
- ======================
-
- "Comparison expressions" use "relational operators" to compare
- strings or numbers. The relational operators are the same as in C.
- Here is a table of them:
-
- `X < Y'
- True if X is less than Y.
-
- `X <= Y'
- True if X is less than or equal to Y.
-
- `X > Y'
- True if X is greater than Y.
-
- `X >= Y'
- True if X is greater than or equal to Y.
-
- `X == Y'
- True if X is equal to Y.
-
- `X != Y'
- True if X is not equal to Y.
-
- `X ~ REGEXP'
- True if regexp REGEXP matches the string X.
-
- `X !~ REGEXP'
- True if regexp REGEXP does not match the string X.
-
- `SUBSCRIPT in ARRAY'
- True if array ARRAY has an element with the subscript SUBSCRIPT.
-
- Comparison expressions have the value 1 if true and 0 if false.
-
- The operands of a relational operator are compared as numbers if they
- are both numbers. Otherwise they are converted to, and compared as,
- strings (*note Conversion::.). Strings are compared by comparing the
- first character of each, then the second character of each, and so on.
- Thus, `"10"' is less than `"9"'.
-
- For example,
-
- $1 == "foo"
-
- has the value of 1, or is true, if the first field of the current
- input record is precisely `foo'. By contrast,
-
- $1 ~ /foo/
-
- has the value 1 if the first field contains `foo'.
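-
- As a sketch of the values these expressions produce (variable and
- array names are arbitrary, and a POSIX-compatible `awk' is assumed),
- each result is stored in a variable and then printed:
-
```shell
# Comparisons yield 1 or 0; two string constants are compared as strings,
# so "10" sorts before "9".
out=$(awk 'BEGIN {
  a["x"] = 1
  lt = (1 < 2); gt = (2 < 1)
  has = ("x" in a); lacks = ("y" in a)
  str = ("10" < "9")
  print lt, gt, has, lacks, str
}')
printf '%s\n' "$out"
```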
-
-
- File: gawk-info, Node: Boolean Ops, Next: Assignment Ops, Prev: Comparison Ops, Up: Expressions
-
- Boolean Operators
- =================
-
- A boolean expression is a combination of comparison expressions or
- matching expressions, using the boolean operators ``or'' (`||'),
- ``and'' (`&&'), and ``not'' (`!'), along with parentheses to control
- nesting. The truth of the boolean expression is computed by
- combining the truth values of the component expressions.
-
- Boolean expressions can be used wherever comparison and matching
- expressions can be used. They can be used in `if' and `while'
- statements. They have numeric values (1 if true, 0 if false).
-
- In addition, every boolean expression is also a valid boolean
- pattern, so you can use it as a pattern to control the execution of
- rules.
-
- Here are descriptions of the three boolean operators, with an example
- of each. It may be instructive to compare these examples with the
- analogous examples of boolean patterns (*note Boolean::.), which use
- the same boolean operators in patterns instead of expressions.
-
- `BOOLEAN1 && BOOLEAN2'
- True if both BOOLEAN1 and BOOLEAN2 are true. For example, the
- following statement prints the current input record if it
- contains both `2400' and `foo'.
-
- if ($0 ~ /2400/ && $0 ~ /foo/) print
-
- The subexpression BOOLEAN2 is evaluated only if BOOLEAN1 is
- true. This can make a difference when BOOLEAN2 contains
- expressions that have side effects: in the case of `$0 ~ /foo/
- && ($2 == bar++)', the variable `bar' is not incremented if
- there is no `foo' in the record.
-
- `BOOLEAN1 || BOOLEAN2'
- True if at least one of BOOLEAN1 and BOOLEAN2 is true. For
- example, the following command prints all records in the input
- file `BBS-list' that contain *either* `2400' or `foo', or both.
-
- awk '{ if ($0 ~ /2400/ || $0 ~ /foo/) print }' BBS-list
-
- The subexpression BOOLEAN2 is evaluated only if BOOLEAN1 is
- false. This can make a difference when BOOLEAN2 contains
- expressions that have side effects.
-
- `!BOOLEAN'
- True if BOOLEAN is false. For example, the following program
- prints all records in the input file `BBS-list' that do *not*
- contain the string `foo'.
-
- awk '{ if (! ($0 ~ /foo/)) print }' BBS-list
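-
- The evaluation order of `&&' and `||' can be checked with a sketch
- such as the following, in which the counter `n' (an arbitrary name)
- is incremented only when the right-hand operand is actually
- evaluated. A POSIX-compatible `awk' is assumed.
-
```shell
out=$(awk 'BEGIN {
  n = 0
  if (1 || ++n) print "after true ||: n=" n    # right side skipped
  if (0 && ++n) print "not reached"            # right side skipped
  print "after false &&: n=" n
  if (0 || ++n) print "after false ||: n=" n   # right side evaluated
}')
printf '%s\n' "$out"
```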
-
-
- File: gawk-info, Node: Assignment Ops, Next: Increment Ops, Prev: Boolean Ops, Up: Expressions
-
- Assignment Operators
- ====================
-
- An "assignment" is an expression that stores a new value into a
- variable. For example, let's assign the value 1 to the variable `z':
-
- z = 1
-
- After this expression is executed, the variable `z' has the value 1.
- Whatever old value `z' had before the assignment is forgotten.
-
- The `=' sign is called an "assignment operator". It is the simplest
- assignment operator because the value of the right-hand operand is
- stored unchanged.
-
- The left--hand operand of an assignment can be a variable (*note
- Variables::.), a field (*note Changing Fields::.) or an array element
- (*note Arrays::.). These are all called "lvalues", which means they
- can appear on the left side of an assignment operator. The
- right--hand operand may be any expression; it produces the new value
- which the assignment stores in the specified variable, field or array
- element.
-
- Assignments can store string values also. For example, this would
- store the value `"this food is good"' in the variable `message':
-
- thing = "food"
- predicate = "good"
- message = "this " thing " is " predicate
-
- (This also illustrates concatenation of strings.)
-
- It is important to note that variables do *not* have permanent types.
- The type of a variable is simply the type of whatever value it
- happens to hold at the moment. In the following program fragment,
- the variable `foo' has a numeric value at first, and a string value
- later on:
-
- foo = 1
- print foo
- foo = "bar"
- print foo
-
- When the second assignment gives `foo' a string value, the fact that
- it previously had a numeric value is forgotten.
-
- An assignment is an expression, so it has a value: the same value
- that is assigned. Thus, `z = 1' as an expression has the value 1.
- One consequence of this is that you can write multiple assignments
- together:
-
- x = y = z = 0
-
- This stores the value 0 in all three variables. It does this because the
- value of `z = 0', which is 0, is stored into `y', and then the value
- of `y = z = 0', which is 0, is stored into `x'.
-
- You can use an assignment anywhere an expression is called for. For
- example, it is valid to write `x != (y = 1)' to set `y' to 1 and then
- test whether `x' is not equal to 1. But this style tends to make programs
- hard to read; except in a one--shot program, you should rewrite it to
- get rid of such nesting of assignments. This is never very hard.
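-
- As a small sketch of an assignment used as a value, the following
- command chains three assignments and then nests one inside a larger
- expression. The variable names, including the shell variable
- `chain_out', are illustrative only.
-
```shell
# x = y = z = 0 assigns right to left; (v = 5) + 1 uses the assigned value.
chain_out=$(awk 'BEGIN { x = y = z = 0; w = (v = 5) + 1; print x, y, z, v, w }')
echo "$chain_out"
```
-
- This prints `0 0 0 5 6': `v' receives 5, and the value of the nested
- assignment, also 5, takes part in the addition.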
-
- Aside from `=', there are several other assignment operators that do
- arithmetic with the old value of the variable. For example, the
- operator `+=' computes a new value by adding the right--hand value to
- the old value of the variable. Thus, the following assignment adds 5
- to the value of `foo':
-
- foo += 5
-
- This is precisely equivalent to the following:
-
- foo = foo + 5
-
- Use whichever one makes the meaning of your program clearer.
-
- Here is a table of the arithmetic assignment operators. In each
- case, the right--hand operand is an expression whose value is
- converted to a number.
-
- `LVALUE += INCREMENT'
- Adds INCREMENT to the value of LVALUE to make the new value of
- LVALUE.
-
- `LVALUE -= DECREMENT'
- Subtracts DECREMENT from the value of LVALUE.
-
- `LVALUE *= COEFFICIENT'
- Multiplies the value of LVALUE by COEFFICIENT.
-
- `LVALUE /= QUOTIENT'
- Divides the value of LVALUE by QUOTIENT.
-
- `LVALUE %= MODULUS'
- Sets LVALUE to its remainder by MODULUS.
-
- `LVALUE ^= POWER'
- `LVALUE **= POWER'
- Raises LVALUE to the power POWER.
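-
- The operators in this table can be tried one after another on a
- single variable. This is a sketch with made--up names (`n' and the
- shell variable `arith_out'); it leaves out `**=', which not every
- `awk' implementation accepts.
-
```shell
# Apply each arithmetic assignment operator in turn to n.
arith_out=$(awk 'BEGIN {
  n = 10
  n += 5;  print n    # 15
  n -= 3;  print n    # 12
  n *= 2;  print n    # 24
  n /= 4;  print n    # 6
  n %= 4;  print n    # 2
  n ^= 3;  print n    # 8
}')
echo "$arith_out"
```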
-
-
- File: gawk-info, Node: Increment Ops, Next: Conversion, Prev: Assignment Ops, Up: Expressions
-
- Increment Operators
- ===================
-
- "Increment operators" increase or decrease the value of a variable by
- 1. You could do the same thing with an assignment operator, so the
- increment operators add no power to the `awk' language; but they are
- convenient abbreviations for something very common.
-
- The operator to add 1 is written `++'. There are two ways to use
- this operator: pre--incrementation and post--incrementation.
-
- To pre--increment a variable V, write `++V'. This adds 1 to the
- value of V and that new value is also the value of this expression.
- The assignment expression `V += 1' is completely equivalent.
-
- Writing the `++' after the variable specifies post--increment. This
- increments the variable value just the same; the difference is that
- the value of the increment expression itself is the variable's *old*
- value. Thus, if `foo' has value 4, then the expression `foo++' has
- the value 4, but it changes the value of `foo' to 5.
-
- The post--increment `foo++' is nearly equivalent to writing `(foo +=
- 1) - 1'. It is not perfectly equivalent because all numbers in `awk'
- are floating point: in floating point, `foo + 1 - 1' does not
- necessarily equal `foo'. But the difference will be minute as long
- as you stick to numbers that are fairly small (less than a trillion).
-
- Any lvalue can be incremented. Fields and array elements are
- incremented just like variables.
-
- The decrement operator `--' works just like `++' except that it
- subtracts 1 instead of adding. Like `++', it can be used before the
- lvalue to pre--decrement or after it to post--decrement.
-
- Here is a summary of increment and decrement expressions.
-
- `++LVALUE'
- This expression increments LVALUE and the new value becomes the
- value of this expression.
-
- `LVALUE++'
- This expression causes the contents of LVALUE to be incremented.
- The value of the expression is the *old* value of LVALUE.
-
- `--LVALUE'
- Like `++LVALUE', but instead of adding, it subtracts. It
- decrements LVALUE and delivers the value that results.
-
- `LVALUE--'
- Like `LVALUE++', but instead of adding, it subtracts. It
- decrements LVALUE. The value of the expression is the *old*
- value of LVALUE.
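-
- The four forms can be compared side by side. In this sketch the
- variable names `a' through `d' and the shell variable `incr_out' are
- illustrative; `foo' starts at 4 as in the text above.
-
```shell
# Each line records the value of the expression, not just its side effect.
incr_out=$(awk 'BEGIN {
  foo = 4
  a = foo++    # a gets the old value 4; foo becomes 5
  b = ++foo    # foo becomes 6; b gets the new value 6
  c = foo--    # c gets the old value 6; foo becomes 5
  d = --foo    # foo becomes 4; d gets the new value 4
  print a, b, c, d, foo
}')
echo "$incr_out"
```
-
- The output is `4 6 6 4 4'.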
-
-
- File: gawk-info, Node: Conversion, Next: Conditional Exp, Prev: Increment Ops, Up: Expressions
-
- Conversion of Strings and Numbers
- =================================
-
- Strings are converted to numbers, and numbers to strings, if the
- context of your `awk' statement demands it. For example, if the
- value of either `foo' or `bar' in the expression `foo + bar' happens
- to be a string, it is converted to a number before the addition is
- performed. If numeric values appear in string concatenation, they
- are converted to strings. Consider this:
-
- two = 2; three = 3
- print (two three) + 4
-
- This eventually prints the (numeric) value `27'. The numeric
- variables `two' and `three' are converted to strings and concatenated
- together, and the resulting string is converted back to a number
- before adding `4'. The resulting numeric value `27' is printed.
-
- If, for some reason, you need to force a number to be converted to a
- string, concatenate the null string with that number. To force a
- string to be converted to a number, add zero to that string. Strings
- that can't be interpreted as valid numbers are given the numeric
- value zero.
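-
- A sketch of why forcing a conversion can matter: the same two values
- compare differently as strings and as numbers. The variable names
- (`n', `s', `as_strings', `as_numbers', `junk' and the shell variable
- `conv_out') are made up for illustration.
-
```shell
# Concatenating "" forces a string; adding 0 forces a number.
conv_out=$(awk 'BEGIN {
  n = 10; s = "9"
  as_strings = ((n "") < (s ""))    # "10" sorts before "9", so this is 1
  as_numbers = ((n + 0) < (s + 0))  # 10 < 9 is false, so this is 0
  junk = "abc" + 0                  # not a valid number, so the value is 0
  print as_strings, as_numbers, junk
}')
echo "$conv_out"
```
-
- This prints `1 0 0'.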
-
- The exact manner in which numbers are converted into strings is
- controlled by the `awk' special variable `OFMT' (*note Special::.).
- Numbers are converted using a special version of the `sprintf'
- function (*note Built-in::.) with `OFMT' as the format specifier.
-
- `OFMT''s default value is `"%.6g"', which prints a value with at
- most six significant digits. You might want to change it to specify
- more precision, if your version of `awk' uses double precision
- arithmetic. Double precision on most modern machines gives you 16 or
- 17 decimal digits of precision.
-
- Strange results can happen if you set `OFMT' to a string that doesn't
- tell `sprintf' how to format floating point numbers in a useful way.
- For example, if you forget the `%' in the format, all numbers will be
- converted to the same constant string.
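-
- The effect of `OFMT' is easiest to see with `print'. The sketch below
- prints the same value under the default format and under two others;
- the variable `x' and the shell variable `ofmt_out' are illustrative.
- In POSIX `awk' and in current versions of `gawk', conversions outside
- `print' are governed by a separate variable, `CONVFMT', which is why
- this sketch sticks to `print'.
-
```shell
# Print one non-integer value under three different OFMT settings.
ofmt_out=$(awk 'BEGIN {
  x = 3.14159265358979
  print x                    # default "%.6g"
  OFMT = "%.2f";  print x    # two digits after the decimal point
  OFMT = "%.12g"; print x    # up to twelve significant digits
}')
echo "$ofmt_out"
```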
-
-
- File: gawk-info, Node: Conditional Exp, Next: Function Calls, Prev: Conversion, Up: Expressions
-
- Conditional Expressions
- =======================
-
- A "conditional expression" is a special kind of expression with three
- operands. It allows you to use one expression's value to select one
- of two other expressions.
-
- The conditional expression looks the same as in the C language:
-
- SELECTOR ? IF-TRUE-EXP : IF-FALSE-EXP
-
- There are three subexpressions. The first, SELECTOR, is always
- computed first. If it is ``true'' (nonzero or nonnull) then
- IF-TRUE-EXP is computed next and its value becomes the value of the
- whole expression. Otherwise, IF-FALSE-EXP is computed next and its
- value becomes the value of the whole expression.
-
- For example, this expression produces the absolute value of `x':
-
- x > 0 ? x : -x
-
- Each time the conditional expression is computed, exactly one of
- IF-TRUE-EXP and IF-FALSE-EXP is computed; the other is ignored. This
- is important when the expressions contain side effects. For example,
- this conditional expression examines element `i' of either array `a'
- or array `b', and increments `i'.
-
- x == y ? a[i++] : b[i++]
-
- This is guaranteed to increment `i' exactly once, because each time
- one or the other of the two increment expressions will be executed
- and the other will not be.
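-
- The following sketch runs both conditional expressions from this
- section. The array contents, the values of `x', `y' and `n', and the
- names `picked', `abs' and `cond_out' are assumptions made for the
- example.
-
```shell
# Only one of the two i++ expressions runs, so i goes from 1 to 2.
cond_out=$(awk 'BEGIN {
  a[1] = "from a"; b[1] = "from b"
  i = 1; x = 2; y = 2
  picked = (x == y ? a[i++] : b[i++])
  print picked, i
  n = -7
  abs = (n > 0 ? n : -n)    # absolute value of n
  print abs
}')
echo "$cond_out"
```
-
- The first line of output is `from a 2', showing that `i' was
- incremented exactly once; the second line is `7'.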
-
-
- File: gawk-info, Node: Function Calls, Prev: Conditional Exp, Up: Expressions
-
- Function Calls
- ==============
-
- A "function" is a name for a particular calculation. Because it has
- a name, you can ask for it by name at any point in the program. For
- example, the function `sqrt' computes the square root of a number.
-
- A fixed set of functions are "built in", which means they are
- available in every `awk' program. The `sqrt' function is one of
- these. *Note Built-in::, for a list of built--in functions and their
- descriptions. In addition, you can define your own functions in the
- program for use elsewhere in the same program. *Note User-defined::,
- for how to do this.
-
- The way to use a function is with a "function call" expression, which
- consists of the function name followed by a list of "arguments" in
- parentheses. The arguments are expressions which give the raw
- materials for the calculation that the function will do. When there
- is more than one argument, they are separated by commas. If there
- are no arguments, write just `()' after the function name.
-
- *Do not put any space between the function name and the
- open--parenthesis!* A user--defined function name looks just like
- the name of a variable, and space would make the expression look like
- concatenation of a variable with an expression inside parentheses.
- Space before the parenthesis is harmless with built--in functions,
- but it is best not to get into the habit of using space, lest you do
- likewise for a user--defined function one day by mistake.
-
- Each function needs a particular number of arguments. For example,
- the `sqrt' function must be called with a single argument, like this:
-
- sqrt(ARGUMENT)
-
- The argument is the number to take the square root of.
-
- Some of the built--in functions allow you to omit the final argument.
- If you do so, they will use a reasonable default. *Note Built-in::,
- for full details. If arguments are omitted in calls to user--defined
- functions, then those arguments are treated as local variables,
- initialized to the null string (*note User-defined::.).
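-
- A sketch of an omitted argument acting as a local variable. The
- function `join_pair' and its parameter names are hypothetical, and the
- `function' syntax used here is the one described under user--defined
- functions.
-
```shell
# tmp is declared as a parameter but never passed, so it is local to
# join_pair and starts out as the null string on every call.
fn_out=$(awk '
  function join_pair(a, b, tmp) {
    tmp = tmp a "-" b
    return tmp
  }
  BEGIN {
    print join_pair("x", "y")
    print join_pair("p", "q")    # tmp is empty again on this call
    print length(tmp)            # the global tmp was never touched
  }')
echo "$fn_out"
```
-
- The output is `x-y', `p-q' and `0', one per line.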
-
- Like every other expression, the function call has a value, which is
- computed by the function based on the arguments you give it. In this
- example, the value of `sqrt(ARGUMENT)' is the square root of the
- argument. A function can also have side effects, such as assigning
- the values of certain variables or doing I/O.
-
- Here is a command to read numbers, one number per line, and print the
- square root of each one:
-
- awk '{ print "The square root of", $1, "is", sqrt($1) }'
-
-
- File: gawk-info, Node: Statements, Next: Arrays, Prev: Expressions, Up: Top
-
- Actions: Statements
- *******************
-
- "Control statements" such as `if', `while', and so on control the
- flow of execution in `awk' programs. Most of the control statements
- in `awk' are patterned on similar statements in C.
-
- The simplest kind of statement is an expression. The other kinds of
- statements start with special keywords such as `if' and `while', to
- distinguish them from simple expressions.
-
- In all the examples in this chapter, BODY can be either a single
- statement or a group of statements. Groups of statements are
- enclosed in braces, and separated by newlines or semicolons.
-
- * Menu:
-
- * Expressions:: One kind of statement simply computes an expression.
-
- * If:: Conditionally execute some `awk' statements.
-
- * While:: Loop until some condition is satisfied.
-
- * Do:: Do specified action while looping until some
- condition is satisfied.
-
- * For:: Another looping statement, that provides
- initialization and increment clauses.
-
- * Break:: Immediately exit the innermost enclosing loop.
-
- * Continue:: Skip to the end of the innermost enclosing loop.
-
- * Next:: Stop processing the current input record.
-
- * Exit:: Stop execution of `awk'.
-
-
- File: gawk-info, Node: If, Next: While, Up: Statements
-
- The `if' Statement
- ==================
-
- The `if'-`else' statement is `awk''s decision--making statement. The
- `else' part of the statement is optional.
-
- `if (CONDITION) BODY1 else BODY2'
-
- Here CONDITION is an expression that controls what the rest of the
- statement will do. If CONDITION is true, BODY1 is executed;
- otherwise, BODY2 is executed (assuming that the `else' clause is
- present). The condition is considered true if it is nonzero or
- nonnull.
-
- Here is an example:
-
- awk '{ if (x % 2 == 0)
- print "x is even"
- else
- print "x is odd" }'
-
- In this example, if the expression `x % 2 == 0' is true (that is, if
- the value of `x' is divisible by 2), then the first `print' statement
- is executed; otherwise the second `print' statement is executed.
-
- If the `else' appears on the same line as BODY1, and BODY1 is a
- single statement, then a semicolon must separate BODY1 from `else'.
- To illustrate this, let's rewrite the previous example:
-
- awk '{ if (x % 2 == 0) print "x is even"; else
- print "x is odd" }'
-
- If you forget the `;', `awk' won't be able to parse it, and you will
- get a syntax error.
-
- We would not actually write this example this way, because a human
- reader might fail to see the `else' if it were not the first thing on
- its line.
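-
- The examples above test a variable `x' that is never assigned, so its
- value is zero and the first branch is always taken. A variant that
- tests the first field of each record shows both branches; the input
- values and the shell variable `parity_out' are illustrative.
-
```shell
# Classify the first field of each input record as even or odd.
parity_out=$(printf '4\n7\n' |
  awk '{ if ($1 % 2 == 0)
           print $1, "is even"
         else
           print $1, "is odd" }')
echo "$parity_out"
```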
-
-
- File: gawk-info, Node: While, Next: Do, Prev: If, Up: Statements
-
- The `while' Statement
- =====================
-
- In programming, a loop means a part of a program that is (or at least
- can be) executed two or more times in succession.
-
- The `while' statement is the simplest looping statement in `awk'. It
- repeatedly executes a statement as long as a condition is true. It
- looks like this:
-
- while (CONDITION)
- BODY
-
- Here BODY is a statement that we call the "body" of the loop, and
- CONDITION is an expression that controls how long the loop keeps
- running.
-
- The first thing the `while' statement does is test CONDITION. If
- CONDITION is true, it executes the statement BODY. After BODY has
- been executed, CONDITION is tested again and this process is repeated
- until CONDITION is no longer true. If CONDITION is initially false,
- the body of the loop is never executed.
-
- awk '{ i = 1
- while (i <= 3) {
- print $i
- i++
- }
- }'
-
- This example prints the first three input fields, one per line.
-
- The loop works like this: first, the value of `i' is set to 1. Then,
- the `while' tests whether `i' is less than or equal to three. This
- is the case when `i' equals one, so the `i'-th field is printed.
- Then the `i++' increments the value of `i' and the loop repeats.
-
- When `i' reaches 4, the loop exits. Here BODY is a compound
- statement enclosed in braces. As you can see, a newline is not
- required between the condition and the body; but using one makes the
- program clearer unless the body is a compound statement or is very
- simple.
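-
- The loop above prints an empty line for a record that has fewer than
- three fields. As a sketch of a slightly more careful condition, this
- variant also stops at `NF', the number of fields in the record. The
- input records and the shell variable `fields_out' are made up for the
- example.
-
```shell
# Print at most the first three fields, and never more than NF of them.
fields_out=$(printf 'a b c d e\nx y\n' |
  awk '{ i = 1
         while (i <= NF && i <= 3) {
           print $i
           i++
         }
       }')
echo "$fields_out"
```
-
- The first record contributes `a', `b' and `c'; the second contributes
- only `x' and `y'.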
-
-
- File: gawk-info, Node: Do, Next: For, Prev: While, Up: Statements
-
- The `do'--`while' Statement
- ===========================
-
- The `do' loop is a variation of the `while' looping statement. The
- `do' loop executes the BODY once, then repeats BODY as long as
- CONDITION is true. It looks like this:
-
- do
- BODY
- while (CONDITION)
-
- Even if CONDITION is false at the start, BODY is executed at least
- once (and only once, unless executing BODY makes CONDITION true).
- Contrast this with the corresponding `while' statement:
-
- while (CONDITION)
- BODY
-
- This statement will not execute BODY even once if CONDITION is false
- to begin with.
-
- Here is an example of a `do' statement:
-
- awk '{ i = 1
- do {
- print $0
- i++
- } while (i <= 10)
- }'
-
- This program prints each input record ten times. It isn't a very
- realistic example, since in this case an ordinary `while' would do
- just as well. But this is normal; there is only occasionally a real
- use for a `do' statement.
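-
- The difference between the two loops shows up only when CONDITION is
- false from the start. In this sketch the starting value of `i' and the
- shell variable `loop_out' are chosen purely for illustration.
-
```shell
# i starts at 5, so "i < 3" is false before either loop begins.
loop_out=$(awk 'BEGIN {
  i = 5
  while (i < 3) { print "while body, i = " i; i++ }
  i = 5
  do { print "do body, i = " i; i++ } while (i < 3)
}')
echo "$loop_out"
```
-
- Only `do body, i = 5' is printed: the `while' body never runs, and the
- `do' body runs exactly once.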
-
-
-