Text File | 1989-04-13 | 49.9 KB | 1,232 lines |
- Info file gawk-info, produced by Makeinfo, -*- Text -*- from input
- file gawk.texinfo.
-
- This file documents `awk', a program that you can use to select
- particular records in a file and perform operations upon them.
-
- Copyright (C) 1989 Free Software Foundation, Inc.
-
- Permission is granted to make and distribute verbatim copies of this
- manual provided the copyright notice and this permission notice are
- preserved on all copies.
-
- Permission is granted to copy and distribute modified versions of
- this manual under the conditions for verbatim copying, provided that
- the entire resulting derived work is distributed under the terms of a
- permission notice identical to this one.
-
- Permission is granted to copy and distribute translations of this
- manual into another language, under the above conditions for modified
- versions, except that this permission notice may be stated in a
- translation approved by the Foundation.
-
-
- File: gawk-info, Node: Top, Next: Preface, Prev: (dir), Up: (dir)
-
- This file documents `awk', a program that you can use to select
- particular records in a file and perform operations upon them; it
- contains the following chapters:
-
- * Menu:
-
- * Preface:: What you can do with `awk'; brief history
- and acknowledgements.
-
- * License:: Your right to copy and distribute `gawk'.
-
- * This Manual:: Using this manual.
-
- Includes sample input files that you can use.
-
- * Getting Started:: A basic introduction to using `awk'.
- How to run an `awk' program. Command line syntax.
-
- * Reading Files:: How to read files and manipulate fields.
-
- * Printing:: How to print using `awk'. Describes the
- `print' and `printf' statements.
- Also describes redirection of output.
-
- * One-liners:: Short, sample `awk' programs.
-
- * Patterns:: The various types of patterns explained in detail.
-
- * Actions:: The various types of actions are introduced here.
- Describes expressions and the various operators in
- detail. Also describes comparison expressions.
-
- * Statements:: The various control statements are described in
- detail.
-
- * Arrays:: The description and use of arrays. Also includes
- array-oriented control statements.
-
- * User-defined:: User-defined functions are described in detail.
-
- * Built-in:: The built-in functions are summarized here.
-
- * Special:: The special variables are summarized here.
-
- * Sample Program:: A sample `awk' program with a complete explanation.
-
- * Notes:: Something about the implementation of `gawk'.
-
- * Glossary:: An explanation of some unfamiliar terms.
-
- * Index::
-
-
- File: gawk-info, Node: Preface, Next: License, Prev: Top, Up: Top
-
- Preface
- *******
-
- If you are like many computer users, you frequently would like to
- make changes in various text files wherever certain patterns appear,
- or extract data from parts of certain lines while discarding the
- rest. Writing a program to do this in a language such as C or
- Pascal is a time-consuming inconvenience that may take many lines of
- code. The job may be easier with `awk'.
-
- The `awk' utility interprets a special-purpose programming language
- that makes it possible to handle simple data-reformatting jobs
- easily with just a few lines of code.
-
- The GNU implementation of `awk' is called `gawk'; it is fully upward
- compatible with the System V Release 3.1 and later versions of `awk'.
- All properly written `awk' programs should work with `gawk'. So we
- usually don't distinguish between `gawk' and other `awk'
- implementations in this manual.
-
- This manual teaches you what `awk' does and how you can use `awk'
- effectively. You should already be familiar with basic,
- general-purpose operating system commands such as `ls'. Using
- `awk' you can:
-
- * manage small, personal databases,
-
- * generate reports,
-
- * validate data,
-
- * produce indexes, and perform other document preparation tasks,
-
- * even experiment with algorithms that can be adapted later to
- other computer languages!
-
- * Menu:
-
- * History:: The history of gawk and awk. Acknowledgements.
-
-
- File: gawk-info, Node: History, Up: Preface
-
- History of `awk' and `gawk'
- ===========================
-
- The name `awk' comes from the initials of its designers: Alfred V.
- Aho, Peter J. Weinberger, and Brian W. Kernighan. The original
- version of `awk' was written in 1977. In 1985 a new version made the
- programming language more powerful, introducing user-defined
- functions, multiple input streams, and computed regular expressions.
-
- The GNU implementation, `gawk', was written in 1986 by Paul Rubin and
- Jay Fenlason, with advice from Richard Stallman. John Woods
- contributed parts of the code as well. In 1988, David Trueman, with
- help from Arnold Robbins, reworked `gawk' for compatibility with the
- newer `awk'.
-
- Many people need to be thanked for their assistance in producing this
- manual. Jay Fenlason contributed many ideas and sample programs.
- Richard Mlynarik and Robert Chassell gave helpful comments on drafts
- of this manual. The paper ``A Supplemental Document for `awk''' by
- John W. Pierce of the Chemistry Department at UC San Diego
- pinpointed several issues, relevant both to `awk' implementation and
- to this manual, that would otherwise have escaped us.
-
- Finally, we would like to thank Brian Kernighan of Bell Labs for
- invaluable assistance during the testing and debugging of `gawk', and
- for help in clarifying several points about the language.
-
-
- File: gawk-info, Node: License, Next: This Manual, Prev: Preface, Up: Top
-
- GNU GENERAL PUBLIC LICENSE
- **************************
-
- Version 1, February 1989
-
- Copyright (C) 1989 Free Software Foundation, Inc.
- 675 Mass Ave, Cambridge, MA 02139, USA
-
- Everyone is permitted to copy and distribute verbatim copies
- of this license document, but changing it is not allowed.
-
- Preamble
- =========
-
- The license agreements of most software companies try to keep users
- at the mercy of those companies. By contrast, our General Public
- License is intended to guarantee your freedom to share and change
- free software--to make sure the software is free for all its users.
- The General Public License applies to the Free Software Foundation's
- software and to any other program whose authors commit to using it.
- You can use it for your programs, too.
-
- When we speak of free software, we are referring to freedom, not
- price. Specifically, the General Public License is designed to make
- sure that you have the freedom to give away or sell copies of free
- software, that you receive source code or can get it if you want it,
- that you can change the software or use pieces of it in new free
- programs; and that you know you can do these things.
-
- To protect your rights, we need to make restrictions that forbid
- anyone to deny you these rights or to ask you to surrender the rights.
- These restrictions translate to certain responsibilities for you if
- you distribute copies of the software, or if you modify it.
-
- For example, if you distribute copies of a such a program, whether
- gratis or for a fee, you must give the recipients all the rights that
- you have. You must make sure that they, too, receive or can get the
- source code. And you must tell them their rights.
-
- We protect your rights with two steps: (1) copyright the software,
- and (2) offer you this license which gives you legal permission to
- copy, distribute and/or modify the software.
-
- Also, for each author's protection and ours, we want to make certain
- that everyone understands that there is no warranty for this free
- software. If the software is modified by someone else and passed on,
- we want its recipients to know that what they have is not the
- original, so that any problems introduced by others will not reflect
- on the original authors' reputations.
-
- The precise terms and conditions for copying, distribution and
- modification follow.
-
- TERMS AND CONDITIONS
-
- 1. This License Agreement applies to any program or other work
- which contains a notice placed by the copyright holder saying it
- may be distributed under the terms of this General Public
- License. The ``Program'', below, refers to any such program or
- work, and a ``work based on the Program'' means either the
- Program or any work containing the Program or a portion of it,
- either verbatim or with modifications. Each licensee is
- addressed as ``you''.
-
- 2. You may copy and distribute verbatim copies of the Program's
- source code as you receive it, in any medium, provided that you
- conspicuously and appropriately publish on each copy an
- appropriate copyright notice and disclaimer of warranty; keep
- intact all the notices that refer to this General Public License
- and to the absence of any warranty; and give any other
- recipients of the Program a copy of this General Public License
- along with the Program. You may charge a fee for the physical
- act of transferring a copy.
-
- 3. You may modify your copy or copies of the Program or any portion
- of it, and copy and distribute such modifications under the
- terms of Paragraph 1 above, provided that you also do the
- following:
-
- * cause the modified files to carry prominent notices stating
- that you changed the files and the date of any change; and
-
- * cause the whole of any work that you distribute or publish,
- that in whole or in part contains the Program or any part
- thereof, either with or without modifications, to be
- licensed at no charge to all third parties under the terms
- of this General Public License (except that you may choose
- to grant warranty protection to some or all third parties,
- at your option).
-
- * If the modified program normally reads commands
- interactively when run, you must cause it, when started
- running for such interactive use in the simplest and most
- usual way, to print or display an announcement including an
- appropriate copyright notice and a notice that there is no
- warranty (or else, saying that you provide a warranty) and
- that users may redistribute the program under these
- conditions, and telling the user how to view a copy of this
- General Public License.
-
- * You may charge a fee for the physical act of transferring a
- copy, and you may at your option offer warranty protection
- in exchange for a fee.
-
- Mere aggregation of another independent work with the Program
- (or its derivative) on a volume of a storage or distribution
- medium does not bring the other work under the scope of these
- terms.
-
- 4. You may copy and distribute the Program (or a portion or
- derivative of it, under Paragraph 2) in object code or
- executable form under the terms of Paragraphs 1 and 2 above
- provided that you also do one of the following:
-
- * accompany it with the complete corresponding
- machine-readable source code, which must be distributed
- under the terms of Paragraphs 1 and 2 above; or,
-
- * accompany it with a written offer, valid for at least three
- years, to give any third party free (except for a nominal
- charge for the cost of distribution) a complete
- machine-readable copy of the corresponding source code, to
- be distributed under the terms of Paragraphs 1 and 2 above;
- or,
-
- * accompany it with the information you received as to where
- the corresponding source code may be obtained. (This
- alternative is allowed only for noncommercial distribution
- and only if you received the program in object code or
- executable form alone.)
-
- Source code for a work means the preferred form of the work for
- making modifications to it. For an executable file, complete
- source code means all the source code for all modules it
- contains; but, as a special exception, it need not include
- source code for modules which are standard libraries that
- accompany the operating system on which the executable file
- runs, or for standard header files or definitions files that
- accompany that operating system.
-
- 5. You may not copy, modify, sublicense, distribute or transfer the
- Program except as expressly provided under this General Public
- License. Any attempt otherwise to copy, modify, sublicense,
- distribute or transfer the Program is void, and will
- automatically terminate your rights to use the Program under
- this License. However, parties who have received copies, or
- rights to use copies, from you under this General Public License
- will not have their licenses terminated so long as such parties
- remain in full compliance.
-
- 6. By copying, distributing or modifying the Program (or any work
- based on the Program) you indicate your acceptance of this
- license to do so, and all its terms and conditions.
-
- 7. Each time you redistribute the Program (or any work based on the
- Program), the recipient automatically receives a license from
- the original licensor to copy, distribute or modify the Program
- subject to these terms and conditions. You may not impose any
- further restrictions on the recipients' exercise of the rights
- granted herein.
-
- 8. The Free Software Foundation may publish revised and/or new
- versions of the General Public License from time to time. Such
- new versions will be similar in spirit to the present version,
- but may differ in detail to address new problems or concerns.
-
- Each version is given a distinguishing version number. If the
- Program specifies a version number of the license which applies
- to it and ``any later version'', you have the option of
- following the terms and conditions either of that version or of
- any later version published by the Free Software Foundation. If
- the Program does not specify a version number of the license,
- you may choose any version ever published by the Free Software
- Foundation.
-
- 9. If you wish to incorporate parts of the Program into other free
- programs whose distribution conditions are different, write to
- the author to ask for permission. For software which is
- copyrighted by the Free Software Foundation, write to the Free
- Software Foundation; we sometimes make exceptions for this. Our
- decision will be guided by the two goals of preserving the free
- status of all derivatives of our free software and of promoting
- the sharing and reuse of software generally.
-
- NO WARRANTY
-
- 10. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO
- WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE
- LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
- HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM ``AS IS''
- WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED,
- INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
- MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE
- ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS
- WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE
- COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
-
- 11. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN
- WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY
- MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE
- LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL,
- INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR
- INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS
- OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
- YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH
- ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN
- ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
-
- END OF TERMS AND CONDITIONS
-
- Appendix: How to Apply These Terms to Your New Programs
- =======================================================
-
- If you develop a new program, and you want it to be of the greatest
- possible use to humanity, the best way to achieve this is to make it
- free software which everyone can redistribute and change under these
- terms.
-
- To do so, attach the following notices to the program. It is safest
- to attach them to the start of each source file to most effectively
- convey the exclusion of warranty; and each file should have at least
- the ``copyright'' line and a pointer to where the full notice is found.
-
- ONE LINE TO GIVE THE PROGRAM'S NAME AND A BRIEF IDEA OF WHAT IT DOES.
- Copyright (C) 19YY NAME OF AUTHOR
-
- This program is free software; you can redistribute it and/or modify
- it under the terms of the GNU General Public License as published by
- the Free Software Foundation; either version 1, or (at your option)
- any later version.
-
- This program is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU General Public License for more details.
-
- You should have received a copy of the GNU General Public License
- along with this program; if not, write to the Free Software
- Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
-
- Also add information on how to contact you by electronic and paper
- mail.
-
- If the program is interactive, make it output a short notice like
- this when it starts in an interactive mode:
-
- Gnomovision version 69, Copyright (C) 19YY NAME OF AUTHOR
- Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
- This is free software, and you are welcome to redistribute it
- under certain conditions; type `show c' for details.
-
- The hypothetical commands `show w' and `show c' should show the
- appropriate parts of the General Public License. Of course, the
- commands you use may be called something other than `show w' and
- `show c'; they could even be mouse-clicks or menu items--whatever
- suits your program.
-
- You should also get your employer (if you work as a programmer) or
- your school, if any, to sign a ``copyright disclaimer'' for the
- program, if necessary. Here a sample; alter the names:
-
- Yoyodyne, Inc., hereby disclaims all copyright interest in the
- program `Gnomovision' (a program to direct compilers to make passes
- at assemblers) written by James Hacker.
-
- SIGNATURE OF TY COON, 1 April 1989
- Ty Coon, President of Vice
-
- That's all there is to it!
-
-
- File: gawk-info, Node: This Manual, Next: Getting Started, Prev: License, Up: Top
-
- Using This Manual
- *****************
-
- The term `gawk' refers to a program (a version of `awk') developed by
- the Free Software Foundation, and to the language you use to tell it
- what to do. When we need to be careful, we call the program ``the
- `awk' utility'' and the language ``the `awk' language''. The purpose
- of this manual is to explain the `awk' language and how to run the
- `awk' utility.
-
- The term "`awk' program" refers to a program written by you in the
- `awk' programming language.
-
- *Note Getting Started::, for the bare essentials you need to know to
- start using `awk'.
-
- Useful ``one-liners'' are included to give you a feel for the `awk'
- language (*note One-liners::.).
-
- A sizable sample `awk' program has been provided for you (*note
- Sample Program::.).
-
- If you find terms that you aren't familiar with, try looking them up
- in the glossary (*note Glossary::.).
-
- Most of the time complete `awk' programs are used as examples, but in
- some of the more advanced sections, only the part of the `awk'
- program that illustrates the concept being described is shown.
-
- * Menu:
-
- This chapter contains the following sections:
-
- * The Files:: Sample data files for use in the `awk' programs
- illustrated in this manual.
-
-
- File: gawk-info, Node: The Files, Up: This Manual
-
- Input Files for the Examples
- ============================
-
- This manual contains many sample programs. The data for many of
- those programs comes from two files. The first file, called
- `BBS-list', represents a list of computer bulletin board systems and
- information about those systems.
-
- Each line of this file is one "record". Each record contains the
- name of a computer bulletin board, its phone number, the board's baud
- rate, and a code for the hours during which it operates. An `A' in
- the last column means the board operates 24 hours a day, every day
- of the week. A `B' means the board operates only on evenings and
- weekends. A `C' means the board operates only on weekends.
-
- aardvark 555-5553 1200/300 B
- alpo-net 555-3412 2400/1200/300 A
- barfly 555-7685 1200/300 A
- bites 555-1675 2400/1200/300 A
- camelot 555-0542 300 C
- core 555-2912 1200/300 C
- fooey 555-1234 2400/1200/300 B
- foot 555-6699 1200/300 B
- macfoo 555-6480 1200/300 A
- sdace 555-3430 2400/1200/300 A
- sabafoo 555-2127 1200/300 C
-
- The second data file, called `inventory-shipped', represents
- information about shipments month by month. Each line of this file
- is also one record. Each record contains the month of the year, the
- number of green crates shipped, the number of red boxes shipped, the
- number of orange bags shipped, and the number of blue packages
- shipped, respectively. The data covers one full year plus the first
- four months of the next, with a blank line between the two years.
-
- Jan 13 25 15 115
- Feb 15 32 24 226
- Mar 15 24 34 228
- Apr 31 52 63 420
- May 16 34 29 208
- Jun 31 42 75 492
- Jul 24 34 67 436
- Aug 15 34 47 316
- Sep 13 55 37 277
- Oct 29 54 68 525
- Nov 20 87 82 577
- Dec 17 35 61 401
-
- Jan 21 36 64 620
- Feb 26 58 80 652
- Mar 24 75 70 495
- Apr 21 70 74 514
-
- If you are reading this in GNU Emacs using Info, you can copy the
- regions of text showing these sample files into your own test files.
- This way you can try out the examples shown in the remainder of this
- document. You do this by using the command `M-x write-region' to
- copy text from the Info file into a file for use with `awk' (see your
- ``GNU Emacs Manual'' for more information). Using this information,
- create your own `BBS-list' and `inventory-shipped' files, and
- practice what you learn in this manual.
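- If you are not using Emacs, a shell here-document does the same job.
- The following is a minimal sketch. It assumes a POSIX shell and a
- writable current directory, and it overwrites any existing files
- named `BBS-list' or `inventory-shipped'. The column spacing is
- approximate. Because `awk' by default treats any run of blanks as a
- single field separator, the exact spacing should not matter.

```shell
#!/bin/sh
# Recreate the two sample data files used throughout this manual.

cat > BBS-list <<'EOF'
aardvark     555-5553     1200/300          B
alpo-net     555-3412     2400/1200/300     A
barfly       555-7685     1200/300          A
bites        555-1675     2400/1200/300     A
camelot      555-0542     300               C
core         555-2912     1200/300          C
fooey        555-1234     2400/1200/300     B
foot         555-6699     1200/300          B
macfoo       555-6480     1200/300          A
sdace        555-3430     2400/1200/300     A
sabafoo      555-2127     1200/300          C
EOF

# The blank line separates the two years of shipment data.
cat > inventory-shipped <<'EOF'
Jan  13  25  15 115
Feb  15  32  24 226
Mar  15  24  34 228
Apr  31  52  63 420
May  16  34  29 208
Jun  31  42  75 492
Jul  24  34  67 436
Aug  15  34  47 316
Sep  13  55  37 277
Oct  29  54  68 525
Nov  20  87  82 577
Dec  17  35  61 401

Jan  21  36  64 620
Feb  26  58  80 652
Mar  24  75  70 495
Apr  21  70  74 514
EOF
```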
-
-
- File: gawk-info, Node: Getting Started, Next: Reading Files, Prev: This Manual, Up: Top
-
- Getting Started With `awk'
- **************************
-
- The basic function of `awk' is to search files for lines (or other
- units of text) that contain certain patterns. When a line matching
- any of those patterns is found, `awk' performs specified actions on
- that line. Then `awk' keeps processing input lines until the end of
- the file is reached.
-
- An `awk' "program" or "script" consists of a series of "rules".
- (They may also contain "function definitions", but that is an
- advanced feature, so let's ignore it for now. *Note User-defined::.)
-
- A rule contains a "pattern", an "action", or both. Actions are
- enclosed in curly braces to distinguish them from patterns.
- Therefore, an `awk' program is a sequence of rules in the form:
-
- PATTERN { ACTION }
- PATTERN { ACTION }
- ...
-
- * Menu:
-
- * Very Simple:: A very simple example.
- * Two Rules:: A less simple one-line example with two rules.
- * More Complex:: A more complex example.
- * Running gawk:: How to run gawk programs; includes command line syntax.
- * Comments:: Adding documentation to gawk programs.
- * Statements/Lines:: Subdividing or combining statements into lines.
-
- * When:: When to use gawk and when to use other things.
-
-
- File: gawk-info, Node: Very Simple, Next: Two Rules, Up: Getting Started
-
- A Very Simple Example
- =====================
-
- The following command runs a simple `awk' program that searches the
- input file `BBS-list' for the string of characters `foo'. (A string
- of characters is usually called, quite simply, a "string".)
-
- awk '/foo/ { print $0 }' BBS-list
-
- When lines containing `foo' are found, they are printed, because
- `print $0' means print the current line. (Just `print' by itself
- also means the same thing, so we could have written that instead.)
-
- You will notice that slashes, `/', surround the string `foo' in the
- actual `awk' program. The slashes indicate that `foo' is a pattern
- to search for. This type of pattern is called a "regular
- expression", and is covered in more detail later (*note Regexp::.).
- There are single quotes around the `awk' program so that the shell
- won't interpret any of it as special shell characters.
-
- Here is what this program prints:
-
- fooey 555-1234 2400/1200/300 B
- foot 555-6699 1200/300 B
- macfoo 555-6480 1200/300 A
- sabafoo 555-2127 1200/300 C
-
- In an `awk' rule, either the pattern or the action can be omitted,
- but not both.
-
- If the pattern is omitted, then the action is performed for *every*
- input line.
-
- If the action is omitted, the default action is to print all lines
- that match the pattern. We could leave out the action (the print
- statement and the curly braces) in the above example, and the result
- would be the same: all lines matching the pattern `foo' would be
- printed. (By comparison, omitting the print statement but retaining
- the curly braces makes an empty action that does nothing; then no
- lines would be printed.)
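- The rule forms described above can be compared side by side. The
- sketch below assumes a POSIX shell with `awk' on the search path.
- Instead of reading the file, it pipes in three lines copied from
- `BBS-list'. The shell variable names are illustrative only.

```shell
#!/bin/sh
# Three lines copied from BBS-list, used as input for each rule form.
sample='fooey     555-1234  2400/1200/300  B
camelot   555-0542  300            C
macfoo    555-6480  1200/300       A'

# Pattern and action: print lines that contain "foo".
with_action=$(printf '%s\n' "$sample" | awk '/foo/ { print $0 }')

# Action omitted: the default action also prints the matching lines.
no_action=$(printf '%s\n' "$sample" | awk '/foo/')

# Empty action: the braces are kept but do nothing, so nothing prints.
empty_action=$(printf '%s\n' "$sample" | awk '/foo/ { }')

# Pattern omitted: the action runs on every line (here, print field 1).
every_line=$(printf '%s\n' "$sample" | awk '{ print $1 }')

printf '%s\n' "$with_action" "---" "$no_action" "---" "$every_line"
```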
-
-
- File: gawk-info, Node: Two Rules, Next: More Complex, Prev: Very Simple, Up: Getting Started
-
- An Example with Two Rules
- =========================
-
- The `awk' utility reads the input files one line at a time. For each
- line, `awk' tries the patterns of all the rules. If several patterns
- match then several actions are run, in the order in which they appear
- in the `awk' program. If no patterns match, then no actions are run.
-
- After processing all the rules (perhaps none) that match the line,
- `awk' reads the next line (however, *note Next::.). This continues
- until the end of the file is reached.
-
- For example, the `awk' program:
-
- /12/ { print $0 }
- /21/ { print $0 }
-
- contains two rules. The first rule has the string `12' as the
- pattern and `print $0' as the action. The second rule has the string
- `21' as the pattern and also has `print $0' as the action. Each
- rule's action is enclosed in its own pair of braces.
-
- This `awk' program prints every line that contains the string `12'
- *or* the string `21'. If a line contains both strings, it is printed
- twice, once by each rule.
-
- If we run this program on our two sample data files, `BBS-list' and
- `inventory-shipped', as shown here:
-
- awk '/12/ { print $0 }
- /21/ { print $0 }' BBS-list inventory-shipped
-
- we get the following output:
-
- aardvark 555-5553 1200/300 B
- alpo-net 555-3412 2400/1200/300 A
- barfly 555-7685 1200/300 A
- bites 555-1675 2400/1200/300 A
- core 555-2912 1200/300 C
- fooey 555-1234 2400/1200/300 B
- foot 555-6699 1200/300 B
- macfoo 555-6480 1200/300 A
- sdace 555-3430 2400/1200/300 A
- sabafoo 555-2127 1200/300 C
- sabafoo 555-2127 1200/300 C
- Jan 21 36 64 620
- Apr 21 70 74 514
-
- Note how the line in `BBS-list' beginning with `sabafoo' was printed
- twice, once for each rule.
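- The double printing can be seen without the full sample files. The
- sketch below, assuming a POSIX shell, runs the same two-rule program
- on three lines taken from those files. `camelot' matches neither
- pattern, `sabafoo' matches both, and the January line matches only
- `/21/'.

```shell
#!/bin/sh
# Run the two-rule program on three lines from the sample files.
out=$(printf '%s\n' \
    'camelot  555-0542  300       C' \
    'sabafoo  555-2127  1200/300  C' \
    'Jan  21  36  64 620' |
  awk '/12/ { print $0 }
       /21/ { print $0 }')

# Expect the sabafoo line twice (once per rule) and the Jan line once.
printf '%s\n' "$out"
```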
-
-
- File: gawk-info, Node: More Complex, Next: Running gawk, Prev: Two Rules, Up: Getting Started
-
- A More Complex Example
- ======================
-
- Here is an example to give you an idea of what typical `awk' programs
- do. This example shows how `awk' can be used to summarize, select,
- and rearrange the output of another utility. It uses features that
- haven't been covered yet, so don't worry if you don't understand all
- the details.
-
- ls -l | awk '$5 == "Nov" { sum += $4 }
- END { print sum }'
-
- This command prints the total number of bytes in all the files in the
- current directory that were last modified in November (of any year).
- (In the C shell you would need to type a semicolon and then a
- backslash at the end of the first line; in the Bourne shell you can
- type the example as shown.)
-
- The `ls -l' part of this example is a command that gives you a full
- listing of all the files in a directory, including file size and date.
- Its output looks like this:
-
- -rw-r--r-- 1 close 1933 Nov 7 13:05 Makefile
- -rw-r--r-- 1 close 10809 Nov 7 13:03 gawk.h
- -rw-r--r-- 1 close 983 Apr 13 12:14 gawk.tab.h
- -rw-r--r-- 1 close 31869 Jun 15 12:20 gawk.y
- -rw-r--r-- 1 close 22414 Nov 7 13:03 gawk1.c
- -rw-r--r-- 1 close 37455 Nov 7 13:03 gawk2.c
- -rw-r--r-- 1 close 27511 Dec 9 13:07 gawk3.c
- -rw-r--r-- 1 close 7989 Nov 7 13:03 gawk4.c
-
- The first field contains the file's access permissions, the second field
- contains the number of links to the file, and the third field
- identifies the owner of the file. The fourth field contains the size
- of the file in bytes. The fifth, sixth, and seventh fields contain
- the month, day, and time, respectively, that the file was last
- modified. Finally, the eighth field contains the name of the file.
-
- The `$5 == "Nov"' in our `awk' program is an expression that tests
- whether the fifth field of the output from `ls -l' matches the string
- `Nov'. Each time a line has the string `Nov' in its fifth field, the
- action `{ sum += $4 }' is performed. This adds the fourth field (the
- file size) to the variable `sum'. As a result, when `awk' has
- finished reading all the input lines, `sum' will be the sum of the
- sizes of files whose lines matched the pattern.
-
- After the last line of output from `ls' has been processed, the
- action of the `END' rule runs and prints the value of `sum'. In this
- example, the value of `sum' would be 80600.
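- You can check this arithmetic without depending on the contents of
- your own directory. The sketch below, assuming a POSIX shell, pipes
- the sample listing shown above into the same `awk' program instead
- of running `ls -l'.

```shell
#!/bin/sh
# The sample `ls -l' listing from the text, eight fields per line.
listing='-rw-r--r--  1 close   1933 Nov  7 13:05 Makefile
-rw-r--r--  1 close  10809 Nov  7 13:03 gawk.h
-rw-r--r--  1 close    983 Apr 13 12:14 gawk.tab.h
-rw-r--r--  1 close  31869 Jun 15 12:20 gawk.y
-rw-r--r--  1 close  22414 Nov  7 13:03 gawk1.c
-rw-r--r--  1 close  37455 Nov  7 13:03 gawk2.c
-rw-r--r--  1 close  27511 Dec  9 13:07 gawk3.c
-rw-r--r--  1 close   7989 Nov  7 13:03 gawk4.c'

# Same program as above: add field 4 (size) when field 5 is "Nov".
total=$(printf '%s\n' "$listing" | awk '$5 == "Nov" { sum += $4 }
                                        END { print sum }')
echo "$total"
```

- Most current `ls' implementations also print a group column after
- the owner. That column moves the size to field 5 and the month to
- field 6, so on such a system the corresponding rule would be
- `$6 == "Nov" { sum += $5 }'. Check the layout of your own `ls -l'
- output before relying on either form.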
-
- These more advanced `awk' techniques are covered in later sections
- (*note Actions::.). Before you can move on to more advanced `awk'
- programming, you have to know how `awk' interprets your input and
- displays your output. By manipulating "fields" and using special
- "print" statements, you can produce some very useful and spectacular
- looking reports.
-
-
- File: gawk-info, Node: Running gawk, Next: Comments, Prev: More Complex, Up: Getting Started
-
- How to Run `awk' Programs
- =========================
-
- There are several ways to run an `awk' program. If the program is
- short, it is easiest to include it in the command that runs `awk',
- like this:
-
- awk 'PROGRAM' INPUT-FILE1 INPUT-FILE2 ...
-
- where PROGRAM consists of a series of PATTERNS and ACTIONS, as
- described earlier.
-
- When the program is long, you would probably prefer to put it in a
- file and run it with a command like this:
-
- awk -f PROGRAM-FILE INPUT-FILE1 INPUT-FILE2 ...
-
- * Menu:
-
- * One-shot:: Running a short throw-away `awk' program.
- * Read Terminal:: Using no input files (input from terminal instead).
- * Long:: Putting permanent `awk' programs in files.
- * Executable Scripts:: Making self-contained `awk' programs.
- * Command Line:: How the `awk' command line is laid out.
-
-
- File: gawk-info, Node: One-shot, Next: Read Terminal, Up: Running gawk
-
- One-shot Throw-away `awk' Programs
- ----------------------------------
-
- Once you are familiar with `awk', you will often type simple programs
- at the moment you want to use them. Then you can write the program
- as the first argument of the `awk' command, like this:
-
- awk 'PROGRAM' INPUT-FILE1 INPUT-FILE2 ...
-
- where PROGRAM consists of a series of PATTERNS and ACTIONS, as
- described earlier.
-
- This command format tells the shell to start `awk' and use the
- PROGRAM to process records in the input file(s). There are single
- quotes around the PROGRAM so that the shell doesn't interpret any
- `awk' characters as special shell characters. They cause the shell
- to treat all of PROGRAM as a single argument for `awk'. They also
- allow PROGRAM to be more than one line long.
-
- This format is also useful for running short or medium-sized `awk'
- programs from shell scripts, because it avoids the need for a
- separate file for the `awk' program. A self-contained shell script
- is more reliable since there are no other files to misplace.
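- For instance, a shell script can wrap a short `awk' program in a
- function. The sketch below assumes a POSIX shell, and the function
- name `all_week' is hypothetical. The function prints the names of
- boards in `BBS-list' whose last field is `A' (open all week). It
- reads the files named as arguments, or standard input if none are
- given.

```shell
#!/bin/sh
# Hypothetical helper: print the names of boards open all week.
# The awk program lives inside single quotes in this script, so no
# separate program file is needed.
all_week() {
    awk '$4 == "A" { print $1 }' "$@"
}

# With no file arguments, awk reads standard input.
printf '%s\n' 'barfly   555-7685  1200/300  A' \
              'camelot  555-0542  300       C' | all_week
```

- With the sample file in place, `all_week BBS-list' would print
- `alpo-net', `barfly', `bites', `macfoo', and `sdace'.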
-
-
- File: gawk-info, Node: Read Terminal, Next: Long, Prev: One-shot, Up: Running gawk
-
- Running `awk' without Input Files
- ---------------------------------
-
- You can also use `awk' without any input files. If you type the
- command line:
-
- awk 'PROGRAM'
-
- then `awk' applies the PROGRAM to the "standard input", which usually
- means whatever you type on the terminal. This continues until you
- indicate end-of-file by typing `Control-d'.
-
- For example, if you type:
-
- awk '/th/'
-
- whatever you type next will be taken as data for that `awk' program.
- If you go on to type the following data,
-
- Kathy
- Ben
- Tom
- Beth
- Seth
- Karen
- Thomas
- `Control-d'
-
- then `awk' will print
-
- Kathy
- Beth
- Seth
-
- as matching the pattern `th'. Notice that it did not recognize
- `Thomas' as matching the pattern. The `awk' language is "case
- sensitive", and matches patterns *exactly*.
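-
- The same program works on data arriving through a pipe, since with no
- file names `awk' still reads the standard input. This sketch uses
- `printf' to supply the seven names from the example instead of typing
- them at the terminal:
-
```shell
# Feed the sample names on standard input; no Control-d is needed
# because the pipe supplies end-of-file.
printf '%s\n' Kathy Ben Tom Beth Seth Karen Thomas | awk '/th/'
# prints Kathy, Beth and Seth, one per line; `Thomas' has `Th', not `th'
```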
-
-
- File: gawk-info, Node: Long, Next: Executable Scripts, Prev: Read Terminal, Up: Running gawk
-
- Running Long Programs
- ---------------------
-
- Sometimes your `awk' programs can be very long. In this case it is
- more convenient to put the program into a separate file. To tell
- `awk' to use that file for its program, you type:
-
- awk -f SOURCE-FILE INPUT-FILE1 INPUT-FILE2 ...
-
- The `-f' tells the `awk' utility to get the `awk' program from the
- file SOURCE-FILE. Any file name can be used for SOURCE-FILE. For
- example, you could put the program:
-
- /th/
-
- into the file `th-prog'. Then the command:
-
- awk -f th-prog
-
- does the same thing as this one:
-
- awk '/th/'
-
- which was explained earlier (*note Read Terminal::.). Note that you
- don't usually need single quotes around the file name that you
- specify with `-f', because most file names don't contain any of the
- shell's special characters.
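-
- A minimal sketch of the `-f' form: it writes the one-line program into
- `th-prog' inside a scratch directory (the use of `mktemp -d' is just a
- convenience of this example) and then runs it on a few names read from
- the standard input.
-
```shell
# Work in a scratch directory so no files are left behind.
cd "$(mktemp -d)" || exit 1

# Put the program `/th/' into the file `th-prog'.
echo '/th/' > th-prog

# Run the program from the file; input comes from the pipe.
printf '%s\n' Kathy Ben Beth | awk -f th-prog
# prints Kathy and Beth, one per line
```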
-
- If you want to identify your `awk' program files clearly as such, you
- can add the extension `.awk' to the filename. This doesn't affect
- the execution of the `awk' program, but it does make ``housekeeping''
- easier.
-
-
- File: gawk-info, Node: Executable Scripts, Next: Command Line, Prev: Long, Up: Running gawk
-
- Executable `awk' Programs
- -------------------------
-
- (The following section assumes that you are already somewhat familiar
- with `awk'.)
-
- Once you have learned `awk', you may want to write self-contained
- `awk' scripts, using the `#!' script mechanism. You can do this on
- BSD Unix systems and on GNU.
-
- For example, you could create a text file named `hello', containing
- the following (where `BEGIN' is a feature we have not yet discussed):
-
- #! /bin/awk -f
-
- # a sample awk program
-
- BEGIN { print "hello, world" }
-
- After making this file executable (with the `chmod' command) and
- putting it in a directory on your search path, you can simply type:
-
- hello
-
- at the shell, and the system will arrange to run `awk' as if you had
- typed:
-
- awk -f hello
-
- Self-contained `awk' scripts are particularly useful for putting
- `awk' programs into production on your system, without your users
- having to know that they are actually using an `awk' program.
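-
- The `#!' line must give the full path of the `awk' interpreter, and
- that path varies between systems (`/usr/bin/awk' is common today). The
- sketch below therefore looks the path up with `command -v' before
- writing the `hello' script; that lookup is an assumption of this
- example, not part of the mechanism itself. It runs the script as
- `./hello' because the scratch directory is not on the search path.
-
```shell
cd "$(mktemp -d)" || exit 1

# Find the full path of awk for the `#!' line (assumed to be a real
# executable, not a shell function or alias).
AWK_PATH=$(command -v awk)

# Write the script: `#!' line, a comment, and the BEGIN rule.
printf '#! %s -f\n# a sample awk program\nBEGIN { print "hello, world" }\n' \
    "$AWK_PATH" > hello

chmod +x hello
./hello    # prints: hello, world
```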
-
- If your system does not support the `#!' mechanism, you can get a
- similar effect using a regular shell script. It would look something
- like this:
-
- : a sample awk program
-
- awk 'PROGRAM' "$@"
-
- Using this technique, it is *vital* to enclose the PROGRAM in single
- quotes to protect it from interpretation by the shell. If you omit
- the quotes, only a shell wizard can predict the result.
-
- The `"$@"' causes the shell to forward all the command line arguments
- to the `awk' program, without interpretation.
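-
- A sketch of such a wrapper, using a hypothetical script name `thgrep'
- and hypothetical data files `names' and `more'. Note that the text after
- `:' is still scanned by the shell, so it should avoid quote and
- backquote characters; the program itself stays inside single quotes.
-
```shell
cd "$(mktemp -d)" || exit 1

# The wrapper: `:' starts a do-nothing line that serves as a comment,
# and "$@" hands every argument to awk unchanged.
cat > thgrep <<'EOF'
: print the lines that contain th
awk '/th/' "$@"
EOF

printf '%s\n' Kathy Ben > names
printf '%s\n' Beth > more
sh thgrep names more    # prints Kathy, then Beth
```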
-
-
- File: gawk-info, Node: Command Line, Prev: Executable Scripts, Up: Running gawk
-
- Details of the `awk' Command Line
- ---------------------------------
-
- (The following section assumes that you are already familiar with
- `awk'.)
-
- There are two ways to run `awk'. Here are templates for both of
- them; items enclosed in `[' and `]' in these templates are optional.
-
- awk [ -FFS ] [ -- ] 'PROGRAM' FILE ...
- awk [ -FFS ] -f SOURCE-FILE [ -f SOURCE-FILE ... ] [ -- ] FILE ...
-
- Options begin with a minus sign and consist of a single character.
- The options and their meanings are as follows:
-
- `-FFS'
- This sets the `FS' variable to FS (*note Special::.). As a
- special case, if FS is `t', then `FS' will be set to the tab
- character (`"\t"').
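-
- For instance, `-F:' makes the colon the field separator, which suits
- colon-separated data such as password-style files. The two input lines
- here are made up for the example:
-
```shell
# Split each record at colons and print the first and third fields.
printf 'root:x:0\ndaemon:x:1\n' | awk -F: '{ print $1, $3 }'
# prints: root 0
#         daemon 1
```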
-
- `-f SOURCE-FILE'
- Indicates that the `awk' program is to be found in SOURCE-FILE
- instead of in the first non-option argument.
-
- `--'
- This signals the end of the command line options. If you wish
- to specify an input file named `-f', you can precede it with the
- `--' argument to prevent the `-f' from being interpreted as an
- option. This handling of `--' follows the POSIX argument
- parsing conventions.
-
- Any other options will be flagged as invalid with a warning message,
- but are otherwise ignored.
-
- If the `-f' option is *not* used, then the first non-option command
- line argument is expected to be the program text.
-
- The `-f' option may be used more than once on the command line.
- `awk' will read its program source from all of the named files, as if
- they had been concatenated together into one big file. This is
- useful for creating libraries of `awk' functions. Useful functions
- can be written once, and then retrieved from a standard place,
- instead of having to be included into each individual program. You
- can still type in a program at the terminal and use library
- functions, by specifying `/dev/tty' as one of the arguments to a
- `-f'. Type your program, and end it with the keyboard end-of-file
- character `Control-d'.
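-
- A sketch of the library idea: `lib.awk' holds a function and
- `main.awk' uses it, and the two `-f' options combine them into one
- program. The file names and the function `max' are hypothetical, made
- up for this example.
-
```shell
cd "$(mktemp -d)" || exit 1

# The library: a function kept in a file of its own.
cat > lib.awk <<'EOF'
# Return the larger of two numbers.
function max(a, b) { return a > b ? a : b }
EOF

# The main program calls `max' without defining it.  Adding 0 forces a
# numeric value, so the comparison in `max' is numeric.
cat > main.awk <<'EOF'
{ n = $1 + 0; biggest = (NR == 1) ? n : max(biggest, n) }
END { print biggest }
EOF

printf '3\n17\n5\n' | awk -f lib.awk -f main.awk    # prints: 17
```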
-
- Any additional arguments on the command line are made available to
- your `awk' program in the `ARGV' array (*note Special::.). These
- arguments are normally treated as input files to be processed in the
- order specified. However, an argument of the form VAR`='VALUE means
- to assign the value VALUE to the variable VAR; it does not specify a
- file at all.
-
- Command line options and the program text (if present) are omitted
- from the `ARGV' array. All other arguments, including variable
- assignments, are included (*note Special::.).
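-
- To see what ends up in `ARGV', a `BEGIN' rule can print it. Because the
- program has only a `BEGIN' rule, `awk' never opens the files named, so
- `datafile' and `other' need not exist. The loop starts at 1 because
- `ARGV[0]' normally holds the name under which `awk' was invoked, which
- can vary between systems.
-
```shell
# The program text is not in ARGV; the assignment `v=1' is.
awk 'BEGIN { for (i = 1; i < ARGC; i++) print i ": " ARGV[i] }' \
    datafile v=1 other
# prints: 1: datafile
#         2: v=1
#         3: other
```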
-
- The distinction between file name arguments and variable-assignment
- arguments is made when `awk' is about to open the next input file.
- At that point in execution, it checks the ``file name'' to see
- whether it is really a variable assignment; if so, instead of trying
- to read a file it will, *at that point in the execution*, assign the
- variable.
-
- Therefore, the variables actually receive the specified values after
- all previously specified files have been read. In particular, the
- values of variables assigned in this fashion are *not* available
- inside a `BEGIN' rule (*note BEGIN/END::.), since such rules are run
- before `awk' begins scanning the argument list.
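-
- The timing is easy to observe: in this sketch, with a one-line
- `datafile' made up for the example, the `BEGIN' rule still sees `v' as
- empty, while the rule that processes the record sees the assigned value.
-
```shell
cd "$(mktemp -d)" || exit 1
printf 'one\n' > datafile

# `v' is assigned only when awk reaches `v=hello' in the argument list,
# which is after BEGIN has run and before `datafile' is opened.
awk 'BEGIN { print "BEGIN: [" v "]" }
     { print "record: [" v "]" }' v=hello datafile
# prints: BEGIN: []
#         record: [hello]
```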
-
- The variable assignment feature is most useful for assigning to
- variables such as `RS', `OFS', and `ORS', which control input and
- output formats, before listing the data files. It is also useful for
- controlling state if multiple passes are needed over a data file.
- For example:
-
- awk 'pass == 1 { PASS 1 STUFF }
- pass == 2 { PASS 2 STUFF }' pass=1 datafile pass=2 datafile
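-
- PASS 1 STUFF and PASS 2 STUFF above are placeholders. As one concrete
- sketch, with a small `datafile' of numbers made up for the example, the
- first pass totals the first column and the second pass prints each
- value with its share of that total:
-
```shell
cd "$(mktemp -d)" || exit 1
printf '1\n3\n' > datafile

# Pass 1 accumulates the total; pass 2 reads the same file again and
# divides each value by that total.
awk 'pass == 1 { total += $1 }
     pass == 2 { print $1, $1 / total }' pass=1 datafile pass=2 datafile
# prints: 1 0.25
#         3 0.75
```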
-
-
- File: gawk-info, Node: Comments, Next: Statements/Lines, Prev: Running gawk, Up: Getting Started
-
- Comments in `awk' Programs
- ==========================
-
- When you write a complicated `awk' program, you can put "comments" in
- the program file to help you remember what the program does, and how
- it works.
-
- A comment starts with the sharp sign character, `#', and
- continues to the end of the line. The `awk' language ignores the
- rest of a line following a sharp sign. For example, we could have
- put the following into `th-prog':
-
- # This program finds records containing the pattern `th'. This is how
- # you continue comments on additional lines.
- /th/
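-
- A sketch that writes the commented version of `th-prog' to a file and
- runs it. Since a comment runs from `#' to the end of the line, one can
- also follow a rule on the same line, as the third line shows.
-
```shell
cd "$(mktemp -d)" || exit 1

# The commented program, plus a trailing comment after the rule.
cat > th-prog <<'EOF'
# This program finds records containing the pattern `th'.  This is how
# you continue comments on additional lines.
/th/    # a comment can also follow a rule
EOF

printf '%s\n' Ben Seth | awk -f th-prog    # prints: Seth
```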
-
- You can put comment lines into keyboard-composed throw-away `awk'
- programs also, but this usually isn't very useful; the purpose of a
- comment is to help yourself or another person understand the program
- at another time.
-
-
- File: gawk-info, Node: Statements/Lines, Next: When, Prev: Comments, Up: Getting Started
-
- `awk' Statements versus Lines
- =============================
-
- Most often, each line in an `awk' program is a separate statement or
- separate rule, like this:
-
- awk '/12/ { print $0 }
- /21/ { print $0 }' BBS-list inventory-shipped
-
- But sometimes statements can be more than one line, and lines can
- contain several statements.
-
- You can split a statement into multiple lines by inserting a newline
- after any of the following:
-
- , { ? : || &&
-
- Lines ending in `do' or `else' automatically have their statements
- continued on the following line(s). A newline at any other point
- ends the statement.
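-
- For example, this sketch breaks a pattern after `&&' and a `print'
- statement after a comma; no backslashes are needed at those points. The
- two input lines are made up for the example.
-
```shell
# The newline after `&&' continues the pattern, and the newline after
# the comma continues the `print' list.
printf '2 3\n9 1\n' | awk '$1 > 1 &&
    $2 > 2 { print $1,
                 $2 }'
# prints: 2 3
```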
-
- If you would like to split a single statement into two lines at a
- point where a newline would terminate it, you can "continue" it by
- ending the first line with a backslash character, `\'. This is
- allowed absolutely anywhere in the statement, even in the middle of a
- string or regular expression. For example:
-
- awk '/This program is too long, so continue it\
- on the next line/ { print $1 }'
-
- We have generally not used backslash continuation in the sample
- programs in this manual. Since there is no limit on the length of a
- line, it is never strictly necessary; it just makes programs
- prettier. We have preferred to make them even more pretty by keeping
- the statements short. Backslash continuation is most useful when
- your `awk' program is in a separate source file, instead of typed in
- on the command line.
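-
- A sketch of backslash continuation in a program file (the name
- `fields.awk' is hypothetical). The split falls between two strings being
- concatenated, a point where a bare newline would end the statement, so
- the backslash is required there.
-
```shell
cd "$(mktemp -d)" || exit 1

# The quoted heredoc keeps the backslash, so awk sees it at end of line
# and joins the two lines into one `print' statement.
cat > fields.awk <<'EOF'
{ print "first field: " $1 \
        " second field: " $2 }
EOF

echo 'a b' | awk -f fields.awk    # prints: first field: a second field: b
```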
-
- *Warning: this does not work if you are using the C shell.*
- Continuation with backslash works for `awk' programs in files, and
- also for one-shot programs *provided* you are using the Bourne
- shell, the Korn shell, or the Bourne-again shell. But the C shell
- used on Berkeley Unix behaves differently! There, you must use two
- backslashes in a row, followed by a newline.
-
- When `awk' statements within one rule are short, you might want to
- put more than one of them on a line. You do this by separating the
- statements with semicolons, `;'. This also applies to the rules
- themselves. Thus, the above example program could have been written:
-
- /12/ { print $0 } ; /21/ { print $0 }
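-
- Each rule is still tried against every record, so a record that matches
- both patterns is printed once by each rule. A sketch with made-up
- input:
-
```shell
# `1221' contains both `12' and `21', so it is printed twice.
printf '%s\n' 12 21 1221 7 | awk '/12/ { print $0 } ; /21/ { print $0 }'
# prints 12, 21, 1221 and 1221 again, one per line
```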
-
- *Note:* The requirement that rules on the same line be separated by
- semicolons is new in the `awk' language; it was added for consistency
- with the way statements are separated in the action part of rules.
-
-
- File: gawk-info, Node: When, Prev: Statements/Lines, Up: Getting Started
-
- When to Use `awk'
- =================
-
- What use is all of this to me, you might ask? Using additional
- operating system utilities, more advanced patterns, field separators,
- arithmetic statements, and other selection criteria, you can produce
- much more complex output. The `awk' language is very useful for
- producing reports from large amounts of raw data, such as summarizing
- information from the output of standard operating system programs
- such as `ls'. (*Note A More Complex Example: More Complex.)
-
- Programs written with `awk' are usually much smaller than they would
- be in other languages. This makes `awk' programs easy to compose and
- use. Often `awk' programs can be quickly composed at your terminal,
- used once, and thrown away. Since `awk' programs are interpreted,
- you can avoid the usually lengthy edit-compile-test-debug cycle of
- software development.
-
- Complex programs have been written in `awk', including a complete
- retargetable assembler for 8-bit microprocessors (*note Glossary::.
- for more information) and a microcode assembler for a special purpose
- Prolog computer. However, `awk''s capabilities are strained by tasks
- of such complexity.
-
- If you find yourself writing `awk' scripts of more than, say, a few
- hundred lines, you might consider using a different programming
- language. Emacs Lisp is a good choice if you need sophisticated
- string or pattern matching capabilities. The shell is also good at
- string and pattern matching; in addition it allows powerful use of
- the standard utilities. More conventional languages like C, C++, or
- Lisp offer better facilities for system programming and for managing
- the complexity of large programs. Programs in these languages may
- require more lines of source code than the equivalent `awk' programs,
- but they will be easier to maintain and usually run more efficiently.
-
-
- File: gawk-info, Node: Reading Files, Next: Printing, Prev: Getting Started, Up: Top
-
- Reading Files (Input)
- *********************
-
- In the typical `awk' program, all input is read either from the
- standard input (usually the keyboard) or from files whose names you
- specify on the `awk' command line. If you specify input files, `awk'
- reads data from the first one until it reaches the end; then it reads
- the second file until it reaches the end, and so on. The name of the
- current input file can be found in the special variable `FILENAME'
- (*note Special::.).
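-
- A sketch showing the files being read in order and `FILENAME' tracking
- the current one; `first' and `second' are made-up input files.
-
```shell
cd "$(mktemp -d)" || exit 1
printf 'a\n' > first
printf 'b\nc\n' > second

# FILENAME names the file that the current record came from.
awk '{ print FILENAME ": " $0 }' first second
# prints: first: a
#         second: b
#         second: c
```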
-
- The input is split automatically into "records", and processed by the
- rules one record at a time. (Records are the units of text mentioned
- in the introduction; by default, a record is a line of text.) Each
- record read is split automatically into "fields", to make it more
- convenient for a rule to work on parts of the record under
- consideration.
-
- On rare occasions you will need to use the `getline' command, which
- can do explicit input from any number of files.
-
- * Menu:
-
- * Records:: Controlling how data is split into records.
- * Fields:: An introduction to fields.
- * Field Separators:: The field separator and how to change it.
- * Multiple:: Reading multi-line records.
-
- * Assignment Options:: Setting variables on the command line and a summary
- of command line syntax. This is an advanced method
- of input.
-
- * Getline:: Reading files under explicit program control
- using the `getline' function.
- * Close Input:: Closing an input file (so you can read from
- the beginning once more).
-
-
- File: gawk-info, Node: Records, Next: Fields, Up: Reading Files
-
- How Input is Split into Records
- ===============================
-
- The `awk' language divides its input into records and fields.
- Records are separated from each other by the "record separator". By
- default, the record separator is the "newline" character. Therefore,
- normally, a record is a line of text.
-
- Sometimes you may want to use a different character to separate your
- records. You can use different characters by changing the special
- variable `RS'.
-
- The value of `RS' is a string that says how to separate records; the
- default value is `"\n"', the string of just a newline character.
- This is why lines of text are the default record. Although `RS' can
- have any string as its value, only the first character of the string
- will be used as the record separator. The other characters are
- ignored. `RS' is exceptional in this regard; `awk' uses the full
- value of all its other special variables.
-
- The value of `RS' is changed by "assigning" it a new value (*note
- Assignment Ops::.). One way to do this is at the beginning of your
- `awk' program, before any input has been processed, using the special
- `BEGIN' pattern (*note BEGIN/END::.). This way, `RS' is changed to
- its new value before any input is read. The new value of `RS' is
- enclosed in quotation marks. For example:
-
- awk 'BEGIN { RS = "/" } ; { print $0 }' BBS-list
-
- changes the value of `RS' to `/', the slash character, before reading
- any input. Records are now separated by a slash. The second rule in
- the `awk' program (the action with no pattern) will proceed to print
- each record. Since each `print' statement adds a newline at the end
- of its output, the effect of this `awk' program is to copy the input
- with each slash changed to a newline.
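-
- A small sketch of the same program on made-up input. The input has no
- trailing newline, so the last record is just `c'; numbering each record
- with `NR' makes the record boundaries visible.
-
```shell
# With RS set to `/' in BEGIN, `a/b/c' is read as three records.
printf 'a/b/c' | awk 'BEGIN { RS = "/" } ; { print NR ": " $0 }'
# prints: 1: a
#         2: b
#         3: c
```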
-
- Another way to change the record separator is on the command line,
- using the variable-assignment feature (*note Command Line::.).
-
- awk '...' RS="/" INPUT-FILE
-
- `RS' will be set to `/' before INPUT-FILE is processed.
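-
- Because the assignment takes effect only when `awk' reaches it in the
- argument list, different files can be read with different separators.
- In this sketch the made-up file `lines' is read with the default
- newline separator and `slashes' with `/':
-
```shell
cd "$(mktemp -d)" || exit 1
printf 'x y\n' > lines
printf 'p/q' > slashes

# RS=/ is assigned after `lines' is finished and before `slashes' opens.
awk '{ print FILENAME ": " $0 }' lines RS=/ slashes
# prints: lines: x y
#         slashes: p
#         slashes: q
```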
-
- The empty string (a string of no characters) has a special meaning as
- the value of `RS': it means that records are separated only by blank
- lines. *Note Multiple::, for more details.
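-
- A quick sketch of this blank-line mode on made-up input: five input
- lines, one of them blank, make up two records.
-
```shell
# With RS empty, each run of non-blank lines is one record.
printf 'name one\naddress one\n\nname two\naddress two\n' |
    awk 'BEGIN { RS = "" } END { print NR }'
# prints: 2
```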
-
- The `awk' utility keeps track of the number of records that have been
- read so far from the current input file. This value is stored in a
- special variable called `FNR'. It is reset to zero when a new file
- is started. Another variable, `NR', is the total number of input
- records read so far from all files. It starts at zero but is never
- automatically reset to zero.
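-
- The difference shows up as soon as there is more than one input file.
- A sketch with two made-up files:
-
```shell
cd "$(mktemp -d)" || exit 1
printf 'a\nb\n' > first
printf 'c\n' > second

# FNR starts over for each file; NR keeps counting across files.
awk '{ print FILENAME, FNR, NR }' first second
# prints: first 1 1
#         first 2 2
#         second 1 3
```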
-
- If you change the value of `RS' in the middle of an `awk' run, the
- new value is used to delimit subsequent records, but the record
- currently being processed, and any records already read, are not
- affected.
-
-
-