The Datafile PD-CD 4

home *** CD-ROM | disk | FTP | other *** search

/ The Datafile PD-CD 4 / DATAFILE_PDCD4.iso / unix / riscbsd / 1_1_contri / usd / 19_awk / awk

Wrap

Text File | 1986-05-22 | 26.1 KB | 1,409 lines

.\" @(#)awk 6.1 (Berkeley) 5/22/86 .\" .EH 'USD:19-%''Awk \(em A Pattern Scanning and Processing Language' .OH 'Awk \(em A Pattern Scanning and Processing Language''USD:19-%' .\" .fp 3 G no G on APS (use gb) or Dandelion Printer (use CW) .\" the .T is only a ditroff feature... .if '\*.T'dp' .fp 3 El .if '\*.T'aps' .fp 3 gB ....TM "78-1271-12, 78-1273-6" 39199 39199-11 .ND "September 1, 1978" ....TR 68 .\".RP . \" macros here .tr _\(em .if t .tr ~\(ap .tr |\(or .tr *\(** .de UC \&\\$3\s-1\\$1\\s0\&\\$2 .. .de IT .if n .ul \&\\$3\f2\\$1\fP\|\\$2 .. .de UL .if n .ul \&\\$3\f3\\$1\fP\&\\$2 .. .de P1 .DS I 3n .nf .if n .ta 5 10 15 20 25 30 35 40 45 50 55 60 .if t .ta .3i .6i .9i 1.2i .if t .tr -\-'\(fm*\(** .if t .tr _\(ul .ft 3 .lg 0 .ss 18 . \"use first argument as indent if present .. .de P2 .ps \\n(PS .vs \\n(VSp .ft R .ss 12 .if n .ls 2 .tr --''``^^!! .if t .tr _\(em .fi .lg .DE .. .hw semi-colon .hy 14 . \"2=not last lines; 4= no -xx; 8=no xx- . \"special chars in programs .de WS .sp \\$1 .. . \" end of macros .TL Awk \(em A Pattern Scanning and Processing Language .br (Second Edition) .AU "MH 2C-522" 4862 Alfred V. Aho .AU "MH 2C-518" 6021 Brian W. Kernighan .AU "MH 2C-514" 7214 Peter J. Weinberger .AI .MH .AB .IT Awk is a programming language whose basic operation is to search a set of files for patterns, and to perform specified actions upon lines or fields of lines which contain instances of those patterns. .IT Awk makes certain data selection and transformation operations easy to express; for example, the .IT awk program .sp .ce .ft 3 length > 72 .ft .sp prints all input lines whose length exceeds 72 characters; the program .ce .sp .ft 3 NF % 2 == 0 .ft R .sp prints all lines with an even number of fields; and the program .ce .sp .ft 3 { $1 = log($1); print } .ft R .sp replaces the first field of each line by its logarithm. .PP .IT Awk patterns may include arbitrary boolean combinations of regular expressions and of relational operators on strings, numbers, fields, variables, and array elements. Actions may include the same pattern-matching constructions as in patterns, as well as arithmetic and string expressions and assignments, .UL if-else , .UL while , .UL for statements, and multiple output streams. .PP This report contains a user's guide, a discussion of the design and implementation of .IT awk , and some timing statistics. ....It supersedes TM-77-1271-5, dated September 8, 1977. .AE .CS 6 1 7 0 1 4 .if n .ls 2 .nr PS 9 .nr VS 11 .NH Introduction .if t .2C .PP .IT Awk is a programming language designed to make many common information retrieval and text manipulation tasks easy to state and to perform. .PP The basic operation of .IT awk is to scan a set of input lines in order, searching for lines which match any of a set of patterns which the user has specified. For each pattern, an action can be specified; this action will be performed on each line that matches the pattern. .PP Readers familiar with the .UX program .IT grep\| .[ unix program manual .] will recognize the approach, although in .IT awk the patterns may be more general than in .IT grep , and the actions allowed are more involved than merely printing the matching line. For example, the .IT awk program .P1 {print $3, $2} .P2 prints the third and second columns of a table in that order. The program .P1 $2 ~ /A\||B\||C/ .P2 prints all input lines with an A, B, or C in the second field. The program .P1 $1 != prev { print; prev = $1 } .P2 prints all lines in which the first field is different from the previous first field. .NH 2 Usage .PP The command .P1 awk program [files] .P2 executes the .IT awk commands in the string .UL program on the set of named files, or on the standard input if there are no files. The statements can also be placed in a file .UL pfile , and executed by the command .P1 awk -f pfile [files] .P2 .NH 2 Program Structure .PP An .IT awk program is a sequence of statements of the form: .P1 .ft I pattern { action } pattern { action } ... .ft 3 .P2 Each line of input is matched against each of the patterns in turn. For each pattern that matches, the associated action is executed. When all the patterns have been tested, the next line is fetched and the matching starts over. .PP Either the pattern or the action may be left out, but not both. If there is no action for a pattern, the matching line is simply copied to the output. (Thus a line which matches several patterns can be printed several times.) If there is no pattern for an action, then the action is performed for every input line. A line which matches no pattern is ignored. .PP Since patterns and actions are both optional, actions must be enclosed in braces to distinguish them from patterns. .NH 2 Records and Fields .PP .IT Awk input is divided into ``records'' terminated by a record separator. The default record separator is a newline, so by default .IT awk processes its input a line at a time. The number of the current record is available in a variable named .UL NR . .PP Each input record is considered to be divided into ``fields.'' Fields are normally separated by white space \(em blanks or tabs \(em but the input field separator may be changed, as described below. Fields are referred to as .UL "$1, $2," and so forth, where .UL $1 is the first field, and .UL $0 is the whole input record itself. Fields may be assigned to. The number of fields in the current record is available in a variable named .UL NF . .PP The variables .UL FS and .UL RS refer to the input field and record separators; they may be changed at any time to any single character. The optional command-line argument \f3\-F\fIc\fR may also be used to set .UL FS to the character .IT c . .PP If the record separator is empty, an empty input line is taken as the record separator, and blanks, tabs and newlines are treated as field separators. .PP The variable .UL FILENAME contains the name of the current input file. .NH 2 Printing .PP An action may have no pattern, in which case the action is executed for all lines. The simplest action is to print some or all of a record; this is accomplished by the .IT awk command .UL print . The .IT awk program .P1 { print } .P2 prints each record, thus copying the input to the output intact. More useful is to print a field or fields from each record. For instance, .P1 print $2, $1 .P2 prints the first two fields in reverse order. Items separated by a comma in the print statement will be separated by the current output field separator when output. Items not separated by commas will be concatenated, so .P1 print $1 $2 .P2 runs the first and second fields together. .PP The predefined variables .UL NF and .UL NR can be used; for example .P1 { print NR, NF, $0 } .P2 prints each record preceded by the record number and the number of fields. .PP Output may be diverted to multiple files; the program .P1 { print $1 >"foo1"; print $2 >"foo2" } .P2 writes the first field, .UL $1 , on the file .UL foo1 , and the second field on file .UL foo2 . The .UL >> notation can also be used: .P1 print $1 >>"foo" .P2 appends the output to the file .UL foo . (In each case, the output files are created if necessary.) The file name can be a variable or a field as well as a constant; for example, .P1 print $1 >$2 .P2 uses the contents of field 2 as a file name. .PP Naturally there is a limit on the number of output files; currently it is 10. .PP Similarly, output can be piped into another process (on .UC UNIX only); for instance, .P1 print | "mail bwk" .P2 mails the output to .UL bwk . .PP The variables .UL OFS and .UL ORS may be used to change the current output field separator and output record separator. The output record separator is appended to the output of the .UL print statement. .PP .IT Awk also provides the .UL printf statement for output formatting: .P1 printf format expr, expr, ... .P2 formats the expressions in the list according to the specification in .UL format and prints them. For example, .P1 printf "%8.2f %10ld\en", $1, $2 .P2 prints .UL $1 as a floating point number 8 digits wide, with two after the decimal point, and .UL $2 as a 10-digit long decimal number, followed by a newline. No output separators are produced automatically; you must add them yourself, as in this example. The version of .UL printf is identical to that used with C. .[ C programm language prentice hall 1978 .] .NH 1 Patterns .PP A pattern in front of an action acts as a selector that determines whether the action is to be executed. A variety of expressions may be used as patterns: regular expressions, arithmetic relational expressions, string-valued expressions, and arbitrary boolean combinations of these. .NH 2 BEGIN and END .PP The special pattern .UL BEGIN matches the beginning of the input, before the first record is read. The pattern .UL END matches the end of the input, after the last record has been processed. .UL BEGIN and .UL END thus provide a way to gain control before and after processing, for initialization and wrapup. .PP As an example, the field separator can be set to a colon by .P1 BEGIN { FS = ":" } .ft I \&... rest of program ... .ft 3 .P2 Or the input lines may be counted by .P1 END { print NR } .P2 If .UL BEGIN is present, it must be the first pattern; .UL END must be the last if used. .NH 2 Regular Expressions .PP The simplest regular expression is a literal string of characters enclosed in slashes, like .P1 /smith/ .P2 This is actually a complete .IT awk program which will print all lines which contain any occurrence of the name ``smith''. If a line contains ``smith'' as part of a larger word, it will also be printed, as in .P1 blacksmithing .P2 .PP .IT Awk regular expressions include the regular expression forms found in the .UC UNIX text editor .IT ed\| .[ unix program manual .] and .IT grep (without back-referencing). In addition, .IT awk allows parentheses for grouping, | for alternatives, .UL + for ``one or more'', and .UL ? for ``zero or one'', all as in .IT lex . Character classes may be abbreviated: .UL [a\-zA\-Z0\-9] is the set of all letters and digits. As an example, the .IT awk program .P1 /[Aa]ho\||[Ww]einberger\||[Kk]ernighan/ .P2 will print all lines which contain any of the names ``Aho,'' ``Weinberger'' or ``Kernighan,'' whether capitalized or not. .PP Regular expressions (with the extensions listed above) must be enclosed in slashes, just as in .IT ed and .IT sed . Within a regular expression, blanks and the regular expression metacharacters are significant. To turn of the magic meaning of one of the regular expression characters, precede it with a backslash. An example is the pattern .P1 /\|\e/\^.\^*\e// .P2 which matches any string of characters enclosed in slashes. .PP One can also specify that any field or variable matches a regular expression (or does not match it) with the operators .UL ~ and .UL !~ . The program .P1 $1 ~ /[jJ]ohn/ .P2 prints all lines where the first field matches ``john'' or ``John.'' Notice that this will also match ``Johnson'', ``St. Johnsbury'', and so on. To restrict it to exactly .UL [jJ]ohn , use .P1 $1 ~ /^[jJ]ohn$/ .P2 The caret ^ refers to the beginning of a line or field; the dollar sign .UL $ refers to the end. .NH 2 Relational Expressions .PP An .IT awk pattern can be a relational expression involving the usual relational operators .UL < , .UL <= , .UL == , .UL != , .UL >= , and .UL > . An example is .P1 $2 > $1 + 100 .P2 which selects lines where the second field is at least 100 greater than the first field. Similarly, .P1 NF % 2 == 0 .P2 prints lines with an even number of fields. .PP In relational tests, if neither operand is numeric, a string comparison is made; otherwise it is numeric. Thus, .P1 $1 >= "s" .P2 selects lines that begin with an .UL s , .UL t , .UL u , etc. In the absence of any other information, fields are treated as strings, so the program .P1 $1 > $2 .P2 will perform a string comparison. .NH 2 Combinations of Patterns .PP A pattern can be any boolean combination of patterns, using the operators .UL \||\|| (or), .UL && (and), and .UL ! (not). For example, .P1 $1 >= "s" && $1 < "t" && $1 != "smith" .P2 selects lines where the first field begins with ``s'', but is not ``smith''. .UL && and .UL \||\|| guarantee that their operands will be evaluated from left to right; evaluation stops as soon as the truth or falsehood is determined. .NH 2 Pattern Ranges .PP The ``pattern'' that selects an action may also consist of two patterns separated by a comma, as in .P1 pat1, pat2 { ... } .P2 In this case, the action is performed for each line between an occurrence of .UL pat1 and the next occurrence of .UL pat2 (inclusive). For example, .P1 /start/, /stop/ .P2 prints all lines between .UL start and .UL stop , while .P1 NR == 100, NR == 200 { ... } .P2 does the action for lines 100 through 200 of the input. .NH 1 Actions .PP An .IT awk action is a sequence of action statements terminated by newlines or semicolons. These action statements can be used to do a variety of bookkeeping and string manipulating tasks. .NH 2 Built-in Functions .PP .IT Awk provides a ``length'' function to compute the length of a string of characters. This program prints each record, preceded by its length: .P1 {print length, $0} .P2 .UL length by itself is a ``pseudo-variable'' which yields the length of the current record; .UL length(argument) is a function which yields the length of its argument, as in the equivalent .P1 {print length($0), $0} .P2 The argument may be any expression. .PP .IT Awk also provides the arithmetic functions .UL sqrt , .UL log , .UL exp , and .UL int , for square root, base .IT e logarithm, exponential, and integer part of their respective arguments. .PP The name of one of these built-in functions, without argument or parentheses, stands for the value of the function on the whole record. The program .P1 length < 10 || length > 20 .P2 prints lines whose length is less than 10 or greater than 20. .PP The function .UL substr(s,\ m,\ n) produces the substring of .UL s that begins at position .UL m (origin 1) and is at most .UL n characters long. If .UL n is omitted, the substring goes to the end of .UL s . The function .UL index(s1,\ s2) returns the position where the string .UL s2 occurs in .UL s1 , or zero if it does not. .PP The function .UL sprintf(f,\ e1,\ e2,\ ...) produces the value of the expressions .UL e1 , .UL e2 , etc., in the .UL printf format specified by .UL f . Thus, for example, .P1 x = sprintf("%8.2f %10ld", $1, $2) .P2 sets .UL x to the string produced by formatting the values of .UL $1 and .UL $2 . .NH 2 Variables, Expressions, and Assignments .PP .IT Awk variables take on numeric (floating point) or string values according to context. For example, in .P1 x = 1 .P2 .UL x is clearly a number, while in .P1 x = "smith" .P2 it is clearly a string. Strings are converted to numbers and vice versa whenever context demands it. For instance, .P1 x = "3" + "4" .P2 assigns 7 to .UL x . Strings which cannot be interpreted as numbers in a numerical context will generally have numeric value zero, but it is unwise to count on this behavior. .PP By default, variables (other than built-ins) are initialized to the null string, which has numerical value zero; this eliminates the need for most .UL BEGIN sections. For example, the sums of the first two fields can be computed by .P1 { s1 += $1; s2 += $2 } END { print s1, s2 } .P2 .PP Arithmetic is done internally in floating point. The arithmetic operators are .UL + , .UL \- , .UL \(** , .UL / , and .UL % (mod). The C increment .UL ++ and decrement .UL \-\- operators are also available, and so are the assignment operators .UL += , .UL \-= , .UL *= , .UL /= , and .UL %= . These operators may all be used in expressions. .NH 2 Field Variables .PP Fields in .IT awk share essentially all of the properties of variables _ they may be used in arithmetic or string operations, and may be assigned to. Thus one can replace the first field with a sequence number like this: .P1 { $1 = NR; print } .P2 or accumulate two fields into a third, like this: .P1 { $1 = $2 + $3; print $0 } .P2 or assign a string to a field: .P1 { if ($3 > 1000) $3 = "too big" print } .P2 which replaces the third field by ``too big'' when it is, and in any case prints the record. .PP Field references may be numerical expressions, as in .P1 { print $i, $(i+1), $(i+n) } .P2 Whether a field is deemed numeric or string depends on context; in ambiguous cases like .P1 if ($1 == $2) ... .P2 fields are treated as strings. .PP Each input line is split into fields automatically as necessary. It is also possible to split any variable or string into fields: .P1 n = split(s, array, sep) .P2 splits the the string .UL s into .UL array[1] , \&..., .UL array[n] . The number of elements found is returned. If the .UL sep argument is provided, it is used as the field separator; otherwise .UL FS is used as the separator. .NH 2 String Concatenation .PP Strings may be concatenated. For example .P1 length($1 $2 $3) .P2 returns the length of the first three fields. Or in a .UL print statement, .P1 print $1 " is " $2 .P2 prints the two fields separated by `` is ''. Variables and numeric expressions may also appear in concatenations. .NH 2 Arrays .PP Array elements are not declared; they spring into existence by being mentioned. Subscripts may have .ul any non-null value, including non-numeric strings. As an example of a conventional numeric subscript, the statement .P1 x[NR] = $0 .P2 assigns the current input record to the .UL NR -th element of the array .UL x . In fact, it is possible in principle (though perhaps slow) to process the entire input in a random order with the .IT awk program .P1 { x[NR] = $0 } END { \fI... program ...\fP } .P2 The first action merely records each input line in the array .UL x . .PP Array elements may be named by non-numeric values, which gives .IT awk a capability rather like the associative memory of Snobol tables. Suppose the input contains fields with values like .UL apple , .UL orange , etc. Then the program .P1 /apple/ { x["apple"]++ } /orange/ { x["orange"]++ } END { print x["apple"], x["orange"] } .P2 increments counts for the named array elements, and prints them at the end of the input. .NH 2 Flow-of-Control Statements .PP .IT Awk provides the basic flow-of-control statements .UL if-else , .UL while , .UL for , and statement grouping with braces, as in C. We showed the .UL if statement in section 3.3 without describing it. The condition in parentheses is evaluated; if it is true, the statement following the .UL if is done. The .UL else part is optional. .PP The .UL while statement is exactly like that of C. For example, to print all input fields one per line, .P1 i = 1 while (i <= NF) { print $i ++i } .P2 .PP The .UL for statement is also exactly that of C: .P1 for (i = 1; i <= NF; i++) print $i .P2 does the same job as the .UL while statement above. .PP There is an alternate form of the .UL for statement which is suited for accessing the elements of an associative array: .P1 for (i in array) \fIstatement\f3 .P2 does .ul statement with .UL i set in turn to each element of .UL array . The elements are accessed in an apparently random order. Chaos will ensue if .UL i is altered, or if any new elements are accessed during the loop. .PP The expression in the condition part of an .UL if , .UL while or .UL for can include relational operators like .UL < , .UL <= , .UL > , .UL >= , .UL == (``is equal to''), and .UL != (``not equal to''); regular expression matches with the match operators .UL ~ and .UL !~ ; the logical operators .UL \||\|| , .UL && , and .UL ! ; and of course parentheses for grouping. .PP The .UL break statement causes an immediate exit from an enclosing .UL while or .UL for ; the .UL continue statement causes the next iteration to begin. .PP The statement .UL next causes .IT awk to skip immediately to the next record and begin scanning the patterns from the top. The statement .UL exit causes the program to behave as if the end of the input had occurred. .PP Comments may be placed in .IT awk programs: they begin with the character .UL # and end with the end of the line, as in .P1 print x, y # this is a comment .P2 .NH Design .PP The .UX system already provides several programs that operate by passing input through a selection mechanism. .IT Grep , the first and simplest, merely prints all lines which match a single specified pattern. .IT Egrep provides more general patterns, i.e., regular expressions in full generality; .IT fgrep searches for a set of keywords with a particularly fast algorithm. .IT Sed\| .[ unix programm manual .] provides most of the editing facilities of the editor .IT ed , applied to a stream of input. None of these programs provides numeric capabilities, logical relations, or variables. .PP .IT Lex\| .[ lesk lexical analyzer cstr .] provides general regular expression recognition capabilities, and, by serving as a C program generator, is essentially open-ended in its capabilities. The use of .IT lex , however, requires a knowledge of C programming, and a .IT lex program must be compiled and loaded before use, which discourages its use for one-shot applications. .PP .IT Awk is an attempt to fill in another part of the matrix of possibilities. It provides general regular expression capabilities and an implicit input/output loop. But it also provides convenient numeric processing, variables, more general selection, and control flow in the actions. It does not require compilation or a knowledge of C. Finally, .IT awk provides a convenient way to access fields within lines; it is unique in this respect. .PP .IT Awk also tries to integrate strings and numbers completely, by treating all quantities as both string and numeric, deciding which representation is appropriate as late as possible. In most cases the user can simply ignore the differences. .PP Most of the effort in developing .I awk went into deciding what .I awk should or should not do (for instance, it doesn't do string substitution) and what the syntax should be (no explicit operator for concatenation) rather than on writing or debugging the code. We have tried to make the syntax powerful but easy to use and well adapted to scanning files. For example, the absence of declarations and implicit initializations, while probably a bad idea for a general-purpose programming language, is desirable in a language that is meant to be used for tiny programs that may even be composed on the command line. .PP In practice, .IT awk usage seems to fall into two broad categories. One is what might be called ``report generation'' \(em processing an input to extract counts, sums, sub-totals, etc. This also includes the writing of trivial data validation programs, such as verifying that a field contains only numeric information or that certain delimiters are properly balanced. The combination of textual and numeric processing is invaluable here. .PP A second area of use is as a data transformer, converting data from the form produced by one program into that expected by another. The simplest examples merely select fields, perhaps with rearrangements. .NH Implementation .PP The actual implementation of .IT awk uses the language development tools available on the .UC UNIX operating system. The grammar is specified with .IT yacc ; .[ yacc johnson cstr .] the lexical analysis is done by .IT lex ; the regular expression recognizers are deterministic finite automata constructed directly from the expressions. An .IT awk program is translated into a parse tree which is then directly executed by a simple interpreter. .PP .IT Awk was designed for ease of use rather than processing speed; the delayed evaluation of variable types and the necessity to break input into fields makes high speed difficult to achieve in any case. Nonetheless, the program has not proven to be unworkably slow. .PP Table I below shows the execution (user + system) time on a PDP-11/70 of the .UC UNIX programs .IT wc , .IT grep , .IT egrep , .IT fgrep , .IT sed , .IT lex , and .IT awk on the following simple tasks: .IP "\ \ 1." count the number of lines. .IP "\ \ 2." print all lines containing ``doug''. .IP "\ \ 3." print all lines containing ``doug'', ``ken'' or ``dmr''. .IP "\ \ 4." print the third field of each line. .IP "\ \ 5." print the third and second fields of each line, in that order. .IP "\ \ 6." append all lines containing ``doug'', ``ken'', and ``dmr'' to files ``jdoug'', ``jken'', and ``jdmr'', respectively. .IP "\ \ 7." print each line prefixed by ``line-number\ :\ ''. .IP "\ \ 8." sum the fourth column of a table. .LP The program .IT wc merely counts words, lines and characters in its input; we have already mentioned the others. In all cases the input was a file containing 10,000 lines as created by the command .IT "ls \-l" ; each line has the form .P1 -rw-rw-rw- 1 ava 123 Oct 15 17:05 xxx .P2 The total length of this input is 452,960 characters. Times for .IT lex do not include compile or load. .PP As might be expected, .IT awk is not as fast as the specialized tools .IT wc , .IT sed , or the programs in the .IT grep family, but is faster than the more general tool .IT lex . In all cases, the tasks were about as easy to express as .IT awk programs as programs in these other languages; tasks involving fields were considerably easier to express as .IT awk programs. Some of the test programs are shown in .IT awk , .IT sed and .IT lex . .[ $LIST$ .] .1C .TS center; c c c c c c c c c c c c c c c c c c c|n|n|n|n|n|n|n|n|. Task Program 1 2 3 4 5 6 7 8 _ \fIwc\fR 8.6 \fIgrep\fR 11.7 13.1 \fIegrep\fR 6.2 11.5 11.6 \fIfgrep\fR 7.7 13.8 16.1 \fIsed\fR 10.2 11.6 15.8 29.0 30.5 16.1 \fIlex\fR 65.1 150.1 144.2 67.7 70.3 104.0 81.7 92.8 \fIawk\fR 15.0 25.6 29.9 33.3 38.9 46.4 71.4 31.1 _ .TE .sp .ce \fBTable I.\fR Execution Times of Programs. (Times are in sec.) .sp 2 .2C .PP The programs for some of these jobs are shown below. The .IT lex programs are generally too long to show. .LP AWK: .LP .P1 1. END {print NR} .P2 .P1 2. /doug/ .P2 .P1 3. /ken|doug|dmr/ .P2 .P1 4. {print $3} .P2 .P1 5. {print $3, $2} .P2 .P1 6. /ken/ {print >"jken"} /doug/ {print >"jdoug"} /dmr/ {print >"jdmr"} .P2 .P1 7. {print NR ": " $0} .P2 .P1 8. {sum = sum + $4} END {print sum} .P2 .LP SED: .LP .P1 1. $= .P2 .P1 2. /doug/p .P2 .P1 3. /doug/p /doug/d /ken/p /ken/d /dmr/p /dmr/d .P2 .P1 4. /[^ ]* [ ]*[^ ]* [ ]*\e([^ ]*\e) .*/s//\e1/p .P2 .P1 5. /[^ ]* [ ]*\e([^ ]*\e) [ ]*\e([^ ]*\e) .*/s//\e2 \e1/p .P2 .P1 6. /ken/w jken /doug/w jdoug /dmr/w jdmr .P2 .LP LEX: .LP .P1 1. %{ int i; %} %% \en i++; . ; %% yywrap() { printf("%d\en", i); } .P2 .P1 2. %% ^.*doug.*$ printf("%s\en", yytext); . ; \en ; .P2