-
-
-
-
-
-
-
-
-
- SED -- A Non-interactive Text Editor
-
- Lee E. McMahon
-
-
-
-
- ABSTRACT
-
-
- Sed is a non-interactive context editor that
- runs on the UNIX(R) operating system. Sed is
- designed to be especially useful in three cases:
-
- 1) To edit files too large for comfortable interactive
- editing;
- 2) To edit any size file when the sequence of editing
- commands is too complicated to be comfortably
- typed in interactive mode;
- 3) To perform multiple `global' editing functions effi-
- ciently in one pass through the input.
-
- This memorandum constitutes a manual for users of
- sed.
-
-
- Introduction
-
- Sed is a non-interactive context editor designed to be espe-
- cially useful in three cases:
-
- 1) To edit files too large for comfortable interactive edit-
- ing;
- 2) To edit any size file when the sequence of editing com-
- mands is too complicated to be comfortably typed in interac-
- tive mode;
- 3) To perform multiple `global' editing functions effi-
- ciently in one pass through the input.
-
- Since only a few lines of the input reside in core at one
- time, and no temporary files are used, the effective size of
- file that can be edited is limited only by the requirement
- that the input and output fit simultaneously into available
- secondary storage.
-
- Complicated editing scripts can be created separately and
- given to sed as a command file. For complex edits, this
- saves considerable typing, and its attendant errors. Sed
- running from a command file is much more efficient than any
- interactive editor known to the author, even if that editor
- can be driven by a pre-written script.
-
-
-
-
-
-
-
-
-
- USD:18-2 SED -- A Non-interactive Text Editor
-
-
- The principal losses of function compared to an interactive
- editor are the lack of relative addressing (because of the
- line-at-a-time operation), and the lack of immediate
- verification that a command has done what was intended.
-
- Sed is a lineal descendant of the UNIX editor, ed. Because
- of the differences between interactive and non-interactive
- operation, considerable changes have been made between ed
- and sed; even confirmed users of ed will frequently be sur-
- prised (and probably chagrined), if they rashly use sed
- without reading Sections 2 and 3 of this document. The most
- striking family resemblance between the two editors is in
- the class of patterns (`regular expressions') they recog-
- nize; the code for matching patterns is copied almost verba-
- tim from the code for ed, and the description of regular
- expressions in Section 2 is copied almost verbatim from the
- UNIX Programmer's Manual[1]. (Both code and description were
- written by Dennis M. Ritchie.)
-
-
- 1. Overall Operation
-
- Sed by default copies the standard input to the standard
- output, perhaps performing one or more editing commands on
- each line before writing it to the output. This behavior
- may be modified by flags on the command line; see Section
- 1.1 below.
-
- The general format of an editing command is:
-
- [address1,address2][function][arguments]
-
- One or both addresses may be omitted; the format of
- addresses is given in Section 2. Any number of blanks or
- tabs may separate the addresses from the function. The
- function must be present; the available commands are dis-
- cussed in Section 3. The arguments may be required or
- optional, according to which function is given; again, they
- are discussed in Section 3 under each individual function.
-
- Tab characters and spaces at the beginning of lines are
- ignored.
-
-
- 1.1. Command-line Flags
-
- Three flags are recognized on the command line:
- -n: tells sed not to copy all lines, but only those
- specified by p functions or p flags after s func-
- tions (see Sections 3.2 and 3.3);
- -e: tells sed to take the next argument as an editing
- command;
- -f: tells sed to take the next argument as a file name;
- the file should contain editing commands, one to a
- line.
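
The three flags can be tried out as follows. This is only a sketch: it assumes a POSIX-style shell with mktemp available and a current sed, and the variable names (poem, script, and so on) are illustrative, not part of sed.

```shell
# Sample text from Section 1.4 (the variable name is illustrative).
poem='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

# -n suppresses the default copy; only lines named by p are written.
only_second=$(printf '%s\n' "$poem" | sed -n '2p')

# Each -e argument supplies one editing command.
two_cmds=$(printf '%s\n' "$poem" | sed -n -e '1p' -e '5p')

# -f reads the editing commands from a file, one to a line.
script=$(mktemp)
printf '%s\n' '1p' '5p' > "$script"
from_file=$(printf '%s\n' "$poem" | sed -n -f "$script")
rm -f "$script"

printf '%s\n' "$only_second" "$two_cmds" "$from_file"
```

The -e and -f forms are interchangeable here; a command file is mainly worthwhile once the script grows beyond a line or two.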
-
- 1.2. Order of Application of Editing Commands
-
- Before any editing is done (in fact, before any input file
- is even opened), all the editing commands are compiled into
- a form which will be moderately efficient during the execu-
- tion phase (when the commands are actually applied to lines
- of the input file). The commands are compiled in the order
- in which they are encountered; this is generally the order
- in which they will be attempted at execution time. The com-
- mands are applied one at a time; the input to each command
- is the output of all preceding commands.
-
- The default linear order of application of editing commands
- can be changed by the flow-of-control commands, t and b (see
- Section 3). Even when the order of application is changed
- by these commands, it is still true that the input line to
- any command is the output of any previously applied command.
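
A small illustration of this rule, assuming a POSIX-style shell (the variable names are illustrative): the second command operates on what the first one produced, not on the original input line.

```shell
line='In Xanadu did Kubla Khan'

# After the first command the line reads "In Khan did Kubla Khan".
# The second command therefore changes the "Khan" that the first
# command created, which is now the first "Khan" on the line.
chained=$(printf '%s\n' "$line" | sed -e 's/Xanadu/Khan/' -e 's/Khan/X/')

printf '%s\n' "$chained"
```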
-
- 1.3. Pattern-space
-
- The range of pattern matches is called the pattern space.
- Ordinarily, the pattern space is one line of the input text,
- but more than one line can be read into the pattern space by
- using the N command (Section 3.4).
-
-
- 1.4. Examples
-
- Examples are scattered throughout the text. Except where
- otherwise noted, the examples all assume the following input
- text:
-
- In Xanadu did Kubla Khan
- A stately pleasure dome decree:
- Where Alph, the sacred river, ran
- Through caverns measureless to man
- Down to a sunless sea.
-
- (In no case is the output of the sed commands to be consid-
- ered an improvement on Coleridge.)
-
-
- Example:
-
- The command
-
- 2q
-
- will quit after copying the first two lines of the input.
- The output will be:
-
- In Xanadu did Kubla Khan
- A stately pleasure dome decree:
-
-
- 2. ADDRESSES: Selecting lines for editing
-
- Lines in the input file(s) to which editing commands are to
- be applied can be selected by addresses. Addresses may be
- either line numbers or context addresses.
-
- The application of a group of commands can be controlled by
- one address (or address-pair) by grouping the commands with
- curly braces (`{ }') (Section 3.6).
-
- 2.1. Line-number Addresses
-
- A line number is a decimal integer. As each line is read
- from the input, a line-number counter is incremented; a
- line-number address matches (selects) the input line which
- causes the internal counter to equal the address line-
- number. The counter runs cumulatively through multiple
- input files; it is not reset when a new input file is
- opened.
-
- As a special case, the character $ matches the last line of
- the last input file.
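
The cumulative counter and the $ address can be observed with two small files. This is a sketch that assumes a POSIX-style shell with mktemp; the file names first and second are made up for the example.

```shell
dir=$(mktemp -d)
printf '%s\n' 'a1' 'a2' > "$dir/first"
printf '%s\n' 'b1' 'b2' > "$dir/second"

# Line 3 of the combined input is the first line of the second file,
# because the counter is not reset between files.
# $ selects the last line of the last file.
picked=$(sed -n -e '3p' -e '$p' "$dir/first" "$dir/second")
rm -rf "$dir"

printf '%s\n' "$picked"
```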
-
- 2.2. Context Addresses
-
- A context address is a pattern (`regular expression')
- enclosed in slashes (`/'). The regular expressions recog-
- nized by sed are constructed as follows:
-
- 1) An ordinary character (not one of those discussed below)
- is a regular expression, and matches that character.
-
- 2) A circumflex `^' at the beginning of a regular expression
- matches the null character at the beginning of a line.
- 3) A dollar-sign `$' at the end of a regular expression
- matches the null character at the end of a line.
- 4) The characters `\n' match an imbedded newline character,
- but not the newline at the end of the pattern space.
- 5) A period `.' matches any character except the terminal
- newline of the pattern space.
- 6) A regular expression followed by an asterisk `*' matches
- any number (including 0) of adjacent occurrences of the reg-
- ular expression it follows.
- 7) A string of characters in square brackets `[ ]' matches
- any character in the string, and no others. If, however,
- the first character of the string is circumflex `^', the
- regular expression matches any character except the charac-
- ters in the string and the terminal newline of the pattern
- space.
- 8) A concatenation of regular expressions is a regular
- expression which matches the concatenation of strings
- matched by the components of the regular expression.
- 9) A regular expression between the sequences `\(' and `\)'
- is identical in effect to the unadorned regular expression,
- but has side-effects which are described under the s command
- below and specification 10) immediately below.
- 10) The expression `\d' means the same string of characters
- matched by an expression enclosed in `\(' and `\)' earlier
- in the same pattern. Here d is a single digit; the string
- specified is that beginning with the dth occurrence of `\('
- counting from the left. For example, the expression
- `^\(.*\)\1' matches a line beginning with two repeated
- occurrences of the same string.
- 11) The null regular expression standing alone (e.g., `//')
- is equivalent to the last regular expression compiled.
-
- To use one of the special characters (^ $ . * [ ] \ /) as a
- literal (to match an occurrence of itself in the input),
- precede the special character by a backslash `\'.
-
- For a context address to `match' the input requires that the
- whole pattern within the address match some portion of the
- pattern space.
-
- 2.3. Number of Addresses
-
- The commands in the next section can have 0, 1, or 2
- addresses. Under each command the maximum number of allowed
- addresses is given. For a command to have more addresses
- than the maximum allowed is considered an error.
-
- If a command has no addresses, it is applied to every line
- in the input.
-
- If a command has one address, it is applied to all lines
- which match that address.
-
- If a command has two addresses, it is applied to the first
- line which matches the first address, and to all subsequent
- lines until (and including) the first subsequent line which
- matches the second address. Then an attempt is made on sub-
- sequent lines to again match the first address, and the pro-
- cess is repeated.
-
- Two addresses are separated by a comma.
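
The re-arming of a two-address range can be seen on a small made-up input (not the standard sample text). The sketch assumes a POSIX-style shell; the words start and end are arbitrary markers chosen for the example.

```shell
# The range opens at each "start" line and closes at the next "end" line;
# the "y" line lies between two ranges and is not selected.
ranges=$(printf '%s\n' start x end y start z end | sed -n '/start/,/end/p')

printf '%s\n' "$ranges"
```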
-
- Examples:
-
- /an/            matches lines 1, 3, 4 in our sample text
- /an.*an/        matches line 1
- /^an/           matches no lines
- /./             matches all lines
- /\./            matches line 5
- /r*an/          matches lines 1, 3, 4 (number = zero!)
- /\(an\).*\1/    matches line 1
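
The address examples above can be checked by asking sed for the numbers of the lines each pattern selects, using the = function of Section 3.7. The helper name matches is hypothetical, and the sketch assumes a POSIX-style shell.

```shell
poem='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

# Print the numbers of the lines selected by context address /$1/.
# The unquoted command substitution joins the numbers with single spaces.
matches() {
    echo $(printf '%s\n' "$poem" | sed -n "/$1/=")
}

matches 'an'
matches 'an.*an'
matches '\.'
matches '\(an\).*\1'
```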
-
-
- 3. FUNCTIONS
-
- All functions are named by a single character. In the fol-
- lowing summary, the maximum number of allowable addresses is
- given enclosed in parentheses, then the single character
- function name, possible arguments enclosed in angles (< >),
- an expanded English translation of the single-character
- name, and finally a description of what each function does.
- The angles around the arguments are not part of the argu-
- ment, and should not be typed in actual editing commands.
-
- 3.1. Whole-line Oriented Functions
-
- (2)d -- delete lines The d function deletes from the
- file (does not write to the output) all those
- lines matched by its address(es). It also has the
- side effect that no further commands are attempted
- on the corpse of a deleted line; as soon as the d
- function is executed, a new line is read from the
- input, and the list of editing commands is re-
- started from the beginning on the new line.
- (2)n -- next line The n function reads the next line
- from the input, replacing the current line. The
- current line is written to the output if it should
- be. The list of editing commands is continued
- following the n command.
- (1)a\
- <text> -- append lines
- The a function causes the argument <text> to be
- written to the output after the line matched by
- its address. The a command is inherently multi-
- line; a must appear at the end of a line, and
- <text> may contain any number of lines. To pre-
- serve the one-command-to-a-line fiction, the inte-
- rior newlines must be hidden by a backslash char-
- acter (`\') immediately preceding the newline.
- The <text> argument is terminated by the first
- unhidden newline (the first one not immediately
- preceded by backslash). Once an a function is
- successfully executed, <text> will be written to
- the output regardless of what later commands do to
- the line which triggered it. The triggering line
- may be deleted entirely; <text> will still be
- written to the output. The <text> is not scanned
- for address matches, and no editing commands are
- attempted on it. It does not cause any change in
- the line-number counter.
- (1)i\
- <text> -- insert lines
- The i function behaves identically to the a func-
- tion, except that <text> is written to the output
- before the matched line. All other comments about
- the a function apply to the i function as well.
- (2)c\
- <text> -- change lines
- The c function deletes the lines selected by its
- address(es), and replaces them with the lines in
- <text>. Like a and i, c must be followed by a
- newline hidden by a backslash; and interior newlines
- in <text> must be hidden by backslashes.
- The c command may have two addresses, and there-
- fore select a range of lines. If it does, all the
- lines in the range are deleted, but only one copy
- of <text> is written to the output, not one copy
- per line deleted. As with a and i, <text> is not
- scanned for address matches, and no editing com-
- mands are attempted on it. It does not change the
- line-number counter. After a line has been
- deleted by a c function, no further commands are
- attempted on the corpse. If text is appended
- after a line by a or r functions, and the line is
- subsequently changed, the text inserted by the c
- function will be placed before the text of the a
- or r functions. (The r function is described in
- Section 3.3.)
- Note: Within the text put in the output by these functions,
- leading blanks and tabs will disappear, as always in sed
- commands. To get leading blanks and tabs into the output,
- precede the first desired blank or tab by a backslash; the
- backslash will not appear in the output.
-
- Example:
-
- The list of editing commands:
-
- n
- a\
- XXXX
- d
-
- applied to our standard input, produces:
-
- In Xanadu did Kubla Khan
- XXXX
- Where Alph, the sacred river, ran
- XXXX
- Down to a sunless sea.
-
- In this particular case, the same effect would be produced
- by either of the two following command lists:
-
- n          n
- i\         c\
- XXXX       XXXX
- d
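
The three command lists above can be run side by side to confirm that they agree on this input. This is a sketch assuming a POSIX-style shell and a current sed; the variable names are illustrative.

```shell
poem='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

# n writes the current line and reads the next; a\ queues XXXX to be
# written after that next line; d then deletes the line itself.
with_a=$(printf '%s\n' "$poem" | sed 'n
a\
XXXX
d')

# i\ writes XXXX before the line that d then deletes.
with_i=$(printf '%s\n' "$poem" | sed 'n
i\
XXXX
d')

# c\ deletes the line and writes XXXX in its place.
with_c=$(printf '%s\n' "$poem" | sed 'n
c\
XXXX')

printf '%s\n' "$with_a"
```

On the last input line n finds no further input, so none of the later commands runs and no XXXX follows line 5.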
-
-
- 3.2. Substitute Function
-
- One very important function changes parts of lines selected
- by a context search within the line.
- (2)s<pattern><replacement><flags> -- substitute The s
- function replaces part of a line (selected by
- <pattern>) with <replacement>. It can best be
- read:
- Substitute for <pattern>, <replacement>
- The <pattern> argument contains a pattern, exactly
- like the patterns in addresses (see 2.2 above).
- The only difference between <pattern> and a con-
- text address is that the context address must be
- delimited by slash (`/') characters; <pattern> may
- be delimited by any character other than space or
- newline. By default, only the first string
- matched by <pattern> is replaced, but see the g
- flag below. The <replacement> argument begins
- immediately after the second delimiting character
- of <pattern>, and must be followed immediately by
- another instance of the delimiting character.
- (Thus there are exactly three instances of the de-
- limiting character.) The <replacement> is not a
- pattern, and the characters which are special in
- patterns do not have special meaning in <replace-
- ment>. Instead, other characters are special:
- & is replaced by the string matched by
- <pattern>
- \d (where d is a single digit) is replaced by
- the dth substring matched by parts of
- <pattern> enclosed in `\(' and `\)'. If
- nested substrings occur in <pattern>,
- the dth is determined by counting open-
- ing delimiters (`\('). As in patterns,
- special characters may be made literal
- by preceding them with backslash (`\').
- The <flags> argument may contain the following
- flags:
- g -- substitute <replacement> for all (non-
- overlapping) instances of <pattern> in
- the line. After a successful substitu-
- tion, the scan for the next instance of
- <pattern> begins just after the end of
- the inserted characters; characters put
- into the line from <replacement> are not
- rescanned.
- p -- print the line if a successful replace-
- ment was done. The p flag causes the
- line to be written to the output if and
- only if a substitution was actually made
- by the s function. Notice that if sev-
- eral s functions, each followed by a p
- flag, successfully substitute in the
- same input line, multiple copies of the
- line will be written to the output: one
- for each successful substitution.
- w <filename> -- write the line to a file if a
- successful replacement was done. The w
- flag causes lines which are actually
- substituted by the s function to be
- written to a file named by <filename>.
- If <filename> exists before sed is run,
- it is overwritten; if not, it is cre-
- ated. A single space must separate w
- and <filename>. The possibilities of
- multiple, somewhat different copies of
- one input line being written are the
- same as for p. A maximum of 10 differ-
- ent file names may be mentioned after w
- flags and w functions (see below), com-
- bined.
-
- Examples:
-
- The following command, applied to our standard input,
-
- s/to/by/w changes
-
- produces, on the standard output:
-
- In Xanadu did Kubla Khan
- A stately pleasure dome decree:
- Where Alph, the sacred river, ran
- Through caverns measureless by man
- Down by a sunless sea.
-
- and, on the file `changes':
-
- Through caverns measureless by man
- Down by a sunless sea.
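
The w flag example can be reproduced as below. The sketch assumes a POSIX-style shell with mktemp; the file is created in a scratch directory rather than the current one, which is a choice made for this example only.

```shell
poem='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

dir=$(mktemp -d)

# Every line goes to the standard output; only the lines in which a
# substitution was made are also written to the file named after w.
out=$(printf '%s\n' "$poem" | sed "s/to/by/w $dir/changes")
changes=$(cat "$dir/changes")
rm -rf "$dir"

printf '%s\n' "$changes"
```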
-
- If the nocopy option (the -n flag) is in effect, the command:
-
- s/[.,;?:]/*P&*/gp
-
- produces:
-
- A stately pleasure dome decree*P:*
- Where Alph*P,* the sacred river*P,* ran
- Down to a sunless sea*P.*
-
- Finally, to illustrate the effect of the g flag, the com-
- mand:
-
- /X/s/an/AN/p
-
- produces (assuming nocopy mode):
-
- In XANadu did Kubla Khan
-
- and the command:
-
- /X/s/an/AN/gp
-
- produces:
-
- In XANadu did Kubla KhAN
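
The substitute examples can be run as follows; -n stands in for the nocopy mode the examples assume. The last command is an extra illustration of \1 and \2 in <replacement> and is not taken from the examples above. The sketch assumes a POSIX-style shell, and the variable names are illustrative.

```shell
poem='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

# Without g only the first "an" on the addressed line is replaced.
first_only=$(printf '%s\n' "$poem" | sed -n '/X/s/an/AN/p')

# With g every non-overlapping "an" is replaced.
every=$(printf '%s\n' "$poem" | sed -n '/X/s/an/AN/gp')

# & in the replacement stands for the string the pattern matched.
marked=$(printf '%s\n' "$poem" | sed -n 's/[.,;?:]/*P&*/gp')

# \1 and \2 stand for the first and second \( \) groups.
swapped=$(printf '%s\n' 'Kubla Khan' | sed 's/\(Kubla\) \(Khan\)/\2 \1/')

printf '%s\n' "$first_only" "$every" "$marked" "$swapped"
```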
-
-
- 3.3. Input-output Functions
-
- (2)p -- print The print function writes the addressed
- lines to the standard output file. They are writ-
- ten at the time the p function is encountered,
- regardless of what succeeding editing commands may
- do to the lines.
- (2)w <filename> -- write on <filename> The write func-
- tion writes the addressed lines to the file named
- by <filename>. If the file previously existed, it
- is overwritten; if not, it is created. The lines
- are written exactly as they exist when the write
- function is encountered for each line, regardless
- of what subsequent editing commands may do to
- them. Exactly one space must separate the w and
- <filename>. A maximum of ten different files may
- be mentioned in write functions and w flags after
- s functions, combined.
- (1)r <filename> -- read the contents of a file The read
- function reads the contents of <filename>, and
- appends them after the line matched by the
- address. The file is read and appended regardless
- of what subsequent editing commands do to the line
- which matched its address. If r and a functions
- are executed on the same line, the text from the a
- functions and the r functions is written to the
- output in the order that the functions are exe-
- cuted. Exactly one space must separate the r and
- <filename>. If a file mentioned by an r function
- cannot be opened, it is considered a null file,
- not an error, and no diagnostic is given.
- NOTE: Since there is a limit to the number of files that can
- be opened simultaneously, care should be taken that no more
- than ten files be mentioned in w functions or flags; that
- number is reduced by one if any r functions are present.
- (Only one read file is open at one time.)
-
- Examples
-
- Assume that the file `note1' has the following contents:
-
- Note: Kubla Khan (more properly Kublai Khan;
- 1216-1294) was the grandson and most eminent
- successor of Genghiz (Chingiz) Khan, and founder
- of the Mongol dynasty in China.
-
- Then the following command:
-
- /Kubla/r note1
-
- produces:
-
- In Xanadu did Kubla Khan
- Note: Kubla Khan (more properly Kublai Khan;
- 1216-1294) was the grandson and most eminent
- successor of Genghiz (Chingiz) Khan, and founder
- of the Mongol dynasty in China.
- A stately pleasure dome decree:
- Where Alph, the sacred river, ran
- Through caverns measureless to man
- Down to a sunless sea.
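
The r example can be reproduced as below. The sketch assumes a POSIX-style shell with mktemp; note1 is created in a scratch directory, which is a choice made for this example only.

```shell
poem='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

dir=$(mktemp -d)
cat > "$dir/note1" <<'EOF'
Note: Kubla Khan (more properly Kublai Khan;
1216-1294) was the grandson and most eminent
successor of Genghiz (Chingiz) Khan, and founder
of the Mongol dynasty in China.
EOF

# The contents of note1 are written after the line that matches /Kubla/.
annotated=$(printf '%s\n' "$poem" | sed "/Kubla/r $dir/note1")
rm -rf "$dir"

printf '%s\n' "$annotated"
```

The lines of the file are copied as they stand, so the note keeps its own line breaks in the output.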
-
-
- 3.4. Multiple Input-line Functions
-
- Three functions, all spelled with capital letters, deal spe-
- cially with pattern spaces containing imbedded newlines;
- they are intended principally to provide pattern matches
- across lines in the input.
- (2)N -- Next line The next input line is appended to
- the current line in the pattern space; the two
- input lines are separated by an imbedded newline.
- Pattern matches may extend across the imbedded
- newline(s).
- (2)D -- Delete first part of the pattern space Delete
- up to and including the first newline character in
- the current pattern space. If the pattern space
- becomes empty (the only newline was the terminal
- newline), read another line from the input. In
- any case, begin the list of editing commands again
- from its beginning.
- (2)P -- Print first part of the pattern space Print up
- to and including the first newline in the pattern
- space.
- The P and D functions are equivalent to their lower-case
- counterparts if there are no imbedded newlines in the pat-
- tern space.
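
As a sketch of N in use, the following joins the input lines in pairs. It assumes a POSIX-style shell; the address $! (not the last line) is added because implementations differ in what N does when no next input line exists, and the " | " separator is arbitrary.

```shell
poem='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

# N appends the next input line after an imbedded newline; the s
# function then matches that newline as \n and replaces it.
joined=$(printf '%s\n' "$poem" | sed -e '$!N' -e 's/\n/ | /')

printf '%s\n' "$joined"
```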
-
- 3.5. Hold and Get Functions
-
- Five functions save and retrieve part of the input for pos-
- sible later use.
- (2)h -- hold pattern space The h function copies the
- contents of the pattern space into a hold area
- (destroying the previous contents of the hold
- area).
- (2)H -- Hold pattern space The H function appends the
- contents of the pattern space to the contents of
- the hold area; the former and new contents are
- separated by a newline.
- (2)g -- get contents of hold area The g function copies
- the contents of the hold area into the pattern
- space (destroying the previous contents of the
- pattern space).
- (2)G -- Get contents of hold area The G function
- appends the contents of the hold area to the con-
- tents of the pattern space; the former and new
- contents are separated by a newline.
- (2)x -- exchange The exchange command interchanges the
- contents of the pattern space and the hold area.
-
- Example
-
- The commands
- 1h
- 1s/ did.*//
- 1x
- G
- s/\n/ :/
- applied to our standard example, produce:
- In Xanadu did Kubla Khan :In Xanadu
- A stately pleasure dome decree: :In Xanadu
- Where Alph, the sacred river, ran :In Xanadu
- Through caverns measureless to man :In Xanadu
- Down to a sunless sea. :In Xanadu
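
The hold-area example can be run as written. The sketch assumes a POSIX-style shell; the variable names are illustrative.

```shell
poem='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

# On line 1: h saves the line, s trims it to "In Xanadu", and x swaps
# the trimmed text into the hold area while restoring the full line.
# On every line: G appends the hold area after a newline, and s turns
# that newline into " :".
tagged=$(printf '%s\n' "$poem" | sed '1h
1s/ did.*//
1x
G
s/\n/ :/')

printf '%s\n' "$tagged"
```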
-
- 3.6. Flow-of-Control Functions
-
- These functions do no editing on the input lines, but con-
- trol the application of functions to the lines selected by
- the address part.
- (2)! -- Don't The Don't command causes the next command
- (written on the same line), to be applied to all
- and only those input lines not selected by the
- address part.
- (2){ -- Grouping The grouping command `{' causes the
- next set of commands to be applied (or not
- applied) as a block to the input lines selected by
- the addresses of the grouping command. The first
- of the commands under control of the grouping may
- appear on the same line as the `{' or on the next
- line.
-
- The group of commands is terminated by a matching `}' stand-
- ing on a line by itself.
-
- Groups can be nested.
- (0):<label> -- place a label The label function marks a
- place in the list of editing commands which may be referred
- to by b and t functions. The <label> may be any sequence of
- eight or fewer characters; if two different colon functions
- have identical labels, a compile time diagnostic will be
- generated, and no execution attempted.
- (2)b<label> -- branch to label The branch function causes
- the sequence of editing commands being applied to the cur-
- rent input line to be restarted immediately after the place
- where a colon function with the same <label> was encoun-
- tered. If no colon function with the same label can be
- found after all the editing commands have been compiled, a
- compile time diagnostic is produced, and no execution is
- attempted. A b function with no <label> is taken to be a
- branch to the end of the list of editing commands; whatever
- should be done with the current input line is done, and
- another input line is read; the list of editing commands is
- restarted from the beginning on the new line.
- (2)t<label> -- test substitutions The t function tests
- whether any successful substitutions have been made on the
- current input line; if so, it branches to <label>; if not,
- it does nothing. The flag which indicates that a successful
- substitution has been executed is reset by:
- 1) reading a new input line, or
- 2) executing a t function.
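
The flow-of-control functions might be exercised as in the sketch below, which assumes a POSIX-style shell and a current sed. The label name again and the input a---b are made up for the example, and the label is written after a space (t again), which current implementations accept.

```shell
poem='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

# ! applies p to exactly those lines NOT selected by /an/.
not_an=$(printf '%s\n' "$poem" | sed -n '/an/!p')

# Braces apply both s and p, as a block, to lines 2 through 3.
grouped=$(printf '%s\n' "$poem" | sed -n '2,3{
s/e/E/
p
}')

# t branches back to the label for as long as s keeps succeeding,
# so a run of hyphens shrinks one step at a time to a single hyphen.
squeezed=$(printf '%s\n' 'a---b' | sed ':again
s/--/-/
t again')

printf '%s\n' "$not_an" "$grouped" "$squeezed"
```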
-
- 3.7. Miscellaneous Functions
-
- (1)= -- equals The = function writes to the standard
- output the line number of the line matched by its
- address.
- (1)q -- quit The q function causes the current line to
- be written to the output (if it should be), any
- appended or read text to be written, and execution
- to be terminated.
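
A brief sketch of both functions, assuming a POSIX-style shell (the variable names are illustrative):

```shell
poem='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

# = writes the number of each line selected by its address.
where=$(printf '%s\n' "$poem" | sed -n '/sunless/=')

# Addressed with $, = reports the number of the last line,
# which is the total number of input lines.
count=$(printf '%s\n' "$poem" | sed -n '$=')

# q writes the current line and stops, so 2q copies two lines.
head2=$(printf '%s\n' "$poem" | sed '2q')

printf '%s\n' "$where" "$count" "$head2"
```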
-
-
- Reference
-
- [1] Ken Thompson and Dennis M. Ritchie, The UNIX Program-
- mer's Manual. Bell Laboratories, 1978.
-