- REGEXP(3)               UNIX 5.0 (local)               REGEXP(3)
-
-
-
- NAME
- regcomp, regexec, regsub, regerror - regular expression
- handler
-
- SYNOPSIS
- #include <regexp.h>
-
- regexp *regcomp(exp)
- char *exp;
-
- int regexec(prog, string)
- regexp *prog;
- char *string;
-
- regsub(prog, source, dest)
- regexp *prog;
- char *source;
- char *dest;
-
- regerror(msg)
- char *msg;
-
- DESCRIPTION
- These functions implement egrep(1)-style regular expressions
- and supporting facilities.
-
- Regcomp compiles a regular expression into a structure of
- type regexp, and returns a pointer to it. The space has
- been allocated using malloc(3) and may be released by free.
-
- Regexec matches a NUL-terminated string against the compiled
- regular expression in prog. It returns 1 for success and 0
- for failure, and adjusts the contents of prog's startp and
- endp (see below) accordingly.
-
- The members of a regexp structure include at least the
- following (not necessarily in order):
-
- char *startp[NSUBEXP];
- char *endp[NSUBEXP];
-
- where NSUBEXP is defined (as 10) in the header file. Once a
- successful regexec has been done using the regexp, each
- startp-endp pair describes one substring within the string,
- with the startp pointing to the first character of the
- substring and the endp pointing to the first character
- following the substring. The 0th substring is the substring
- of string that matched the whole regular expression. The
- others are those substrings that matched parenthesized
- expressions within the regular expression, with
- parenthesized expressions numbered in left-to-right order of
- their opening parentheses.
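
Because endp points one character past the substring, the length of
substring n is simply endp[n] - startp[n]. The following is a minimal
sketch of extracting one substring, not part of the library itself:
`demo_regexp' is a hypothetical stand-in that reproduces only the two
documented members, `copy_substring' is an illustrative helper name,
and the use of NULL pointers to mark an unset substring is an
assumption.

```c
#include <stddef.h>
#include <string.h>

#define NSUBEXP 10   /* value stated in the manual page */

/* Hypothetical stand-in for the real regexp structure: only the two
 * documented members are reproduced here. */
typedef struct {
    char *startp[NSUBEXP];
    char *endp[NSUBEXP];
} demo_regexp;

/* Copy substring n into buf as a NUL-terminated string.
 * Returns the substring length, or -1 if n is out of range, the
 * substring is unset (assumed to be marked by NULL pointers), or
 * buf is too small to hold the substring plus its terminator. */
long copy_substring(const demo_regexp *r, int n, char *buf, size_t bufsize)
{
    size_t len;

    if (n < 0 || n >= NSUBEXP)
        return -1;
    if (r->startp[n] == NULL || r->endp[n] == NULL)
        return -1;
    len = (size_t)(r->endp[n] - r->startp[n]);  /* endp is one past the end */
    if (len + 1 > bufsize)
        return -1;
    memcpy(buf, r->startp[n], len);
    buf[len] = '\0';
    return (long)len;
}
```

With the real library, the same subtraction would be applied to the
startp and endp members of the structure returned by regcomp, after a
regexec call that returned 1.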
-
-
-
- Page 1 (printed 9/2/87)
-
- Regsub copies source to dest, making substitutions according
- to the most recent regexec performed using prog. Each
- instance of `&' in source is replaced by the substring
- indicated by startp[0] and endp[0]. Each instance of `\n',
- where n is a digit, is replaced by the substring indicated
- by startp[n] and endp[n]. To get a literal `&' or `\n' into
- dest, prefix it with `\'; to get a literal `\' preceding `&'
- or `\n', prefix it with another `\'.
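
The substitution rules can be sketched as a single pass over the
template. This is an illustration of the rules as described above, not
the library's own source: `expand_template' and its destsize parameter
are hypothetical (the documented regsub takes no size argument), and
treating an unset substring as empty is an assumption.

```c
#include <stddef.h>
#include <string.h>

#define NSUBEXP 10   /* value stated in the manual page */

/* Expand source into dest following the `&' and `\n' rules.
 * startp/endp play the role of the arrays in the regexp structure.
 * Assumptions of this sketch: unset substrings are marked by NULL
 * pointers and expand to nothing; dest has room for destsize bytes.
 * Returns 0 on success, -1 if dest is too small. */
int expand_template(char *const startp[], char *const endp[],
                    const char *source, char *dest, size_t destsize)
{
    const char *src = source;
    size_t used = 0;

    while (*src != '\0') {
        char c = *src++;
        int no = -1;            /* substring number, -1 for ordinary text */

        if (c == '&') {
            no = 0;
        } else if (c == '\\' && *src >= '0' && *src <= '9') {
            no = *src++ - '0';
        }

        if (no < 0) {
            /* `\&' and `\\' yield a literal `&' and `\'. */
            if (c == '\\' && (*src == '\\' || *src == '&'))
                c = *src++;
            if (used + 1 >= destsize)
                return -1;
            dest[used++] = c;
        } else if (startp[no] != NULL && endp[no] != NULL) {
            size_t len = (size_t)(endp[no] - startp[no]);
            if (used + len >= destsize)
                return -1;
            memcpy(dest + used, startp[no], len);
            used += len;
        }
    }
    if (used >= destsize)
        return -1;
    dest[used] = '\0';
    return 0;
}
```

For instance, if substrings 1 and 2 are `hello' and `world', the
template `\2 \1' expands to `world hello', and `\\1' expands to the
two literal characters `\1'.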
-
- Regerror is called whenever an error is detected in regcomp,
- regexec, or regsub. The default regerror writes the string
- msg, with a suitable indicator of origin, on the standard
- error output and invokes exit(2). Regerror can be replaced
- by the user if other actions are desirable.
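
A program that prefers to recover rather than exit might supply its
own regerror along the following lines. This is a sketch under stated
assumptions: the void return type is assumed (the synopsis declares
none), and the names `last_regexp_error' and `regexp_error_count' are
hypothetical.

```c
#include <stdio.h>

/* Most recent message passed to regerror; empty if none so far.
 * Both variable names are illustrative only. */
static char last_regexp_error[128];
static int regexp_error_count;

/* User-supplied replacement for the default regerror: remember the
 * message and return, so that the caller can notice the failure
 * (for example a NULL from regcomp) and carry on. */
void regerror(char *msg)
{
    regexp_error_count++;
    /* snprintf truncates safely and always NUL-terminates. */
    snprintf(last_regexp_error, sizeof last_regexp_error, "%s", msg);
}
```

Linking a definition like this into the program ahead of the library
is what would let it take the place of the default, which exits.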
-
- REGULAR EXPRESSION SYNTAX
- A regular expression is zero or more branches, separated by
- `|'. It matches anything that matches one of the branches.
-
- A branch is zero or more pieces, concatenated. It matches a
- match for the first, followed by a match for the second,
- etc.
-
- A piece is an atom possibly followed by `*', `+', or `?'.
- An atom followed by `*' matches a sequence of 0 or more
- matches of the atom. An atom followed by `+' matches a
- sequence of 1 or more matches of the atom. An atom followed
- by `?' matches a match of the atom, or the null string.
-
- An atom is a regular expression in parentheses (matching a
- match for the regular expression), a range (see below), `.'
- (matching any single character), `^' (matching the null
- string at the beginning of the input string), `$' (matching
- the null string at the end of the input string), a `\'
- followed by a single character (matching that character), or
- a single character with no other significance (matching that
- character).
-
- A range is a sequence of characters enclosed in `[]'. It
- normally matches any single character from the sequence. If
- the sequence begins with `^', it matches any single
- character not from the rest of the sequence. If two
- characters in the sequence are separated by `-', this is
- shorthand for the full list of ASCII characters between them
- (e.g. `[0-9]' matches any decimal digit). To include a
- literal `]' in the sequence, make it the first character
- (following a possible `^'). To include a literal `-', make
- it the first or last character.
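
The range rules above can be sketched as a small membership test.
`range_match' is a hypothetical helper written for illustration; it
is not a function of this library, and its handling of malformed
input (returning -1) is an assumption.

```c
/* Test character c against the range that starts at pattern[0] == '['.
 * Returns 1 on a match, 0 on no match, and -1 if the range is
 * malformed (no opening '[' or no closing ']'). */
int range_match(const char *pattern, char c)
{
    const unsigned char *p = (const unsigned char *)pattern;
    unsigned char ch = (unsigned char)c;
    int negate = 0;
    int found = 0;

    if (*p != '[')
        return -1;
    p++;
    if (*p == '^') {            /* leading `^' inverts the sense */
        negate = 1;
        p++;
    }
    if (*p == ']') {            /* `]' in first position is literal */
        if (ch == ']')
            found = 1;
        p++;
    }
    while (*p != '\0' && *p != ']') {
        if (p[1] == '-' && p[2] != ']' && p[2] != '\0') {
            /* lo-hi shorthand for every character between the two */
            if (ch >= p[0] && ch <= p[2])
                found = 1;
            p += 3;
        } else {
            /* single member; also covers `-' placed first or last */
            if (ch == p[0])
                found = 1;
            p++;
        }
    }
    if (*p != ']')
        return -1;              /* ran off the end without a close */
    return negate ? !found : found;
}
```

Under this sketch `[0-9]' accepts `5' and rejects `a', `[^0-9]' does
the opposite, `[]a]' accepts `]', and both `[a-]' and `[-a]' accept
`-'.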
-
- AMBIGUITY
- If a regular expression could match two different parts of
- the input string, it will match the one which begins
- earliest. If both begin in the same place but match
- different lengths, or match the same length in different
- ways, life gets messier, as follows.
-
- In general, the possibilities in a list of branches are
- considered in left-to-right order, the possibilities for
- `*', `+', and `?' are considered longest-first, nested
- constructs are considered from the outermost in, and
- concatenated constructs are considered leftmost-first. The
- match that will be chosen is the one that uses the earliest
- possibility in the first choice that has to be made. If
- there is more than one choice, the next will be made in the
- same manner (earliest possibility) subject to the decision
- on the first choice. And so forth.
-
- For example, `(ab|a)b*c' could match `abc' in one of two
- ways. The first choice is between `ab' and `a'; since `ab'
- is earlier, and does lead to a successful overall match, it
- is chosen. Since the `b' is already spoken for, the `b*'
- must match its last possibility (the empty string), since it
- must respect the earlier choice.
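
The order in which the choices are tried can be made concrete with a
matcher hard-wired for `(ab|a)b*c'. This is only a sketch of the
preference order described above, anchored at the start of the input;
it is not how the library is implemented, and `match_example' is a
hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Hard-wired matcher for (ab|a)b*c, anchored at the start of s.
 * Branches are tried left to right and the repetition counts for
 * b* are tried longest first. On success returns 1 and reports how
 * many characters the parenthesized part and the b* consumed. */
int match_example(const char *s, size_t *group_len, size_t *star_len)
{
    static const char *const branches[] = { "ab", "a" };
    size_t i;

    for (i = 0; i < 2; i++) {
        size_t blen = strlen(branches[i]);
        size_t max_b = 0;
        size_t n;

        if (strncmp(s, branches[i], blen) != 0)
            continue;           /* this branch cannot match here */

        /* Count the run of 'b' available to b*. */
        while (s[blen + max_b] == 'b')
            max_b++;

        /* Longest first: try max_b, max_b - 1, ..., 0 repetitions. */
        n = max_b + 1;
        while (n-- > 0) {
            if (s[blen + n] == 'c') {
                *group_len = blen;
                *star_len = n;
                return 1;
            }
        }
        /* No count worked: fall through to the next branch. */
    }
    return 0;
}
```

Given `abc', the sketch reports 2 characters for the parenthesized
part and 0 for `b*', which is the outcome described in the example
above. Given `ac', the first branch fails and the second is used.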
-
- In the particular case where no `|'s are present and there
- is only one `*', `+', or `?', the net effect is that the
- longest possible match will be chosen. So `ab*', presented
- with `xabbbby', will match `abbbb'. Note that if `ab*' is
- tried against `xabyabbbz', it will match `ab' just after
- `x', due to the begins-earliest rule. (In effect, the
- decision on where to start the match is the first choice to
- be made, hence subsequent choices must respect it even if
- this leads them to less-preferred alternatives.)
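
The interaction of the begins-earliest rule with longest-first
repetition can be sketched for the single pattern `ab*'. The function
`find_ab_star' is hypothetical and handles only this one pattern.

```c
#include <stddef.h>

/* Find the match of ab* in s. Start positions are tried left to
 * right (begins-earliest); at the first position that works, b*
 * takes as many 'b' characters as it can (longest-first).
 * Returns a pointer to the start of the match and stores its length
 * in *len, or returns NULL if s contains no 'a'. */
const char *find_ab_star(const char *s, size_t *len)
{
    const char *p;

    for (p = s; *p != '\0'; p++) {
        if (*p == 'a') {
            size_t n = 1;       /* the 'a' itself */

            while (p[n] == 'b')
                n++;
            *len = n;
            return p;           /* earliest start wins; never look further */
        }
    }
    return NULL;
}
```

On `xabbbby' this yields the five characters `abbbb'; on `xabyabbbz'
it yields `ab' at offset 1, even though a longer match exists later
in the string.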
-
- SEE ALSO
- egrep(1), expr(1)
-
- DIAGNOSTICS
- Regcomp returns NULL for a failure (regerror permitting),
- where failures are syntax errors, exceeding implementation
- limits, or applying `+' or `*' to a possibly-null operand.
-
- HISTORY
- Both code and manual page were written at U of T. They are
- intended to be compatible with the Bell V8 regexp(3), but
- are not derived from Bell code.
-
- BUGS
- Empty branches and empty regular expressions are not
- portable to V8.
-
- The restriction against applying `*' or `+' to a
- possibly-null operand is an artifact of the simplistic
- implementation.
-
- Does not support egrep's newline-separated branches; neither
- does the V8 regexp(3), though.
-
- Due to emphasis on compactness and simplicity, it's not
- strikingly fast. It does give special attention to handling
- simple cases quickly.