- Regular Expression matching functions.
-
- Seven functions are supplied.
-
- 1. re_create(buffer,len,translate)
-
- Creates a regular expression buffer, RE, of type (REGEX *).
- All three parameters are optional. BUFFER and LEN define a
- user-supplied (fixed) buffer to hold the compiled regular
- expression; if they are not supplied, the package automatically
- allocates a buffer of the required size from the heap.
- TRANSLATE is a 256-character translation table; if it is
- supplied, the package treats each character c, in all cases, as
- if it were translate[c].
-
- 2. re_free(re)
-
- Frees all storage associated with the regular expression
- buffer RE.
-
- 3. re_compile(patt,re)
-
- Compiles the pattern string PATT into a regular expression
- buffer RE. Special characters (defined in the header file
- H.Chars) are as described below. The return value is 0 if the
- compile succeeds, or a pointer to an error message if it
- fails.
-
- 4. re_anchored_match(re,str,len,start,mem)
-
- Matches string STR against the regular expression buffer RE.
- The match is anchored: it must begin exactly at position START
- in STR. The value returned is the number of characters matched,
- RE_FAIL if the match fails, or RE_ERROR if an error occurs. The
- optional argument LEN is the length of STR; if it is omitted
- (passed as -1), it is calculated. The optional argument MEM is
- a (REGMEM *) which will be filled with details of the matched
- portions of the string: mem[0] holds the whole matched string,
- and mem[1] to mem[9] hold the values matched by the regular
- expression memories \1 to \9.
-
- 5. re_match(re,str,len,mem)
-
- Matches string STR against the regular expression buffer RE.
- The match may begin at any position in STR. The value returned
- is the position at which the match starts (0 to strlen(str)),
- RE_FAIL if the match fails, or RE_ERROR if an error occurs. The
- optional argument LEN is the length of STR; if it is omitted
- (passed as -1), it is calculated. The optional argument MEM is
- a (REGMEM *) which will be filled with details of the matched
- portions of the string: mem[0] holds the whole matched string,
- and mem[1] to mem[9] hold the values matched by the regular
- expression memories \1 to \9. Since only the start position is
- returned, mem[0] is the only way to find the extent of the
- matched string.
-
- 6. re_magic(string)
-
- Tests STRING to see whether it contains any regular expression
- operators. Returns 1 if so, 0 otherwise. Note that brackets ('('
- and ')') are not treated as operators, as they are so common in
- normal strings.
-
- 7. re_dump(re)
-
- This is a debugging tool, and prints a formatted listing of the
- regular expression in buffer RE.
-
- Special characters.
-
- The special characters which can be used in a regular expression
- pattern are as follows:
-
- Character expressions (ce's)
-
- ^ Start of a line
- $ End of a line
- \` Start of the string
- \' End of the string
- \< Start of a word
- \> End of a word
- \@ A word boundary
- . Any character
- \w A word character
- [...] A set of characters.
- \c Where `c' is a special character. Matches c.
- c Where `c' is any non-special character. Matches itself.
-
- Operators
-
- ~ Not. `~ce' matches anything but ce.
- | Or. `re1 | re2' matches either re1 or re2.
- * Repeat. `re*' matches re repeated 0 or more times.
- + Many. `re+' matches re repeated 1 or more times.
- ? Optional. `re?' matches re or nothing.
- (...) Bracketing. `(re)' matches the same as re.
-
- Memory.
-
- \{ Start memory.
- \} End memory.
- \n Match the characters in memory 'n' (n is 1-9).
-
- The operators \{...\} do not affect matching, but the
- characters matched by the expression between the nth \{ and
- its corresponding \} are saved in memory n, both for use in \n
- matches and for return in the MEM array.
-
- Words.
-
- Word characters are letters and digits (i.e. \w is equivalent to
- the expression [a-zA-Z0-9]). A word boundary is the position
- between a word character and a non-word character (in either
- order).
-
- Character sets.
-
- Within sets, [...], all characters are taken literally except \,
- ], and -. The '-' character indicates a range, unless it is the
- first or last character in the set, in which case it stands for
- a literal '-'. The '\' character causes the following character
- to be taken literally (as in \-, \\, \]). In addition, the
- standard C escape sequences \b, \f, \n, \r, \t and \v are
- available.
-
-