- 4. REGULAR EXPRESSIONS
-
-
- Elvis uses regular expressions for searching and
- substitutions. A regular expression is a text string in which
- some characters have special meanings. This is much more
- powerful than simple text matching.
-
- Syntax
-
- Elvis' regexp package treats the following one- or
- two-character strings (called meta-characters) in special
- ways:
-
- \(subexpression\)
- The \( and \) metacharacters are used to delimit
- subexpressions. When the regular expression matches
- a particular chunk of text, Elvis will remember
- which portion of that chunk matched the subexpression.
- The :s/regexp/newtext/ command makes use of
- this feature.
-
- ^ The ^ metacharacter matches the beginning of a line.
- If, for example, you wanted to find "foo" at the
- beginning of a line, you would use a regular expres-
- sion such as /^foo/. Note that ^ is only a meta-
- character if it occurs at the beginning of a regular
- expression; anyplace else, it is treated as a normal
- character.
-
- $ The $ metacharacter matches the end of a line. It
- is only a metacharacter when it occurs at the end of
- a regular expression; elsewhere, it is treated as a
- normal character. For example, the regular expres-
- sion /$$/ will search for a dollar sign at the end
- of a line.
-
- \< The \< metacharacter matches a zero-length string at
- the beginning of a word. A word is considered to be
- a string of 1 or more letters and digits. A word
- can begin at the beginning of a line or after 1 or
- more non-alphanumeric characters.
-
- \> The \> metacharacter matches a zero-length string at
- the end of a word. A word can end at the end of the
- line or before 1 or more non-alphanumeric charac-
- ters. For example, /\<end\>/ would find any
- instance of the word "end", but would ignore any
- instances of e-n-d inside another word such as
- "calendar".
-
- . The . metacharacter matches any single character.
-
- [character-list]
- This matches any single character from the
- character-list. Inside the character-list, you can
- denote a span of characters by writing only the
- first and last characters, with a hyphen between
- them. If the character-list is preceded by a ^
- character, then the list is inverted -- it will
- match any character that isn't mentioned in the list.
- For example, /[a-zA-Z]/ matches any letter, and
- /[^ ]/ matches anything other than a blank.
-
- \{n\} This is a closure operator, which means that it can
- only be placed after something that matches a single
- character. It controls the number of times that the
- single-character expression should be repeated.
-
- The \{n\} operator, in particular, means that the
- preceding expression should be repeated exactly n
- times. For example, /^-\{80\}$/ matches a line of
- eighty hyphens, and /\<[a-zA-Z]\{4\}\>/ matches any
- four-letter word.
-
- \{n,m\} This is a closure operator which means that the
- preceding single-character expression should be
- repeated between n and m times, inclusive. If the m
- is omitted (but the comma is present) then m is
- taken to be infinity. For example, /"[^"]\{3,5\}"/
- matches any pair of quotes which contains three,
- four, or five non-quote characters.
-
- * The * metacharacter is a closure operator which
- means that the preceding single-character expression
- can be repeated zero or more times. It is
- equivalent to \{0,\}. For example, /.*/ matches a
- whole line.
-
- \+ The \+ metacharacter is a closure operator which
- means that the preceding single-character expression
- can be repeated one or more times. It is equivalent
- to \{1,\}. For example, /.\+/ matches a whole line,
- but only if the line contains at least one charac-
- ter. It doesn't match empty lines.
-
- \? The \? metacharacter is a closure operator which
- indicates that the preceding single-character
- expression is optional -- that is, that it can occur
- 0 or 1 times. It is equivalent to \{0,1\}. For
- example, /no[ -]\?one/ matches "no one", "no-one",
- or "noone".
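-
- The operators listed above can be tried outside Elvis too. The
- following is a minimal sketch using Python's re module, which is
- not Elvis' regexp package: Python writes \( \), \{ \}, \+ and \?
- without the backslash, and has no \< or \>, so \b is used as an
- approximation (unlike Elvis, \b also counts "_" as a word
- character). All function names are hypothetical.

```python
import re

# Elvis /\<word\>/ is approximated with Python's \b word boundary.
def has_word(word, line):
    return re.search(r"\b" + re.escape(word) + r"\b", line) is not None

# Elvis /\<[a-zA-Z]\{4\}\>/ : Python writes the closure as {4}.
def four_letter_words(line):
    return re.findall(r"\b[a-zA-Z]{4}\b", line)

# Elvis /"[^"]\{3,5\}"/ applied to a whole string.
def is_short_quote(text):
    return re.fullmatch(r'"[^"]{3,5}"', text) is not None

# Elvis /no[ -]\?one/ : Python writes \? as a bare ?.
def is_no_one(text):
    return re.fullmatch(r"no[ -]?one", text) is not None

# Elvis /.\+/ needs at least one character, so empty lines do not match.
def is_nonempty_line(line):
    return re.fullmatch(r".+", line) is not None
```

- For instance, has_word("end", "calendar") is false because the
- letters e-n-d sit inside a longer word.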
-
- Anything else is treated as a normal character which
- must exactly match a character from the scanned text. The
- special strings may all be preceded by a backslash to force
- them to be treated normally.
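-
- The effect of escaping a special character can be sketched with
- Python's re module, which follows the same backslash convention
- for "." and "*". This is an illustration only, not Elvis code;
- the names loose, strict, star and matches are made up for the
- example.

```python
import re

# An unescaped "." is a metacharacter and matches any single character.
loose = re.compile(r"3.14")

# Preceded by a backslash, "." matches only a literal period.
strict = re.compile(r"3\.14")

# re.escape() adds the backslashes for you, here for a literal "*".
star = re.compile(re.escape("a*b"))

def matches(pattern, text):
    # True only when the whole text matches the compiled pattern.
    return pattern.fullmatch(text) is not None
```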
-
- Substitutions
-
- The :s command has at least two arguments: a regular
- expression, and a substitution string. The text that
- matched the regular expression is replaced by text which is
- derived from the substitution string.
-
- Most characters in the substitution string are copied
- into the text literally but a few have special meaning:
-
- &      Insert a copy of the original text
- ~      Insert a copy of the previous replacement text
- \1     Insert a copy of that portion of the original text which
-        matched the first set of \( \) parentheses
- \2-\9  Do the same for the second (etc.) pair of \( \)
- \U     Convert all chars of any later & or \# to uppercase
- \L     Convert all chars of any later & or \# to lowercase
- \E     End the effect of \U or \L
- \u     Convert the first char of the next & or \# to uppercase
- \l     Convert the first char of the next & or \# to lowercase
-
-
- These may be preceded by a backslash to force them to
- be treated normally. If "nomagic" mode is in effect, then &
- and ~ will be treated normally, and you must write them as
- \& and \~ for them to have special meaning.
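-
- For readers who want to experiment with these substitution
- strings outside Elvis, here is a rough Python analogue. It rests
- on a few assumptions: Python's re.sub writes "&" as \g<0>, uses
- ( ) instead of \( \), and has no \U, \L, \u or \l, so a
- replacement function stands in for the case conversions. The
- function names are hypothetical.

```python
import re

# Elvis "&" (the whole matched text) is written \g<0> in Python.
def bracket_numbers(line):
    return re.sub(r"[0-9]+", r"<\g<0>>", line)

# \1 and \2 refer to the first and second parenthesized groups.
def swap_pair(line):
    return re.sub(r"([a-z]+)=([a-z]+)", r"\2=\1", line)

# Stand-in for \U& : uppercase the whole match (first match only).
def shout(line):
    return re.sub(r".*", lambda m: m.group(0).upper(), line, count=1)

# Stand-in for \u& : uppercase only the first character of each match.
def capitalize_words(line):
    def fix(m):
        word = m.group(0)
        return word[:1].upper() + word[1:]
    return re.sub(r"[a-zA-Z]+", fix, line)
```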
-
- Options
-
- Elvis has two options which affect the way regular
- expressions are used. These options may be examined or set
- via the :set command.
-
- The first option is called "[no]magic". This is a
- boolean option, and it is "magic" (TRUE) by default. While
- in magic mode, all of the meta-characters behave as
- described above. In nomagic mode, only ^ and $ retain their
- special meaning.
-
- The second option is called "[no]ignorecase". This is
- a boolean option, and it is "noignorecase" (FALSE) by
- default. While in ignorecase mode, the searching mechanism
- will not distinguish between an uppercase letter and its
- lowercase form. In noignorecase mode, uppercase and lower-
- case are treated as being different.
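-
- The difference between the two modes can be illustrated with
- Python's re.IGNORECASE flag, used here as a stand-in for the
- ignorecase option. The helper find_all and its ignorecase
- parameter are assumptions made for this sketch.

```python
import re

def find_all(pattern, text, ignorecase=False):
    # With ignorecase set, uppercase and lowercase letters match each other.
    flags = re.IGNORECASE if ignorecase else 0
    return re.findall(pattern, text, flags)
```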
-
- Also, the "[no]wrapscan" option affects searches.
-
- Examples
-
- This example changes every occurrence of "utilize" to
- "use":
-
- :%s/utilize/use/g
-
- This example deletes all whitespace that occurs at the
- end of a line anywhere in the file (the brackets contain a
- single space and a single tab):
-
- :%s/[ ]\+$//
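-
- The same cleanup can be sketched in Python for comparison. This
- assumes Python's re module, where \+ is written + and the
- re.MULTILINE flag makes $ match at the end of every line, much as
- the % range applies the command to every line of the file. The
- function name is hypothetical.

```python
import re

def strip_trailing_whitespace(text):
    # [ \t]+ is one or more spaces or tabs; $ anchors them to a line end.
    return re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)
```

- Leading whitespace is left alone because it is not followed by
- the end of a line.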
-
- This example converts the current line to uppercase:
-
- :s/.*/\U&/
-
- This example underlines each letter in the current
- line, by changing it into an "underscore backspace letter"
- sequence (the ^H is entered as "control-V backspace"):
-
- :s/[a-zA-Z]/_^H&/g
-
- This example locates the last colon in a line, and
- swaps the text before the colon with the text after the
- colon. The first \( \) pair is used to delimit the stuff
- before the colon, and the second pair delimit the stuff
- after. In the substitution text, \1 and \2 are given in
- reverse order to perform the swap:
-
- :s/\(.*\):\(.*\)/\2:\1/
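-
- The reason the last colon is the one chosen is that the first
- .* is greedy: it takes as much text as it can and only gives
- back enough to let the colon match. A minimal Python sketch of
- the same swap follows, assuming Python's re module (which writes
- the groups as ( ) rather than \( \)). The function name is
- hypothetical.

```python
import re

def swap_around_last_colon(line):
    # The first (.*) is greedy, so the ":" that follows it is the last
    # colon on the line. count=1 mirrors :s without the g flag.
    return re.sub(r"(.*):(.*)", r"\2:\1", line, count=1)
```

- A line without a colon is returned unchanged, since the pattern
- cannot match it.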
-
- June 13, 1992
-
-
-