- TR(1L) MISC. REFERENCE MANUAL PAGES TR(1L)
-
-
-
- NAME
- tr - translate or delete characters
-
- SYNOPSIS
- tr [-cst] [--complement] [--squeeze-repeats]
- [--truncate-set1] string1 string2
- tr {-s,--squeeze-repeats} [-c] [--complement] string1
- tr {-d,--delete} [-c] [--complement] string1
- tr {-d,--delete} {-s,--squeeze-repeats} [-c] [--complement]
- string1 string2
-
- DESCRIPTION
- This manual page documents the GNU version of tr. tr copies
- the standard input to the standard output, performing one of
- the following operations:
-
- * translate, and optionally squeeze repeated characters
- in the result
- * squeeze repeated characters
- * delete characters
- * delete characters, then squeeze repeated characters
- from the result.
-
- The string1 and (if given) string2 arguments define ordered
- sets of characters, referred to below as set1 and set2.
- These sets are the characters of the input that tr operates
- on. The --complement (-c) option replaces set1 with its
- complement (all of the characters that are not in set1).
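-
- As a small illustration, assuming GNU tr and a POSIX shell, the
- complement of 0-9 is every byte that is not a digit, so deleting
- it keeps only the digits (the sample input is arbitrary):

```shell
# -c: set1 becomes "every byte except 0-9"; -d deletes those bytes,
# including the trailing newline, so only the digits remain.
printf 'a1b2\n' | tr -cd 0-9    # prints: 12 (no trailing newline)
```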
-
- SPECIFYING SETS OF CHARACTERS
- The format of the string1 and string2 arguments resembles
- the format of regular expressions; however, they are not
- regular expressions, only lists of characters. Most
- characters simply represent themselves in these strings, but
- the strings can contain the shorthands listed below, for
- convenience. Some of them can be used only in string1 or
- string2, as noted below.
-
- Backslash escapes. A backslash followed by a character not
- listed below causes an error message.
-
- \a Control-G.
-
- \b Control-H.
-
- \f Control-L.
-
- \n Control-J.
-
- \r Control-M.
-
- \t Control-I.
-
- Sun Release 4.1 Last change: 1
-
- \v Control-K.
-
- \ooo The character with the value given by ooo, which is 1
- to 3 octal digits.
-
- \\ A backslash.
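-
- The escapes can be checked by feeding tr a known string. This
- sketch assumes a POSIX shell and a tr that implements the escapes
- above (GNU tr is assumed); the replacement characters are arbitrary.

```shell
# \t is Control-I (tab) and \040 is the octal code for a space.
printf 'a\tb c\n' | tr '\t\040' '_+'    # prints: a_b+c
# \\ stands for a single backslash.
printf 'C:\\dir\n' | tr '\\' '/'        # prints: C:/dir
```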
-
- Ranges. The notation `m-n' expands to all of the characters
- from m through n, in ascending order. m should collate
- before n; if it doesn't, an error results. As an example,
- `0-9' is the same as `0123456789'. Ranges can optionally be
- enclosed in square brackets, which has no effect but is
- supported for compatibility with historical System V versions
- of tr.
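-
- A sketch of ranges, assuming GNU tr and the C locale: each digit
- maps to the letter in the same position, and a descending range is
- rejected with an error.

```shell
# 0-9 is 0123456789 and a-j is abcdefghij.
printf '2024\n' | tr 0-9 a-j    # prints: cace
# z does not collate before a, so tr reports an error and fails.
tr z-a x </dev/null || echo 'z-a rejected'
```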
-
- Repeated characters. The notation `[c*n]' in string2
- expands to n copies of character c. Thus, `[y*6]' is the
- same as `yyyyyy'. The notation `[c*]' in string2 expands to
- as many copies of c as are needed to make set2 as long as
- set1. If n begins with a 0, it is interpreted in octal,
- otherwise in decimal.
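-
- The three repeat forms, sketched with GNU tr assumed and arbitrary
- sample inputs; note that 010 is read as octal (eight copies).

```shell
# [x*3] is xxx, so a, b and c map to x; d, e and f map to y, z and w.
printf 'abcdef\n' | tr a-f '[x*3]yzw'       # prints: xxxyzw
# 010 is octal, so [x*010] is eight x's and i maps to y.
printf 'abcdefghi\n' | tr a-i '[x*010]y'    # prints: xxxxxxxxy
# [#*] supplies as many #'s as a-z needs (26).
printf 'secret\n' | tr a-z '[#*]'           # prints: ######
```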
-
- Character classes. The notation `[:class-name:]' expands to
- all of the characters in the (predefined) class named
- class-name. The characters expand in no particular order,
- except for the `upper' and `lower' classes, which expand in
- ascending order. When the --delete (-d) and
- --squeeze-repeats (-s) options are both given, any character
- class can be used in string2. Otherwise, only the character
- classes `lower' and `upper' are accepted in string2, and
- then only if the corresponding character class (`upper' and
- `lower', respectively) is specified in the same relative
- position in string1. Doing this specifies case conversion.
- The class names are given below; an error results when an
- invalid class name is given.
-
- alnum
- Letters and digits.
-
- alpha
- Letters.
-
- blank
- Horizontal whitespace.
-
- cntrl
- Control characters.
-
- digit
- Digits.
-
- graph
- Printable characters, not including space.
-
- lower
- Lowercase letters.
-
- print
- Printable characters, including space.
-
- punct
- Punctuation characters.
-
- space
- Horizontal or vertical whitespace.
-
- upper
- Uppercase letters.
-
- xdigit
- Hexadecimal digits.
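-
- Classes can be combined in one set and used for case conversion.
- This sketch pins the C locale (an assumption, since class contents
- depend on the locale) and assumes GNU tr.

```shell
export LC_ALL=C   # assumption: C locale, so the classes are plain ASCII
# Delete every digit and every punctuation character.
printf 'Hi, 42 apples!\n' | tr -d '[:digit:][:punct:]'   # prints: Hi  apples
# [:lower:] paired with [:upper:] in the same position converts case.
printf 'Hi, 42 apples!\n' | tr '[:lower:]' '[:upper:]'   # prints: HI, 42 APPLES!
```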
-
- Equivalence classes. The syntax `[=c=]' expands to all of
- the characters that are equivalent to c, in no particular
- order. Equivalence classes are a recent invention intended
- to support non-English alphabets. However, there seems to be
- no standard way to define them or determine their contents.
- Therefore, they are not fully implemented in GNU tr; each
- character's equivalence class consists only of that
- character, which currently makes this construct useless.
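-
- Because each equivalence class holds only its own character, the
- following (assuming GNU tr) behaves exactly like `tr e E':

```shell
# [=e=] contains only e itself in GNU tr.
printf 'deed\n' | tr '[=e=]' E    # prints: dEEd
```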
-
- TRANSLATING
- tr performs translation when string1 and string2 are both
- given and the --delete (-d) option is not given. tr
- translates each character of its input that is in set1 to
- the corresponding character in set2. Characters not in set1
- are passed through unchanged. When a character appears more
- than once in set1 and the corresponding characters in set2
- are not all the same, only the final one is used. For
- example, these two commands are equivalent:
- tr aaa xyz
- tr a z
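-
- Running both commands on the same (arbitrary) input shows the
- final pairing winning; GNU tr is assumed.

```shell
# 'a' appears three times in set1; the last pairing (a -> z) wins.
printf 'banana\n' | tr aaa xyz    # prints: bznznz
printf 'banana\n' | tr a z        # prints: bznznz
```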
-
- A common use of tr is to convert lowercase characters to
- uppercase. This can be done in many ways. Here are three
- of them:
- tr abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ
- tr a-z A-Z
- tr '[:lower:]' '[:upper:]'
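-
- The three forms agree on ASCII input. This sketch sets the C locale,
- an assumption under which [:lower:] is exactly a-z; the variable s
- is just a sample string.

```shell
export LC_ALL=C   # assumption: C locale, where [:lower:] is exactly a-z
s='Hello, World'
# Each of the three commands prints: HELLO, WORLD
printf '%s\n' "$s" | tr abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ
printf '%s\n' "$s" | tr a-z A-Z
printf '%s\n' "$s" | tr '[:lower:]' '[:upper:]'
```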
-
- When tr is performing translation, set1 and set2 should
- normally have the same length. If set1 is shorter than set2,
- the extra characters at the end of set2 are ignored.
-
- On the other hand, making set1 longer than set2 is not
- portable; POSIX.2 says that the result is undefined. In this
- situation, the BSD tr pads set2 to the length of set1 by
- repeating the last character of set2 as many times as
- necessary. The System V tr truncates set1 to the length of set2.
-
- By default, GNU tr handles this case like the BSD tr does.
- When the --truncate-set1 (-t) option is given, GNU tr
- handles this case like the System V tr instead. This option is
- ignored for operations other than translation.
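-
- A sketch of the two behaviours with GNU tr (assumed), where set1
- (abc) is one character longer than set2 (xy):

```shell
# Default, BSD-style: set2 is padded to xyy, so c maps to y.
printf 'abcabc\n' | tr abc xy       # prints: xyyxyy
# -t, System V-style: set1 is cut to ab, so c passes through.
printf 'abcabc\n' | tr -t abc xy    # prints: xycxyc
```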
-
- Acting like the System V tr in this case breaks the
- relatively common BSD idiom:
- tr -cs A-Za-z0-9 '\012'
- because it converts only zero bytes (the first element in
- the complement of set1), rather than all non-alphanumerics,
- to newlines.
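-
- The difference can be seen on a sample line, assuming GNU tr, whose
- complement of set1 is taken in ascending byte order (so its first
- element is the zero byte):

```shell
# Default (BSD-style): '\012' (newline) is padded to cover the whole
# complement of A-Za-z0-9, so each non-alphanumeric run becomes one newline.
printf 'foo, bar!\n' | tr -cs A-Za-z0-9 '\012'
# prints two lines: foo, then bar

# With -t (System V-style), set1 is cut down to its first element, the
# zero byte, so this input passes through unchanged.
printf 'foo, bar!\n' | tr -t -cs A-Za-z0-9 '\012'
# prints: foo, bar!
```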
-
- SQUEEZING REPEATS AND DELETING
- When given just the --delete (-d) option, tr removes any
- input characters that are in set1.
-
- When given just the --squeeze-repeats (-s) option, tr
- replaces each input sequence of a repeated character that is
- in set1 with a single occurrence of that character.
-
- When given both the --delete and the --squeeze-repeats
- options, tr first performs any deletions using set1, then
- squeezes repeats from any remaining characters using set2.
-
- The --squeeze-repeats option may also be used when
- translating, in which case tr first performs translation, then
- squeezes repeats from any remaining characters using set2.
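-
- The four combinations, sketched with GNU tr (assumed) on arbitrary
- inputs. Note that squeezing only merges runs of the same character,
- and only for characters in the squeeze set.

```shell
# -d alone: remove every character in set1.
printf 'a1b22c\n' | tr -d 0-9          # prints: abc
# -s alone: runs of a or b collapse; the cc run is left alone.
printf 'aabbbcc\n' | tr -s ab          # prints: abcc
# -d and -s: delete digits (set1), then squeeze letters (set2).
printf 'aa11bb\n' | tr -ds 0-9 a-z     # prints: ab
# Translate, then squeeze set2: spaces become _, runs of _ collapse,
# and the ll in hello is untouched because l is not in set2.
printf 'hello   world\n' | tr -s ' ' _  # prints: hello_world
```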
-
- Here are some examples to illustrate various combinations of
- options:
-
- Remove all zero bytes:
- tr -d '\000'
-
- Put all words on lines by themselves. This converts all
- non-alphanumeric characters to newlines, then squeezes each
- string of repeated newlines into a single newline:
- tr -cs '[a-zA-Z0-9]' '[\n*]'
-
- Convert each sequence of repeated newlines to a single
- newline:
- tr -s '\n'
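-
- The three examples above, run on small sample inputs; this assumes
- GNU tr and a POSIX shell whose printf(1) understands octal escapes
- such as \000.

```shell
# Remove all zero bytes: a, NUL, b becomes ab.
printf 'a\000b\n' | tr -d '\000'    # prints: ab
# One word per line: prints one, two and three on separate lines.
printf 'one, two  three\n' | tr -cs '[a-zA-Z0-9]' '[\n*]'
# Collapse the blank lines between a and b: prints a, then b.
printf 'a\n\n\nb\n' | tr -s '\n'
```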
-
- WARNING MESSAGES
- Setting the environment variable POSIXLY_CORRECT turns off
- several warning and error messages, for strict compliance
- with POSIX.2. The messages normally occur in the following
- circumstances:
-
- 1. When the --delete option is given but --squeeze-repeats
- is not, and string2 is given, GNU tr by default prints a
- usage message and exits, because string2 would not be used.
- The POSIX specification says that string2 must be ignored in
- this case. Silently ignoring arguments is a bad idea.
-
- 2. When an ambiguous octal escape is given. For example,
- \400 is actually \40 followed by the digit 0, because the
- value 400 octal does not fit into a single byte.
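-
- Both circumstances can be reproduced as follows. This assumes GNU tr
- with POSIXLY_CORRECT unset; the diagnostic wording on standard error
- varies between versions.

```shell
# Item 1: with -d but not -s, only string1 is allowed, so GNU tr
# reports the extra operand and exits with a nonzero status.
printf 'abc\n' | tr -d a b || echo 'tr -d a b: rejected'
# Item 2: \400 is read as \40 (a space) followed by the digit 0, so
# set1 is " 0"; GNU tr may also warn about this on standard error.
printf 'a 0b\n' | tr '\400' '_#'    # prints: a_#b
```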
-
- Note that GNU tr does not provide complete BSD or System V
- compatibility. For example, there is no option to disable
- interpretation of the POSIX constructs [:alpha:], [=c=], and
- [c*10]. Also, GNU tr does not delete zero bytes
- automatically, unlike traditional UNIX versions, which provide no
- way to preserve zero bytes.
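-
- A quick check that a zero byte survives translation, assuming GNU tr
- (some wc(1) versions pad the count with spaces):

```shell
# a, NUL, b goes in; A, NUL, B comes out, so the byte count stays 3.
printf 'a\000b' | tr a-z A-Z | wc -c    # prints: 3
```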
-
- The long-named options can be introduced with `+' as well as
- `--', for compatibility with previous releases. Eventually
- support for `+' will be removed, because it is incompatible
- with the POSIX.2 standard.