- TDS -- Transliteration, Deletion, and Squeeze filter 23Dec1990
-
- INVOCATION (some of the myriad ways)
-
- tds [-cds] [SearchChars [ReplacementChars]] <inputfile
-
- tds [-cds] [SearchChars [ReplacementChars]] <inputfile >outputfile
-
- tds [-cds] [SearchChars [ReplacementChars]] <inputfile | outputstream
-
- inputstream | tds [-cds] [SearchChars [ReplacementChars]]
-
- inputstream | tds [-cds] [SearchChars [ReplacementChars]] >outputfile
-
- inputstream | tds [-cds] [SearchChars [ReplacementChars]] | outputstream
-
- Option switches may be anywhere within the scope of the
- invocation except inside the SearchChars or ReplacementChars
- strings, also known as s1 and s2 in the source code.
-
- DESCRIPTION
-
- The TDS filter is a fast superset of the standard Unix (tm)
- System V "tr" program. Improvements of TDS over TR are
- increased flexibility in the interpretation of escape codes and
- character ranges, a larger number of descriptive error
- messages, and a speed increase in most cases due to its buffering
- scheme and speed-optimal code. You can expect a 100% to 900%
- speed increase with files larger than 100KB. TDS also corrects a
- deficiency of the TR program: its inability to process NULL
- characters, (char)0.
-
- To get started quickly, skip ahead to the EXAMPLES section.
-
- TDS was written as a programming exercise that involved reverse
- engineering, automated program testing, fast file processing,
- some expression parsing and error handling, and filter program
- writing. By reverse engineering, I refer in this case to the
- process of writing a program that is functionally equivalent to
- and perhaps even better than an existing program without ever
- consulting the existing program's source code.
-
- The original TR program seems to be one of the first filter
- programs ever written for the Unix (tm) operating system and one
- of the most sparsely documented programs in the online MANual.
- It also seems, from viewing its nondescriptive name, that TR was
- originally designed to be just a TRansliterator filter that would
- (1) scan an input stream for any character matching a character
- in SearchChars, (2) replace the input stream character with the
- corresponding character in ReplacementChars, and (3) output the
- result. Any input stream characters not matching any of the
- characters in SearchChars would get passed unchanged into the
- output stream. The delete and squeeze functions were probably
- added later, reducing the mnemonic value of "TR[ransliterator]".
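-
- The three transliteration steps above can be sketched as a table
- lookup. This is a minimal Python illustration, not the TDS source
- code: the names make_table and transliterate are hypothetical,
- and the sketch assumes that SearchChars and ReplacementChars are
- paired by position and that any SearchChars byte without a
- partner passes through unchanged.
-
```python
def make_table(search: bytes, replacement: bytes) -> bytes:
    """Build a 256-entry lookup table (hypothetical helper, not from TDS)."""
    table = bytearray(range(256))          # by default every byte maps to itself
    for s, r in zip(search, replacement):  # pair the two strings by position
        table[s] = r
    return bytes(table)


def transliterate(data: bytes, search: bytes, replacement: bytes) -> bytes:
    """Replace each byte found in `search` with its partner in `replacement`."""
    return data.translate(make_table(search, replacement))


if __name__ == "__main__":
    # 'l' becomes '0' and 'o' becomes '1'; everything else is unchanged.
    print(transliterate(b"hello world", b"lo", b"01"))
    # A NULL byte is ordinary data for a table lookup.
    print(transliterate(b"a\x00b", b"a", b"z"))
```
-
- Working on bytes rather than C strings is what lets a sketch like
- this treat the NULL character, (char)0, as ordinary data.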
-
- Like the TR program, invoking TDS without any option switches
- causes it to act as a transliteration filter. In almost all
- practical cases, you will want TDS to be part of an explicit
- pipeline or for TDS to be given an inputfile as a source of
- data. If you invoke TDS from the command line like this:
-
- tds
-
- then the program will just sit there and wait for data from the
- standard input stream -- in this case, the keyboard or terminal
- -- exactly like TR. Again, you probably will not want this and
- will instead want to invoke TDS in one of the forms listed above
- in the INVOCATION section.
-
- Setting the -d switch makes TDS act as a d)eletion filter. In
- this mode, TDS scans the input stream for characters that match
- the characters in SearchChars and then prevents the matching
- characters from reaching the output stream. For example, the
- following invocation will cause TDS to get data from inputfile
- and stop any spaces (octal code \040) from reaching the
- outputfile:
-
- tds -d \040 <inputfile >outputfile
-
- or
-
- tds -d \\040 <inputfile >outputfile (for some shells)
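-
- The deletion mode described above can be sketched in a few lines
- of Python. The function name delete_chars is hypothetical and the
- sketch only mirrors the documented behavior; it says nothing
- about how TDS implements it.
-
```python
def delete_chars(data: bytes, search: bytes) -> bytes:
    """Drop every byte of `data` that appears in `search` (hypothetical helper)."""
    drop = set(search)
    return bytes(b for b in data if b not in drop)


if __name__ == "__main__":
    # b"\040" is the octal escape for the space character.
    print(delete_chars(b"a b  c", b"\040"))
```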
-
- Setting the -c option makes TDS c)omplement SearchChars so that
- it becomes the set of all characters NOT listed in SearchChars.
- For instance, if you want to specify SearchChars as the set
- of all characters EXCEPT the letters AB, then you can specify
- SearchChars as AB and activate the -c)omplement switch:
-
- tds -d -c AB <inputfile >outputfile
-
- In this example, all of the characters in the inputfile EXCEPT
- the letters A and B get blocked from appearing in the outputfile.
- The -c switch is simply a labor-saving feature that saves you the
- trouble of directly specifying a lot of characters. In other
- words, the -c switch lets you indirectly specify a lot of
- characters in a concise manner; it can be interpreted as the word
- "NOT" that affects only the first string, SearchChars.
-
- TR interprets the -c in "-d -c" as SearchChars while TDS
- interprets the -c as an option switch. It is recommended that
- you use a backslash (or two, for Unix shells) before the minus
- sign if you want TDS to treat the minus as an actual minus
- character instead of as an option switch indicator.
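-
- The effect of combining -c with -d, as in "tds -d -c AB", can be
- sketched as follows. The names complement and delete_chars are
- hypothetical, and the sketch assumes the complement is taken over
- all 256 byte values.
-
```python
def complement(search: bytes) -> bytes:
    """Return every byte value 0..255 that is NOT in `search` (assumed semantics)."""
    present = set(search)
    return bytes(b for b in range(256) if b not in present)


def delete_chars(data: bytes, search: bytes) -> bytes:
    """Drop every byte of `data` that appears in `search`."""
    drop = set(search)
    return bytes(b for b in data if b not in drop)


if __name__ == "__main__":
    # Deleting the complement of "AB" keeps only the letters A and B.
    print(delete_chars(b"CABBAGE", complement(b"AB")))
```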
-
- Setting the -s option makes TDS s)queeze consecutive duplicate
- characters into one character. With this option set, TDS first
- performs transliteration. It then scans the transliterated
- stream for duplicate, consecutive characters that match any of
- the characters in ReplacementChars. If any such duplicate,
- consecutive characters are found, they appear as one "squeezed"
- character in the output stream. Ex.:
-
- tds -s \t \d32 <textfile
-
- or
-
- tds -s \\t \\d32 <textfile (for some shells)
-
- In this example, TABs are first converted to spaces. Immediately
- afterwards, duplicate and contiguous spaces are s)queezed into
- one space in the processing stream. The -s switch only affects
- the characters in ReplacementChars that are found in the
- processing stream. If you invoke TDS like the following:
-
- tds -s \d32 <textfile
-
- or
-
- tds -s \\d32 <textfile
-
- it is equivalent to:
-
- tds -s \d32 \d32 <textfile
-
- or
-
- tds -s \\d32 \\d32 <textfile
-
- So, you see, the -s option still only affects the characters in
- ReplacementChars that are found in the processing stream.
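-
- The order of operations for -s (transliterate first, then squeeze
- only the characters listed in ReplacementChars) can be sketched
- as below. The names squeeze and translit_then_squeeze are
- hypothetical; this is an illustration of the documented behavior,
- not the TDS implementation.
-
```python
def squeeze(data: bytes, squeeze_set: bytes) -> bytes:
    """Collapse runs of an identical byte, but only for bytes in `squeeze_set`."""
    targets = set(squeeze_set)
    out = bytearray()
    for b in data:
        if out and b == out[-1] and b in targets:
            continue                      # same squeezable byte as before: skip it
        out.append(b)
    return bytes(out)


def translit_then_squeeze(data: bytes, search: bytes, replacement: bytes) -> bytes:
    """Transliterate by position, then squeeze the replacement characters."""
    table = bytearray(range(256))
    for s, r in zip(search, replacement):
        table[s] = r
    return squeeze(data.translate(bytes(table)), replacement)


if __name__ == "__main__":
    # TABs become spaces, then each run of spaces collapses to one space.
    print(translit_then_squeeze(b"a\t\t b  c", b"\t", b" "))
    # Only spaces are squeezed here; the doubled letters are left alone.
    print(squeeze(b"aa  bb", b" "))
```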
-
- For the convenience of people who do not have reference manuals
- listing character codes in OCTAL (base-8) format, TDS can also
- accept DECIMAL (base-10) and HEXADECIMAL (base-16) escape codes.
-
- The format of a DECIMAL escape code is \dNNN, where the \d prefix
- (in upper or lower case) indicates that a DECIMAL escape code
- follows and NNN is a DECIMAL integer from 0 to 255. Ex:
-
- tds -s \d32 <file
-
- or
-
- tds -s \\d32 <file (The extra \ is needed for some shells.)
-
- The above command causes TDS to read the contents of the file and
- s)queeze all duplicate and consecutive SPACEs into one SPACE.
- \d32 is the DECIMAL number code for the SPACE character in the
- American Standard Code for Information Interchange (ASCII).
-
- The format of a HEXADECIMAL escape code is \hNN or \xNN, where
- the \h or \x prefix (in upper or lower case) informs TDS that a
- HEXADECIMAL escape code follows and NN is a HEXADECIMAL number
- from 00 to FF. Ex:
-
- tds -s \x20 <file
-
- or
-
- tds -s \\x20 <file (The extra \ is needed for some shells.)
-
- The above command also causes TDS to read the contents of the
- file and s)queeze all duplicate SPACEs from the output
- stream. \x20 is the HEXADECIMAL escape code for the ASCII
- SPACE character.
-
- Whether you use OCTAL or DECIMAL or HEXADECIMAL escape codes
- depends on your notational preference or the availability of
- an ASCII table.
-
- The length of an escape code can affect its meaning:
-
- \d032 does NOT mean the same thing as \d0032, which TDS
- reads as \d003 followed by the digit 2 (the same two
- characters as 2\d003).
-
- In order to eliminate possible notational problems when using TDS
- escape codes, it is recommended that you always remove any zeros
- at the beginning of an escape code except when you want to
- specify the NULL character, e.g. \0 (recommended), \d0, \h0 or
- \x0.
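-
- The numeric escape formats above can be illustrated with a small
- decoder. This is a sketch under stated assumptions, not the TDS
- parser: the function name decode_numeric_escapes is hypothetical,
- and the digit limits (at most 3 decimal, 2 hexadecimal and 3
- octal digits) are inferred from the \dNNN and \hNN formats and
- the \d0032 example. It ignores mnemonic escapes and ranges.
-
```python
import re

# Assumed digit limits: 3 decimal, 2 hexadecimal, 3 octal digits per escape.
_ESCAPE = re.compile(
    rb"\\[dD]([0-9]{1,3})"             # \dNNN  decimal
    rb"|\\[hHxX]([0-9A-Fa-f]{1,2})"    # \hNN or \xNN  hexadecimal
    rb"|\\([0-7]{1,3})"                # \NNN   octal
)


def decode_numeric_escapes(text: bytes) -> bytes:
    """Replace each numeric escape in `text` with the byte it names."""
    def repl(match):
        dec, hexa, octal = match.groups()
        if dec is not None:
            value = int(dec, 10)
        elif hexa is not None:
            value = int(hexa, 16)
        else:
            value = int(octal, 8)
        if value > 255:
            raise ValueError("escape code out of range: %r" % match.group(0))
        return bytes([value])
    return _ESCAPE.sub(repl, text)


if __name__ == "__main__":
    # Three spellings of the space character.
    print(decode_numeric_escapes(rb"\d32"), decode_numeric_escapes(rb"\x20"),
          decode_numeric_escapes(rb"\040"))
    # With a 3-digit limit, \d0032 is byte 3 followed by the digit 2.
    print(decode_numeric_escapes(rb"\d0032"))
```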
-
- In addition to accepting numeric escape codes in different bases,
- TDS also accepts mnemonic escape codes in upper or lower case for
- your convenience, unlike TR:
-
- MNEMONIC                    DECIMAL
- ESCAPE                      ASCII
- CODE      MEANING           CODE
-
- \a        Audible bell        7
- \b        Backspace           8
- \t        Tab                 9
- \n        Newline            10
- \v        Vertical tab       11
- \f        Formfeed           12
- \r        carriage Return    13
- \s        Space              32
- \\        backslash          92
-
- Ex.:
-
- tds \r \n <inputfile >outputfile
-
- or
-
- tds \\r \\n <inputfile >outputfile (for some shells)
-
- In the above example, TDS converts any Carriage Return in the
- inputfile into a n)ewline (Line Feed) and then puts the result in
- outputfile. This may be handy for converting some text files
- into Unix (tm) format.
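-
- The mnemonic table and the CR-to-LF example can be sketched as a
- case-insensitive lookup. The codes are copied from the table
- above; the names MNEMONICS, decode_mnemonic and cr_to_lf are
- hypothetical and only illustrate the idea.
-
```python
# Decimal ASCII codes from the mnemonic table, keyed by the letter after "\".
MNEMONICS = {
    "a": 7, "b": 8, "t": 9, "n": 10, "v": 11,
    "f": 12, "r": 13, "s": 32, "\\": 92,
}


def decode_mnemonic(escape: str) -> int:
    """Return the ASCII code for a two-character mnemonic escape (any case)."""
    if len(escape) != 2 or escape[0] != "\\":
        raise ValueError("not a mnemonic escape: %r" % escape)
    try:
        return MNEMONICS[escape[1].lower()]
    except KeyError:
        raise ValueError("unknown mnemonic escape: %r" % escape) from None


def cr_to_lf(data: bytes) -> bytes:
    """Convert every Carriage Return into a Line Feed."""
    cr = bytes([decode_mnemonic(r"\r")])
    lf = bytes([decode_mnemonic(r"\n")])
    return data.replace(cr, lf)


if __name__ == "__main__":
    print(decode_mnemonic(r"\t"), decode_mnemonic(r"\T"))
    print(cr_to_lf(b"one\rtwo\r"))
```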
-
- Two other notational conveniences absent from TR are TDS's
- ability to understand descending as well as ascending character
- ranges, e.g. [a-z][Z-A], and the ability to specify multiple
- character ranges within the same pair of brackets, e.g. [a-zZ-A]
- (recommended).
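-
- Ascending and descending ranges inside one pair of brackets can
- be expanded as in the sketch below. The function name
- expand_ranges is hypothetical, and the sketch assumes that a
- descending range such as Z-A yields its characters in descending
- order; escape codes and the "*" multiplier are not handled.
-
```python
def expand_ranges(spec: str) -> str:
    """Expand the text between one pair of brackets, e.g. 'a-zZ-A'."""
    out = []
    i = 0
    while i < len(spec):
        if i + 2 < len(spec) and spec[i + 1] == "-":
            first, last = ord(spec[i]), ord(spec[i + 2])
            step = 1 if last >= first else -1   # walk up or down as needed
            out.extend(chr(c) for c in range(first, last + step, step))
            i += 3
        else:
            out.append(spec[i])                 # a plain character, not a range
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(expand_ranges("a-e"))      # ascending range
    print(expand_ranges("E-A"))      # descending range
    print(expand_ranges("a-cC-A"))   # two ranges in one bracket pair
```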
-
- TDS recognizes TR's "*" character multiplier operator in
- ReplacementChars. An "*" (asterisk) after a character within
- brackets indicates that the preceding character is to be
- repeated. For example:
-
- tds acegik [z*] <inputfile
-
- tds acegik '[z*]' <inputfile
-
- or
-
- tds acegik [z*6] <inputfile
-
- tds acegik '[z*6]' <inputfile
-
- The above makes all a, c, e, g, i and k characters read from the
- inputfile appear as z characters.
-
- When no decimal number is specified after a character multiplier,
- 256 minus the current length of ReplacementChars is assumed as
- the number of repetitions. When ReplacementChars is empty
- (zero length), ReplacementChars is made identical to SearchChars.
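-
- The default repetition count can be worked through in a short
- sketch. The function name apply_multiplier is hypothetical; the
- arithmetic (256 minus the current length of ReplacementChars)
- follows the rule stated above.
-
```python
from typing import Optional


def apply_multiplier(replacement: bytes, char: bytes,
                     count: Optional[int] = None) -> bytes:
    """Append `char` to `replacement`, repeated `count` times.

    With no count, pad with 256 - len(replacement) copies.
    """
    if count is None:
        count = 256 - len(replacement)
    return replacement + char * count


if __name__ == "__main__":
    search = b"acegik"
    replacement = apply_multiplier(b"", b"z")   # like [z*]: 256 copies of 'z'
    table = bytearray(range(256))
    for s, r in zip(search, replacement):       # pair the strings by position
        table[s] = r
    print(b"bigcake".translate(bytes(table)))
```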
-
-
- PROGRAM INVOCATION EXAMPLES
-
- Output a list of words taken from a file:
-
- tds -sc [a-zZ-A] [\n*] <inputfile (recommended)
-
- tds -sc '[a-zZ-A]' '[\n*]' <inputfile (recommended)
-
- versus
-
- tr -sc [a-z][A-Z] [\012*] <inputfile
- ~~~
- tr -sc "[a-z][A-Z]" "[\012*]" <inputfile
- ~~~
- A word in the case above is a set of contiguous alphabetic
- characters.
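-
- What the word-list command does can be sketched in Python as a
- complement, a transliteration to newline, and a squeeze in one
- pass. The name word_list is hypothetical; the sketch mirrors the
- documented effect of -sc and is not taken from the TDS source.
-
```python
import string

LETTERS = set(string.ascii_letters.encode("ascii"))
NEWLINE = 0x0A


def word_list(data: bytes) -> bytes:
    """Turn every run of non-letters into a single newline."""
    out = bytearray()
    for b in data:
        if b not in LETTERS:
            b = NEWLINE                    # -c: every non-letter becomes a newline
        if b == NEWLINE and out and out[-1] == NEWLINE:
            continue                       # -s: squeeze repeated newlines
        out.append(b)
    return bytes(out)


if __name__ == "__main__":
    print(word_list(b"Hello, world!  foo").decode("ascii"))
```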
-
- Capitalize all letters in inputfile and send the result to
- outputfile:
-
- tds [a-z] [A-Z] <inputfile >outputfile
-
- tds '[a-z]' '[A-Z]' <inputfile >outputfile (quotes for some shells)
-
- Uncapitalize all letters in inputfile and send the result to
- outputfile:
-
- tds [A-Z] [a-z] <inputfile >outputfile
-
- tds '[A-Z]' '[a-z]' <inputfile >outputfile
-
- Squeeze multiple, contiguous spaces in inputfile into just one
- space and output the result to outputfile:
-
- tds -s \d32 <inputfile >outputfile
-
- tds -s \\d32 <inputfile >outputfile (extra \ for some shells)
-
- tds -s \s <inputfile >outputfile (recommended)
-
- tds -s \\s <inputfile >outputfile (recommended)
-
- The above may be used before or after obfuscating C code to make
- the code perhaps even less readable, or before beautifying code
- to make it more readable.
-
- Read a text file with minimized spacing:
-
- tds -s \t \d32 <inputfile | more
-
- tds -s \\t \\d32 <inputfile | more (extra \ for some shells)
-
- tds -s \t \s <inputfile | more (recommended)
-
- tds -s \\t \\s <inputfile | more (recommended)
-
- Act as a Rot13 encryption filter:
-
- tds [a-zA-Z] [n-za-mN-ZA-M] <inputfile >outputfile
-
- tds '[a-zA-Z]' '[n-za-mN-ZA-M]' <inputfile >outputfile
-
- Act as a Rot13 decryption filter:
-
- tds [n-za-mN-ZA-M] [a-zA-Z] <inputfile >outputfile
-
- tds '[n-za-mN-ZA-M]' '[a-zA-Z]' <inputfile >outputfile
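-
- Because Rot13 shifts each letter by half the alphabet, applying
- it twice restores the original text, so either of the two
- commands above works for both directions. A short Python sketch
- of the same mapping (the name rot13 is hypothetical):
-
```python
import string

LOWER = string.ascii_lowercase
UPPER = string.ascii_uppercase

# Same mapping as [a-zA-Z] -> [n-za-mN-ZA-M].
ROT13 = str.maketrans(LOWER + UPPER,
                      LOWER[13:] + LOWER[:13] + UPPER[13:] + UPPER[:13])


def rot13(text: str) -> str:
    """Rotate each ASCII letter by 13 places; leave everything else alone."""
    return text.translate(ROT13)


if __name__ == "__main__":
    secret = rot13("Hello, World!")
    print(secret)
    print(rot13(secret))   # applying the filter again restores the text
```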
-
- Convert a Carriage Return(CR)-terminated inputfile into a
- Line Feed(LF, NewLine)-terminated file:
-
- tds \r \n <inputfile >outputfile
-
- tds \\r \\n <inputfile >outputfile
-
- The above is handy for converting some NON-Unix text files into
- Unix text files. The following is handy for doing the reverse:
-
- tds \n \r <inputfile >outputfile
-
- tds \\n \\r <inputfile >outputfile
-
- TRADEMARKS
-
- Unix is a trademark of AT&T.
-
- AUTHOR & PROGRAM LICENSE
-
- The TDS program and this documentation are Copyright 1990 by
- Edward Lee. You may not charge more than the cost of storage
- media for providing a copy of TDS to another. You may not make a
- profit from providing others with a copy or copies of TDS. You
- may, however, use the TDS executable file in personal or
- commercial environments to develop other products with the
- understanding that you, the user of TDS, accept any damage
- incurred from the use or misuse of TDS. This documentation in
- its entirety must accompany distributed copies of the TDS source
- file.
-
- You can consider this program and documentation as part of my
- resume.
-
- edlee@chinet.chi.il.us
-
- -Ed L
-