- LIT TEXT UTILITY MANUAL
-
- Version 2.0, 11/19/86
- Unpublished (C) 1986 Donald J. Irving
-
- Lit is a command line invoked text utility that filters a text file to
- stdout, printing printable characters as they are and showing all
- non-printable characters in one or more of three representation formats.
- The only character lit interprets (acts upon) is the line feed, which
- causes lit to issue a line feed. The inspiration for lit came from the "l"
- command in many of the UNIX line editors. Lit is not quite the same as any
- of these, however. For one thing, lit output is never ambiguous.
-
- Here is an example of what lit does:
-
- Say the file 'myfile' consists of the following ascii characters:
-
- HT, HT, h, e, l, l, o, space, w, o, r, l, d, BEL, LF
-
- Saying 'lit myfile' would produce the following output:
-
- \t\thello world\007\n
-
- And saying 'lit myfile [various options]' might produce any of:
-
- \t\thello world^G\n
- ^I^Ihello world^G^J
- \011\011hello world\007\012
- \09\09hello world\07\0A
- \009\009hello world\007\010
-
- You control the output with optional command line arguments which provide:
-
- 1. The name of the file to read as input.
- 2. What subset of the file lines to print.
- 3. In which format(s) to represent non-printable characters.
- 4. Which number base to use for numeric representations.
-
- If you do not supply these, they default to:
-
- 1. Stdin.
- 2. The whole file.
- 3. Backslash constructs if possible else numeric representations.
- 4. Octal.
-
- Here is the command line template. The arguments may be specified in any
- order. The -bcanohd options may be stacked after one minus sign, or they
- may appear as separate arguments.
-
- lit [<filename>] [-s<linenum>] [-p<numlines>] [-[bcan][ohd]]
-
- THE NAME OF THE INPUT FILE
-
- The first command line argument encountered which does not start with a
- minus sign is taken as the input file name. Any later command line
- argument which does not start with a minus sign is an error. If every
- command line argument starts with a minus sign (or there are none), lit
- reads from stdin.
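-
- The file name rule above can be sketched in C. This is a minimal
- illustration, not lit's actual source; the function name find_input_file
- and its return convention (0 for success, -1 for a second file name) are
- assumptions made for the example.
-
```c
#include <stddef.h>

/* Hypothetical helper: sets *filename to the first argument that does not
   start with '-', or to NULL if there is none (meaning: read stdin).
   Returns -1 if a second such argument appears, which is an error. */
int find_input_file(int argc, char *argv[], const char **filename) {
    *filename = NULL;
    for (int i = 1; i < argc; i++) {
        if (argv[i][0] == '-')
            continue;          /* an option; handled elsewhere */
        if (*filename != NULL)
            return -1;         /* second file name: error */
        *filename = argv[i];
    }
    return 0;
}
```
-
- Options are skipped regardless of where they appear, so 'lit -c myfile'
- and 'lit myfile -c' pick the same file name.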
-
- PRINTING A SUBSET OF LINES OF THE FILE
-
- Lit prints the whole file by default. You can tell it on which line in the
- file to start printing and/or how many lines to print by supplying either
- or both of these command line arguments:
-
- -s<linenum> lit will start printing at line <linenum>
- -p<numlines> lit will print <numlines> lines
-
- There is no space between the 's' or 'p' and the number. There is no
- validity checking on the number values.
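-
- A sketch of the -s and -p handling, under two assumptions the manual does
- not state: lines are numbered from 1, and a missing -p means "print to the
- end of the file". The struct line_range and both function names are
- hypothetical. atol() is used so that, as described, the numbers are not
- validity checked.
-
```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Assumed defaults: start_line = 1, max_lines = -1 (no -p given). */
struct line_range {
    long start_line;
    long max_lines;
};

/* Apply arg if it is -s<linenum> or -p<numlines>; returns false otherwise.
   No validity checking: atol() simply yields 0 for garbage. */
bool apply_range_arg(struct line_range *r, const char *arg) {
    if (strncmp(arg, "-s", 2) == 0) {
        r->start_line = atol(arg + 2);
        return true;
    }
    if (strncmp(arg, "-p", 2) == 0) {
        r->max_lines = atol(arg + 2);
        return true;
    }
    return false;
}

/* True if the (assumed 1-based) line number falls in the selected range. */
bool line_selected(const struct line_range *r, long line_no) {
    if (line_no < r->start_line)
        return false;
    if (r->max_lines < 0)
        return true;           /* no -p: print to end of file */
    return line_no - r->start_line < r->max_lines;
}
```
-
- With -s10 -p3, for example, lines 10, 11, and 12 are selected.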
-
- FORMATS FOR REPRESENTING NON-PRINTABLE CHARACTERS
-
- There are three formats in which non-printable characters may be
- represented: C Language style backslash representations such as \n,
- control character representations such as ^J, and numeric value
- representations such as \012.
-
- C Language Backslash Representations
-
- The form is a backslash followed by a lower case letter. Here is the list
- of the applicable characters:
-
- line feed \n
- horizontal tab \t
- backspace \b
- carriage return \r
- form feed \f
-
- The C representation \0 for the ascii NUL character is deliberately
- omitted (see the post scripts). NUL is represented by its control
- character representation or as a numeric value.
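-
- The backslash list above amounts to a small lookup. In this sketch the
- hypothetical function backslash_letter returns the letter to print after
- the backslash, or 0 when the character has no backslash form (NUL
- included).
-
```c
/* Letter following '\' in the C-style representation of c, or 0 if none.
   Only the five characters listed above qualify; NUL is excluded. */
char backslash_letter(unsigned char c) {
    switch (c) {
    case '\n': return 'n';   /* line feed */
    case '\t': return 't';   /* horizontal tab */
    case '\b': return 'b';   /* backspace */
    case '\r': return 'r';   /* carriage return */
    case '\f': return 'f';   /* form feed */
    default:   return 0;     /* no backslash form */
    }
}
```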
-
- Control Character Representations
-
- The form is a caret followed by a second symbol: the key which, typed
- together with the Control key, produces the character being represented.
- For example, the ascii line feed character (Control-J) is represented as
- ^J. The ascii character DEL has an arbitrarily assigned representation
- of ^?.
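-
- For ascii codes 0 through 31 the symbol after the caret is simply the
- character 64 places higher in the ascii table, which is why line feed
- (10) becomes ^J (74 is 'J'). DEL (127) gets '?', which is 64 places
- lower. A sketch, with control_symbol as an assumed name:
-
```c
/* Symbol following '^' in the control representation of c, or 0 if c
   has no control representation. */
char control_symbol(unsigned char c) {
    if (c < 0x20)
        return (char)(c + 64);   /* 0 -> '@', 1 -> 'A', 10 -> 'J' */
    if (c == 0x7F)
        return '?';              /* DEL: the arbitrary choice */
    return 0;
}
```
-
- Note that ascii RS (30) and FS (28) come out as '^' and '\', the two
- exceptional characters discussed later.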
-
- ASCII Numeric Value Representations
-
- The representation is in the form \num, where num is the character's
- numeric value (the unsigned integer value of its eight bits) displayed in
- one of the three number bases octal, decimal, or hexadecimal. For octal
- representations, num is exactly three octal digits; for decimal
- representations, exactly three decimal digits; and for hex
- representations, exactly two hexadecimal digits. Num is zero-padded on
- the left if necessary to make up the required number of digits. For
- example, the ESC char is represented as \033, \027, or \1B in octal,
- decimal, and hex respectively. NUL would be \000, \000, or \00. This
- format is not limited to ascii characters; any eight bits can be
- represented. Values of \200 (octal), \128 (decimal), \80 (hex), or greater
- are byte values beyond the upper end of the ascii character set. The
- largest byte value (all bits on) is \377 (octal), \255 (decimal), or \FF
- (hex).
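-
- The three numeric forms map directly onto printf-style conversions with
- zero padding. A minimal sketch, assuming upper-case hex digits (as in the
- \1B and \FF examples) and a hypothetical function numeric_rep that takes
- the base as 'o', 'd', or 'h':
-
```c
#include <stdio.h>

/* Write the \num form of byte c into buf (room for 5 chars is enough):
   'o' = three octal digits, 'd' = three decimal digits,
   'h' = two upper-case hex digits. Returns snprintf's result. */
int numeric_rep(char *buf, size_t size, unsigned char c, char base) {
    switch (base) {
    case 'h':
        return snprintf(buf, size, "\\%02X", (unsigned)c);
    case 'd':
        return snprintf(buf, size, "\\%03u", (unsigned)c);
    default: /* 'o', lit's default base */
        return snprintf(buf, size, "\\%03o", (unsigned)c);
    }
}
```
-
- For ESC (27) this produces \033, \027, and \1B, matching the examples
- above.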
-
- COMMAND LINE ARGUMENTS FOR SELECTING REPRESENTATION FORMATS
-
- You tell lit which representation format or combination of formats to use
- for non-printable characters by supplying one of the command line arguments
- -b, -c, -a, or -n. If you supply none of these, then -b is selected by
- default. If you supply more than one, the last one given takes effect.
-
- -b use backslash representations such as \n
- if possible, else use numeric representations.
-
- -c use control char representations such as ^J
- if possible, else use numeric representations.
-
- -a all; use backslash reps if possible, else use control
- char reps if possible, else use numeric representations.
-
- -n use numeric representations only.
-
-
- You tell lit which number base to use for numeric representations by
- providing one of the command line arguments -o, -h, or -d. If you supply
- none of these, then -o is selected by default. If you supply more than
- one, the last one given takes effect.
-
- -o octal
- -h hexadecimal
- -d decimal
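-
- Because each later letter simply overwrites the earlier setting of its
- group, "last one wins" falls out of a single left-to-right scan. A sketch
- of parsing one stacked argument such as -nh; the enum and struct names are
- hypothetical:
-
```c
#include <stdbool.h>

enum rep_mode { MODE_BACKSLASH = 'b', MODE_CONTROL = 'c',
                MODE_ALL = 'a', MODE_NUMERIC = 'n' };
enum num_base { BASE_OCTAL = 'o', BASE_HEX = 'h', BASE_DECIMAL = 'd' };

struct rep_opts {
    enum rep_mode mode;
    enum num_base base;
};

/* Scan an argument like "-b", "-nh" or "-cd". Each letter overwrites the
   previous setting of its group, so the last one wins. Returns false at
   the first letter outside bcanohd. */
bool apply_format_flags(struct rep_opts *o, const char *arg) {
    if (arg[0] != '-')
        return false;
    for (const char *p = arg + 1; *p != '\0'; p++) {
        switch (*p) {
        case 'b': case 'c': case 'a': case 'n':
            o->mode = (enum rep_mode)*p;
            break;
        case 'o': case 'h': case 'd':
            o->base = (enum num_base)*p;
            break;
        default:
            return false;
        }
    }
    return true;
}
```
-
- Starting from {MODE_BACKSLASH, BASE_OCTAL} gives the documented defaults.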
-
- EXCEPTIONAL CHARACTERS
-
- Two characters have special meaning in lit output. The backslash character
- \ always has special meaning. The caret character ^ has special meaning
- whenever control character representations are enabled.
-
- The Backslash Character \
-
- As already described, the \ character in lit output signals the beginning
- of either a special letter representation such as \n or a numeric
- representation such as \012. The \ is also used to relieve a subsequent \
- or ^ of its special meaning. \\ represents the actual character \, and
- (when control character representations are enabled) \^ represents the
- actual character ^.
-
- The Caret Character ^
-
- When control character representations are enabled, a ^ signals the
- beginning of a control character representation such as ^J. It follows
- that ^^ means Control caret (ascii RS) and ^\ means Control backslash
- (ascii FS). In both of these cases the second character is relieved of its
- special meaning because it is part of the control character
- representation. If control character representations are not enabled,
- then ^ is just another printable character.
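-
- Putting the format options and the exceptional characters together, the
- per-character decision can be sketched as below. This illustrates the
- rules in this manual rather than reproducing lit's source; render_byte is
- a hypothetical name, mode and base are the option letters, and
- "printable" is assumed to mean ascii 0x20 through 0x7E. The caller is
- expected to write a real line feed after the representation of a line
- feed character.
-
```c
#include <stdio.h>
#include <string.h>

/* Render byte c into out (8 bytes is plenty). mode: 'b', 'c', 'a', 'n';
   base: 'o', 'd', 'h'. */
void render_byte(char *out, size_t size, unsigned char c, char mode, char base) {
    static const char bs_chars[]   = "\n\t\b\r\f"; /* have \x forms */
    static const char bs_letters[] = "ntbrf";      /* matching letters */
    int bs_on  = (mode == 'b' || mode == 'a');
    int ctl_on = (mode == 'c' || mode == 'a');

    /* \ is always special; ^ only when control reps are enabled. */
    if (c == '\\' || (c == '^' && ctl_on)) {
        snprintf(out, size, "\\%c", c);
        return;
    }
    /* Ordinary printable characters pass through unchanged. */
    if (c >= 0x20 && c < 0x7F) {
        snprintf(out, size, "%c", c);
        return;
    }
    /* Backslash letter, if enabled and defined (never for NUL). */
    if (bs_on && c != 0) {
        const char *p = strchr(bs_chars, c);
        if (p != NULL) {
            snprintf(out, size, "\\%c", bs_letters[p - bs_chars]);
            return;
        }
    }
    /* Control representation for codes 0..31 and DEL. */
    if (ctl_on && (c < 0x20 || c == 0x7F)) {
        snprintf(out, size, "^%c", c == 0x7F ? '?' : c + 64);
        return;
    }
    /* Otherwise fall back to the numeric representation. */
    if (base == 'h')
        snprintf(out, size, "\\%02X", (unsigned)c);
    else if (base == 'd')
        snprintf(out, size, "\\%03u", (unsigned)c);
    else
        snprintf(out, size, "\\%03o", (unsigned)c);
}
```
-
- With mode 'a', a BEL (7) comes out as ^G and a tab as \t, matching the
- first alternative output in the 'myfile' example.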
-
- CONCLUSION
-
- Lit fills the gap between text editors, which usually interpret special
- characters in special ways, and hex dump utilities, which make terrible
- reading for text files. One of lit's greatest strengths is that it
- interprets nothing but the line feed character; everything else is simply
- represented in the output stream.
-
- Although lit provides a variety of output formats, perhaps its main
- usefulness is in quickly locating U.F.O.s (Unidentified File Objects) that
- have gotten into your text files (like that ESC char that's weirding out
- your printer). For this purpose the default options are adequate and, for
- C programmers at least, already familiar.
-
-
-
- Donald J. Irving
- 9812 Gardenwood Way
- Sacramento, CA 95827
- (916) 366-3225
-
- CIS: 73547,1335
- PL: ops158
-
- Postscripts:
-
- **
-
- One convenient way of getting to know lit is to use the default input file
- stdin. Just say 'lit [-options]' with no file name. Now you can type in
- lines one at a time and have lit filter them back to you. Try typing
- control characters to see how they come back. Keep in mind that in this
- configuration, the CLI is still trapping and interpreting (acting upon)
- what you type, so screen control characters such as form feed and tab
- actually cause form feeds and tabs to occur on the screen before lit has
- a chance to send you its output. This may make the screen look a little
- messy, but at least, because the CLI is interpreting everything, it can
- tell when you type Control C to break out.
-
- **
-
- Want to have lit give you a Usage statement? Say 'lit lskdmlsdm' where
- lskdmlsdm is any string of garbage which doesn't add up to the name of a
- real file.
-
- **
-
- Why not use \0 to represent NUL? Consider the following character sequence:
-
- BEL, space, NUL, 0, 7
-
- Using \0 for NUL would yield the output '\007 \007', which is exactly what
- the sequence BEL, space, BEL produces. To avoid this ambiguity, the \0
- construct is not included in the backslash representations.
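-
- The ambiguity is easy to demonstrate. The sketch below renders bytes in
- the default octal style, with a hypothetical switch that applies the
- rejected \0 rule for NUL. It handles only printable characters, the
- backslash, and three-digit octal values, which is all this example needs;
- render_seq is an assumed name.
-
```c
#include <stdio.h>
#include <string.h>

/* Render n bytes of s into out. If nul_as_backslash0 is nonzero, NUL is
   shown as \0 (the rule lit rejects) instead of \000. */
void render_seq(char *out, size_t size, const unsigned char *s, size_t n,
                int nul_as_backslash0) {
    size_t used = 0;
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        char piece[8];
        if (s[i] == 0 && nul_as_backslash0)
            strcpy(piece, "\\0");                  /* rejected rule */
        else if (s[i] == '\\')
            strcpy(piece, "\\\\");                 /* escaped backslash */
        else if (s[i] >= 0x20 && s[i] < 0x7F)
            snprintf(piece, sizeof piece, "%c", s[i]);
        else
            snprintf(piece, sizeof piece, "\\%03o", (unsigned)s[i]);
        size_t len = strlen(piece);
        if (used + len >= size)
            break;                                 /* out of room */
        memcpy(out + used, piece, len + 1);        /* includes the NUL */
        used += len;
    }
}
```
-
- With the switch on, BEL, space, NUL, 0, 7 and BEL, space, BEL both render
- as \007 \007; with it off, the first sequence renders as \007 \00007.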
-
- **
-
- Why use ^? for DEL? Keyboard control characters are always 64 places higher
- in the ascii table than the non-printable characters they represent. DEL
- is at the high end of the ascii character set, however, so there's no
- keyboard character to represent it. We need to arbitrarily choose some
- character. The ? seems to make at least some sense as a choice; it is 64
- places less than DEL, and that kind of satisfies one's desire for symmetry
- in the world. (Besides, some of the UNIX world tools already do it that
- way.)
-
- **
-
- Lit runs pretty slowly (on my Amiga). I designed it in a highly structured
- manner (many functions in a vertical hierarchy) as I would for a large
- application on a big machine. In that arena one tends to emphasize
- structure and readability over execution efficiency, but making several
- function calls for every input character is, I guess, not the way to do
- things on a micro. Any subsequent versions of lit should be optimized
- with less vertical structure. I'll let this version stand as an example of
- clean, neat, readable, and well-structured code.
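-
- One possible way to cut the per-character cost, offered here as an
- assumption rather than anything lit does, is to compute every byte's
- representation once in a 256-entry table, so the main loop does a single
- array lookup per character. The sketch covers only the default -b octal
- mode; rep_table, build_rep_table, and filter_stream are hypothetical
- names.
-
```c
#include <stdio.h>
#include <string.h>

/* One precomputed representation per byte value ("\377" needs 5 bytes). */
static char rep_table[256][5];

/* Fill the table once, using the default -b rules with octal numbers. */
void build_rep_table(void) {
    for (int c = 0; c < 256; c++) {
        char *r = rep_table[c];
        switch (c) {
        case '\n': strcpy(r, "\\n"); break;
        case '\t': strcpy(r, "\\t"); break;
        case '\b': strcpy(r, "\\b"); break;
        case '\r': strcpy(r, "\\r"); break;
        case '\f': strcpy(r, "\\f"); break;
        case '\\': strcpy(r, "\\\\"); break;
        default:
            if (c >= 0x20 && c < 0x7F)
                snprintf(r, sizeof rep_table[c], "%c", c);
            else
                snprintf(r, sizeof rep_table[c], "\\%03o", (unsigned)c);
        }
    }
}

/* Copy in to out with one table lookup per byte; a line feed is shown as
   \n and then also acted upon. */
void filter_stream(FILE *in, FILE *out) {
    int c;
    while ((c = getc(in)) != EOF) {
        fputs(rep_table[c], out);
        if (c == '\n')
            putc('\n', out);
    }
}
```
-
- Call build_rep_table() once at startup, then filter_stream(stdin, stdout).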