home *** CD-ROM | disk | FTP | other *** search
- \ TR PROGRAM
- \ TRANSLATES SOURCE FILE INTO DESTINATION FILE.
- \ WORKS LIKE UNIX tr WITH FOLLOWING EXCEPTIONS:
- \ 1. -A OPTION NEEDED FOR ASCII FILES.
- \ 2. HANDLES NULL CHARACTERS!
-
- \ This program Copyright (C) 1985 by Thomas Almy.
- \ Permission is granted to registered users of ForthCMP to sell or distribute
- \ computer programs incorporating the compiled contents of this file.
-
-
- 0 #IF
-
- ( note -- program has been modified since writing this paper, and
- this paper benchmarks the original CP/M version of the compiler)
-
- TRANSLATE PROGRAM
- by Tom Almy
- August 1985
-
- PROGRAM DESCRIPTION
-
- This program was designed to mimic the functionality of the "tr"
- program provided on UNIX (tm Bell Labs) systems. While written to be
- compiled with the author's ForthCMP Forth Compiler, it can be
- utilized on any 83 Standard system by providing an appropriate file
- system interface.
-
- TR is used to make one or more single character substitutions in
- a file. ForthCMP's FILTER file interface allows specifying an input
- file and an optional output file (if no output file is specified,
- output goes to the display). The file name(s) are followed by an
- optional option specification and one or two character specifying
- strings.
-
- Characters in the strings may be any character except "\"
- (backslash) or "-" (hyphen). Any of the 256 possible character codes
- can be specified by backslash followed by one, two, or three octal
- digits. Backslash followed by a lower case character becomes an
- upper-case character (done to allow putting the string on a CP/M
- command line). Backslash followed by any other character is that
- character, so "-" can be represented as "\-" and "\" can be
- represented as "\\". A range of characters can be represented by the
- first character followed by a hyphen followed by the last character.
-
- If no options are specified, a translation occurs in which each
- character which is in the first specification string is replaced with
- the character in the same position in the second specification
- string. If the second string is shorter than the first then the
- string is extended by appending copies of its last character.
-
- The option specifcation can contain any or all of the following
- characters:
-
- A ASCII mode: On input CR is deleted (leaving just LF of CR-LF pairs)
- and on output LF is replaced with CR-LF pairs. This allows
- translating to or from CR-LF pairs. Additionally, Control-Z denotes
- end of file.
-
- C Complement first string: The first string is replaced with a new
- string consisting of the characters in the range 0 through 255
- not in the first string. This string is sorted.
-
- D Delete instead of translate: No translation takes place;
- instead any characters in the first specification string are deleted.
-
- S Squeeze output: Sequential occurrences in the character stream
- (after translation/deletion) of two or more identical
- characters in the second specification string are squeezed to a
- single occurrence.
-
- Example commands:
-
- Options String1 String2 Function
- a-z A-Z Upcase file
- a-zA-Z A-Za-z Swap case file
- AS \12 \12 Delete blank lines
- ACS !-~ \12 Put all words on separate lines
- AS \40 \12 Put all words on separate lines
- ACDS A-Za-z\12\40 Delete all non alphabetics, except
- spaces and newlines.
- \200-\377 \0-\177 Clear parity bits.
-
- PERFORMANCE
-
- I compared the performance of the Forth program, using the
- ForthCMP compiler, with that of C, using the MANX (AZTEC) compiler.
- The system used was a LOBO MAX-80, which has a 5-Mhz Z-80 processor,
- 1.2 MByte 8" floppy drives, and runs CP/M+.
-
-
- Characteristic Forth C
-
- Source file lines (not blank) 163 139
-
- Compilation time Compile Step 44 44 seconds
- Assemble Step none 32
- Link Step none 38
- TOTAL 44 114
-
- COM file size 3584 9984 bytes
-
- Test case execution time 21 138 seconds
-
- The test case involved upcasing a 14k byte file. The PIP
- program (which is written in assembly language) took 16 seconds.
-
- The ForthCMP compiler compiles and links in a single step. 5
- seconds was spent producing a load map (not done in the C example),
- so the ForthCMP compilation time could really be considered to be 39
- seconds.
-
-
-
- READING THE LISTING
-
- First, ignore the INCLUDE, ROMABLE, and IN/OUT commands, as they
- are directives for the compiler. The definition of CARRAY is "CREATE
- ALLOT DOES> +". The definition of C<- is "SWAP C!". The non-standard
- words ?DO " <= >= ON OFF ASCII CONTROL SKIP and SCAN and Eaker' case
- statement (CASE OF ENDOF ENDCASE) have their usual definitions.
-
- The file interface redefines KEY and EXPECT to read from the
- input file. KEY returns -1 on end of file; otherwise it returns the
- next character as an integer in the range 0 through 255. Because the
- new EXPECT does not echo and has no editing, OLD- EXPECT (which is
- system dependent) had to be provided. Output (EMIT and words which
- call it) is rewritten to send output to the output file when FILTER
- is executed, or to the display when CONSOLE is executed.
-
- SETFILES is used to initialize the input and output files, and
- returns TRUE if successful. The double variable OPTIONSTRING is set
- to contain a pointer to and length of the command tail (that part
- excluding the file specifications). ENDFILES does any necessary file
- closing.
-
-
-
- #THEN
-
-
-
- \ Modified for new filter August, 1986
- \ Modified for newest DOS interface 12/91
- 100 MSDOS
- HEX 4000 DECIMAL CONSTANT BUFSIZ
- INCLUDE VARS
- INCLUDE DOS1
-
-
- 256 CARRAY TRTABLE \ translation table
- 256 CARRAY SQTABLE \ squeeze duplicates table
- CREATE INLIST 512 ALLOT \ instring values
- CREATE OUTLIST 512 ALLOT \ outstring values
- VARIABLE DEL-FLAG \ deletion flag specified
- VARIABLE COM-FLAG \ reverse sense flag specified
- VARIABLE SQU-FLAG \ squeeze output string flag
- VARIABLE ASC-FLAG \ ascii-mode --> CR dropped on input, added
- \ before LF's on output , CONTROL-Z terminates file
- VARIABLE ^LIST
- VARIABLE LASTCHAR
-
- -1 CONSTANT TRUE
- 0 CONSTANT FALSE
- CONTROL M CONSTANT ACR \ Carriage Return
- CONTROL J CONSTANT ALF \ Line Feed
-
-
- \ OUTPUT FILE HANDLING ( basically filter.4th )
-
- VARIABLE outhandle stderr outhandle !
- VARIABLE outbuffer
- VARIABLE outbufptr
- VARIABLE outbufend
-
- 0 0 IN/OUT
- : flushout outbuffer @ outbufptr @ <> IF
- outhandle @ outbuffer @ outbufptr @ outbuffer @ - DUP >R write
- outbuffer @ outbufptr ! R> <> IF stderr outhandle !
- ." DISK FULL " flushout 4 RETURN THEN
- THEN ;
-
- : EMIT outbufptr @ DUP outbuffer @ BUFSIZ + = IF flushout
- DROP outbuffer @ THEN C! 1 outbufptr +! ;
-
- 0 0 IN/OUT
- : CONSOLE flushout stderr outhandle ! ;
-
- 0 0 IN/OUT
- : FILTER flushout stdout outhandle ! ;
-
- 0 0 IN/OUT : BYE flushout bye ;
-
- 0 0 IN/OUT : ABORT flushout 4 RETURN ;
-
- \ INPUT FILE PROCESSING
- VARIABLE inbuffer ( pointer to allocated buffer )
- VARIABLE inbufptr
- VARIABLE inbufend
-
- 0 0 IN/OUT
- : SETBUFS ( must execute before any I/O to allocate buffers )
- 129 TIB 127 CMOVE ( parse from command line )
- 128 C@ #TIB !
- HERE inbuffer !
- BUFSIZ ALLOT
- HERE DUP outbuffer ! outbufptr !
- BUFSIZ ALLOT
- ;
-
-
-
- \ This version of KEY returns -1 on end of file!
- : KEY inbufptr @ inbufend @ = IF ( fetch block )
- stdin inbuffer @ BUFSIZ read ?DUP 0= IF ( EOF/ERROR ) -1 EXIT THEN
- inbuffer @ + inbufend !
- inbuffer @ inbufptr ! THEN
- inbufptr @ C@ 1 inbufptr +! ;
-
-
- \ Commentary
- 0 0 IN/OUT
- : HELLO
- ." TRANSLATE PROGRAM" CR
- ." Copyright (C) 1985 by Thomas Almy." CR ;
-
- 0 0 IN/OUT
- : USAGE
- CONSOLE
- CR ." [-[A][C][D][S]] str1 [str2]"
- CR ." Options are Ascii Complement-str1 Delete Squeeze"
- CR ." strings may have \octal or range specifications."
- ABORT
- ;
-
- \ List Accessing
- 1 0 IN/OUT
- : ISLIST ( list -- ) ^LIST ! ;
-
- 1 0 IN/OUT
- : !LIST ( char -- ) ^LIST @ ! 2 ^LIST +! ;
-
- 0 1 IN/OUT
- : @LIST ( -- char ) ^LIST @ @ 2 ^LIST +! ;
-
- \ Miscellaneous Subroutines
- 1 1 IN/OUT
- : UPCASE ( char -- char )
- DUP ASCII a >= IF DUP ASCII z <= IF BL - THEN THEN ;
-
- PRIMITIVE
- : NEXT-CHAR ( addr len -- addr+1 len-1 char, or zero if end )
- DUP IF 1- SWAP COUNT ROT SWAP ELSE FALSE THEN ;
-
- PRIMITIVE
- : OCTAL? ( addr len -- addr len boolean )
- OVER C@ DUP ASCII 0 >= SWAP ASCII 7 <= AND ;
-
- : ?BACKSLASH ( addr len char -- addr' len' value )
- DUP ASCII \ = IF DROP
- OCTAL? IF NEXT-CHAR ASCII 0 - >R
- OCTAL? IF NEXT-CHAR ASCII 0 - R> 8 * + >R
- OCTAL? IF NEXT-CHAR ASCII 0 - R> 8 * + >R
- THEN THEN R>
- ELSE
- NEXT-CHAR
- THEN THEN ;
-
- : FILL-LIST ( string length list -- )
- ISLIST
- BEGIN NEXT-CHAR ?DUP WHILE
- DUP ASCII - = IF DROP NEXT-CHAR ?BACKSLASH 1+
- ^LIST @ 2- @ 1+ DO I !LIST LOOP ELSE
- ?BACKSLASH !LIST THEN REPEAT
- -1 !LIST ( delimit list )
- 2DROP ;
-
-
- \ Handle option string
- 0 0 IN/OUT
- : DO-OPTION-STRING
- HERE COUNT SWAP 1+ SWAP 1 ?DO
- COUNT UPCASE CASE
- ASCII A OF ASC-FLAG ON ENDOF
- ASCII D OF DEL-FLAG ON ENDOF
- ASCII C OF COM-FLAG ON ENDOF
- ASCII S OF SQU-FLAG ON ENDOF
- ." UNKNOWN OPTION -- " EMIT USAGE ENDCASE
- LOOP
- DROP
- BL WORD DROP ( scan next word )
- ;
-
- 0 0 IN/OUT
- : SET-OPTIONS
- ASC-FLAG OFF
- DEL-FLAG OFF
- COM-FLAG OFF
- SQU-FLAG OFF
- BL WORD COUNT 0> SWAP C@ ASCII - = AND IF ( an option string )
- DO-OPTION-STRING
- THEN
- ;
-
-
- \ Various Table handling routines
- 1 0 IN/OUT
- : SET-SQUTABLE ( hostlist -- ) ISLIST
- ['] SQTABLE >BODY 256 0 FILL
- BEGIN @LIST DUP 0< NOT WHILE
- SQTABLE TRUE C<- ( set flag in byte )
- REPEAT DROP ;
-
- 0 0 IN/OUT
- : COMPLEMENT-LIST ( complements INLIST )
- INLIST SET-SQUTABLE INLIST ISLIST
- 256 0 DO I SQTABLE C@ 0= IF I !LIST THEN LOOP
- -1 !LIST ;
-
- 0 0 IN/OUT
- : FILL-TRTABLE ( TRTABLE gets filled from INLIST )
- ['] TRTABLE >BODY 256 0 FILL
- INLIST ISLIST BEGIN @LIST DUP 0< NOT WHILE
- TRTABLE TRUE C<- ( set flag in byte )
- REPEAT DROP ;
-
- 0 0 IN/OUT
- : SET-TRTABLE ( TRTABLE is translation table from INLIST to OUTLIST )
- 256 0 DO I DUP TRTABLE C! LOOP INLIST ISLIST
- OUTLIST BEGIN ^LIST @ @ 0< NOT WHILE
- DUP @ 0< IF DUP 2- @ ELSE DUP @ SWAP 2+ SWAP THEN
- @LIST TRTABLE C! REPEAT
- DROP ;
-
- \ Information from user?
- 0 0 IN/OUT
- : GET-RANGES
- HERE COUNT INLIST FILL-LIST
- COM-FLAG @ IF
- COMPLEMENT-LIST
- THEN
-
- BL WORD COUNT OUTLIST FILL-LIST
- SQU-FLAG @ IF
- OUTLIST SET-SQUTABLE
- THEN
- DEL-FLAG @ IF
- FILL-TRTABLE
- ELSE
- SET-TRTABLE
- THEN
- ;
-
- \ Translate functions
- PRIMITIVE
- : NOT-DELETED? ( key -- key TRUE OR FALSE )
- DUP TRTABLE C@ IF DROP FALSE ELSE TRUE THEN ;
-
- 1 0 IN/OUT
- : SEND-IT SQU-FLAG @ IF
- DUP SQTABLE C@ IF
- DUP LASTCHAR @ = IF ( a duplicate! )
- DROP EXIT THEN THEN
- DUP LASTCHAR ! THEN
- DUP ALF = IF
- ASC-FLAG @ IF
- ACR EMIT THEN THEN
- EMIT ;
-
- : NEW-KEY? ( -- key TRUE OR FALSE )
- ASC-FLAG @ IF
- BEGIN KEY DUP ACR = WHILE DROP REPEAT
- DUP 0< OVER CONTROL Z = OR
- ELSE
- KEY DUP 0<
- THEN
- IF DROP FALSE ELSE TRUE THEN ;
-
- 0 0 IN/OUT
- : TRANSLATE
- LASTCHAR ON
- BEGIN
- NEW-KEY?
- WHILE
- DEL-FLAG @ IF
- NOT-DELETED? IF SEND-IT THEN
- ELSE
- TRTABLE C@ SEND-IT
- THEN
- REPEAT
- ;
-
- \ TOP LEVEL
- : MAIN
- SETBUFS
- HELLO
- FILTER
- SET-OPTIONS
- GET-RANGES
- TRANSLATE
- BYE
- ;
-
- INCLUDE DOS2
- INCLUDE FORTHLIB
- END
-