- ============
- SQUEEZER.DOC
- ============
- 7/18/81
-
- USAGE AND RECOMPILATION DOCUMENTATION FOR:
-
- SQ.COM, Ver 1.3: File squeezer
- USQ.COM, Ver 1.4: File unsqueezer
- FLS.COM, Ver 1.1: Ambiguous file name expander
-
- --------------------
- DISTRIBUTION RIGHTS:
- --------------------
- I allow unrestricted non-profit distribution of this
- software, and invite users' groups to spread it around.
- However, any distribution for profit requires my permission
- in advance. This applies only to the above listed programs
- and their program source and documentation files. I do sell
- other software.
-
- --------
- PURPOSE:
- --------
- The file squeezer, SQ, compresses files into a more compact
- form. This provides:
- 1. Faster transmission by modem.
- 2. Fewer diskettes to distribute a program package.
- (Include USQ.COM and instructions, both unsqueezed.)
- 3. Fewer diskettes for archival storage.
-
- Any file can be squeezed, but program source files and text
- files benefit the most, typically shrinking by 35%. Files
- containing only a limited character set, such as dictionary
- files, may shrink as much as 48%. Squeezed files look like
- gibberish and must be unsqueezed before they can be used.
-
- The unsqueezer, USQ, expands squeezed files into exact
- duplicates of the original or provides a quick, unsqueezed
- display of the tops of (or all of) squeezed files.
- Unsqueezing requires only a single pass.
-
- Both SQ and USQ accept batches of work specified by lists of
- file names (with drives if needed) and miscellaneous
- options. They accept these parameters in any of three ways:
-
- 1. On the CP/M command line.
- 2. From the console keyboard.
- 3. From a file.
-
- The FLS program can be used (on the same command line!) to
- expand parameter lists containing wild-card (ambiguous) file
- names into lists with the specific file names required by SQ
- and USQ.
-
- This combination of programs allows you to issue a single
- command which will produce many squeezed or unsqueezed files
- from and to various diskettes. For example, to unsqueeze all
- squeezed ASM files on drive B and send the results to drive
- C and also unsqueeze all squeezed TXT files on drive A and
- send the results to drive D:
- A>fls c: b:*.aqm d: *.tqt |usq
- For detailed instructions see USAGE.
- This DOES run under plain old vanilla CP/M! Many of the
- smarts are buried in the COM files in the form of library
- routines provided with the BDS C package (available from
- Lifeboat).
-
- The above example simulates a "pipe" (indicated by the "|")
- by sending the "console" output of the fls.com program to a
- temporary file and then running the usq.com program with
- options which cause it to read its parameters from its
- "console" input, which is really redirected to come from the
- temporary file.
-
- -------
- THEORY:
- -------
- The data in the file is treated at the byte level rather
- than the word level, and can contain absolutely anything.
- The compression is in two stages: first repeated byte values
- are compressed and then a Huffman code is dynamically
- generated to match the properties of each particular file.
- This requires two passes over the source data.
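-
- As a rough illustration of the first stage only, the sketch
- below collapses runs of repeated bytes. It is written in
- present-day standard C, not the BDS C dialect. The marker
- value (RLE_MARK), the three-byte run layout and the function
- names are assumptions made for this example; they are not
- taken from the SQ source and need not match its file format.

```c
#include <assert.h>
#include <stddef.h>

#define RLE_MARK 0x90  /* assumed marker byte, chosen only for this sketch */

/* Encode src[0..n) into dst and return the number of bytes written.
 * A run of 3 to 255 equal bytes becomes: value, RLE_MARK, run length.
 * A literal RLE_MARK byte becomes: RLE_MARK, 0.
 * dst needs room for 2*n bytes (worst case: every byte is RLE_MARK). */
size_t rle_encode(const unsigned char *src, size_t n, unsigned char *dst)
{
    size_t i = 0, out = 0;

    assert(src != NULL && dst != NULL);
    while (i < n) {
        unsigned char c = src[i];
        size_t run = 1;

        if (c == RLE_MARK) {            /* escape the marker itself */
            dst[out++] = RLE_MARK;
            dst[out++] = 0;
            i++;
            continue;
        }
        while (i + run < n && src[i + run] == c && run < 255)
            run++;
        dst[out++] = c;
        if (run >= 3) {                 /* long enough to pay for itself */
            dst[out++] = RLE_MARK;
            dst[out++] = (unsigned char)run;
            i += run;
        } else {
            i++;                        /* short runs are copied as-is */
        }
    }
    return out;
}

/* Reverse rle_encode; dst must be large enough for the original data. */
size_t rle_decode(const unsigned char *src, size_t n, unsigned char *dst)
{
    size_t i = 0, out = 0;
    unsigned char last = 0;

    assert(src != NULL && dst != NULL);
    while (i < n) {
        unsigned char c = src[i++];

        if (c != RLE_MARK) {
            dst[out++] = last = c;
        } else if (i < n) {
            unsigned char count = src[i++];

            if (count == 0) {
                dst[out++] = RLE_MARK;  /* escaped literal marker */
            } else {
                while (--count > 0)     /* first copy was already written */
                    dst[out++] = last;
            }
        }
    }
    return out;
}
```

- In this sketch a run shorter than three bytes is left alone,
- because the marker and count would make it no shorter.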
-
- The decoding table is included in the squeezed file, so
- squeezing short files can actually lengthen them. Fixed
- decoding tables are not used because English and various
- computer languages vary greatly as to upper and lower case
- proportions and use of special characters. Much of the
- savings comes from not assigning codes to unused byte
- values.
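-
- To make the point about unused byte values concrete, here is
- a sketch of the textbook Huffman construction in standard C.
- It counts the bytes in one pass and then derives a code
- length for each byte value that actually occurs; values that
- never occur get length 0, i.e. no code at all. The function
- name and the simple pairwise search are assumptions for this
- example and are not the routines in the SQ source.

```c
#include <assert.h>
#include <stddef.h>

#define NSYM 256

/* Fill len[b] with the Huffman code length in bits for each byte value b
 * that occurs in data[0..n), and 0 for values that never occur.
 * Returns the number of distinct byte values used. */
int huffman_lengths(const unsigned char *data, size_t n,
                    unsigned char len[NSYM])
{
    unsigned long weight[2 * NSYM];
    int parent[2 * NSYM];
    char alive[2 * NSYM];
    int nodes = NSYM, used = 0, remaining, s;
    size_t i;

    assert(len != NULL && (data != NULL || n == 0));
    for (s = 0; s < 2 * NSYM; s++) {
        weight[s] = 0;
        parent[s] = -1;
        alive[s] = 0;
    }
    for (i = 0; i < n; i++)             /* count each byte value */
        weight[data[i]]++;
    for (s = 0; s < NSYM; s++)
        if (weight[s] > 0) {
            alive[s] = 1;
            used++;
        }

    /* Repeatedly join the two lightest live nodes under a new parent. */
    for (remaining = used; remaining > 1; remaining--) {
        int a = -1, b = -1;             /* lightest and second lightest */

        for (s = 0; s < nodes; s++) {
            if (!alive[s])
                continue;
            if (a < 0 || weight[s] < weight[a]) {
                b = a;
                a = s;
            } else if (b < 0 || weight[s] < weight[b]) {
                b = s;
            }
        }
        alive[a] = alive[b] = 0;
        weight[nodes] = weight[a] + weight[b];
        parent[a] = parent[b] = nodes;
        alive[nodes++] = 1;
    }

    /* A symbol's code length is its depth below the root. */
    for (s = 0; s < NSYM; s++) {
        int depth = 0, p;

        for (p = parent[s]; p >= 0; p = parent[p])
            depth++;
        if (used == 1 && weight[s] > 0)
            depth = 1;                  /* a lone symbol still needs one bit */
        len[s] = (unsigned char)depth;
    }
    return used;
}
```

- Frequent bytes end up with short codes and rare bytes with
- long ones, which is why the table must be rebuilt per file.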
-
- More detailed comments are included in the source files.
-
- ---------------
- USAGE TUTORIAL:
- ---------------
- As usual, you have to learn how to tell the programs what to
- do (i.e., what parameters to type after the program name).
- First I will introduce the various possibilities by example.
- Then I will summarize the rules.
-
- In the simplest case either SQ or USQ can simply be given
- one or more file names (with or without drive names):
- A>sq xyz.asm
- A>sq thisfile.doc b:thatfile.doc
- will create squeezed files xyz.aqm, thisfile.dqc and
- thatfile.dqc, all on the current drive, A. The original
- files are not disturbed. Note that the names of the squeezed
- files are generated by rules - you don't specify them.
-
- Likewise,
- A>usq xyz.aqm
- will create file xyz.asm on the A drive, overwriting the
- original. (The original name is recreated from information
- stored in the squeezed version.) The squeezed version is not
- disturbed.
-
- Each file name is processed in order, and you can list all
- the files you can fit in a command. The file names given to
- SQ and USQ must be specific. You will learn below how to use
- the FLS program to expand patterns like *.asm (all files of
- type asm) into a list of specific names and feed them into
- SQ or USQ.
-
- The above examples let the destination drive default to the
- current logged drive, which was shown in the prompt to be A.
- You can change the destination drive as often as you like in
- the parameter list. For example,
- A>sq x.asm b: y.asm z.asm c: d:s.asm
- will create x.aqm on the current drive, A, y.aqm and z.aqm
- on the B drive and s.aqm on the C drive. Note that the first
- three originals are on drive A and the last one is on drive
- D. Remember that each parameter is processed in order, so
- you must change the destination drive before you specify the
- files to be created on that drive.
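-
- The in-order rule can be pictured as a loop that carries one
- piece of state, the destination drive. The sketch below is
- standard C written for this illustration; apply_param is a
- hypothetical name, not a function from the SQ source.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* If param is a bare drive name such as "b:", store the drive letter
 * (upper case) in *dest and return 1. Otherwise leave *dest alone and
 * return 0, meaning param is a file name to process on drive *dest. */
int apply_param(const char *param, char *dest)
{
    assert(param != NULL && dest != NULL);
    if (strlen(param) == 2 && isalpha((unsigned char)param[0])
        && param[1] == ':') {
        *dest = (char)toupper((unsigned char)param[0]);
        return 1;
    }
    return 0;
}
```

- Feeding it x.asm b: y.asm z.asm c: d:s.asm in turn, starting
- with destination A, gives destinations A, B, B and C for the
- four files, as described above.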
-
- Eventually you will have diskettes with many squeezed files
- on them and you will wonder what is in which file. If they
- weren't squeezed you would use the TYPE command to look at
- the comments at the beginning of the files. But squeezed
- files just make a mess on your CRT screen when you TYPE
- them, so I have provided the required feature as a preview
- option to the USQ program.
- A>usq -10 x.bas b:y.asm
- will not take the time to create unsqueezed files. Instead
- it will unsqueeze the first 10 lines of each file and
- display them on your console. The display from each file
- consists of the file names, the data and a formfeed (FF).
- Also,
- A>usq - c:xyz.mac
- will unsqueeze and display the first 65,535 lines of any
- files listed. That's the biggest number you can give it, and
- is intended to display the whole file.
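-
- A minimal sketch of how such an option might be read, in
- standard C. The name preview_count is hypothetical, and the
- choice to reject out-of-range or malformed numbers by
- returning 0 is an assumption; this document does not say how
- USQ itself handles them.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

#define MAX_COUNT 65535UL

/* Return the line count requested by "-" or "-N".
 * A bare "-" means the maximum, 65535.
 * Return 0 if param is not a usable preview option. */
unsigned long preview_count(const char *param)
{
    char *end;
    unsigned long n;

    assert(param != NULL);
    if (param[0] != '-')
        return 0;                       /* not an option at all */
    if (param[1] == '\0')
        return MAX_COUNT;               /* "-" alone: show everything */
    if (!isdigit((unsigned char)param[1]))
        return 0;
    n = strtoul(param + 1, &end, 10);
    if (*end != '\0' || n < 1 || n > MAX_COUNT)
        return 0;
    return n;
}
```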
-
- This preview option also ensures that the data is
- displayable. The parity bit is stripped off (some Wordstar
- files use it for format control) and any unusual control
- characters are converted to periods. You'll see some of
- these at the end of the files as the CP/M end of file is
- treated as data and the remainder of the sector is
- displayed.
-
- You are now familiar with all of the operational parameters
- of SQ and USQ. But so far you have always typed them on the
- command line which caused the program to be run. For reasons
- which will become apparent later, I have also provided an
- interactive mode. If there are no parameters (except
- directed i/o parameters, described later) on the command
- line, SQ and USQ will prompt with an asterisk and accept
- parameters from the console keyboard. Each parameter must be
- followed by RETURN and will be processed immediately. An
- empty command (just RETURN) will cause the program to exit
- back to CP/M. Try it - it will help you understand what
- follows.
-
- Now let's get into directed i/o, which will be new to most of
- you, but will save you so much work you will wonder how you
- ever got along without it.
-
- Perhaps you frequently squeeze or unsqueeze the same list of
- files and you would like to type the list once and be done
- with it. Use an editor (or FLS, described below) to create a
- file with one parameter per line. For example call it
- commands.lst.
-
- Then,
- A>sq <commands.lst
- will cause the command list file to be read as if you were
- typing it! You will see it on the console.
-
- That was redirected console input. Now assume that you have
- a very long list of files to squeeze or unsqueeze and while
- you are taking a nap the progress comments and maybe some
- error comments scroll off the screen. Redirecting the
- console output will let you capture the progress
- information in a file so you can check it later. The error
- comments will have the screen to themselves.
-
- For example,
- A>sq <commands.lst >out
- will send the progress comments to the file "out", which you
- can TYPE later. The routine display of the program name and
- version, etc., will still go to the console.
-
- A more practical example is to send that information to the
- console and to the file.
- A>sq <commands.lst +out
- will do that.
-
- Redirected input and output are independent - you can do
- either, both or neither.
-
- There is one more form of redirection called a "pipe". It is
- by far the most important to you. Recall that I promised to
- tell you how to use ambiguous file names such as *.asm (all
- files of type asm on the current default drive) or *.?q?
- (all files having a "q" as the second letter of their type).
- That last example just happens to mean "all squeezed files",
- assuming you don't have any other files with such a silly
- name (I hope).
-
- I have provided a program called FLS which is intended
- primarily for use in pipes. Here is an example:
- A>fls c: x.asm y*.asm >temp.$$$
- will simply pass the first two parameters through to the
- console output, which is being redirected to a file called
- temp.$$$. But the third parameter will be replaced by all
- the files on the current drive which are of type asm and
- have names beginning with y.
-
- FLS is smart enough to know that a letter followed by a
- colon and nothing else is a destination drive name intended
- for SQ or USQ. It will also treat any parameter beginning
- with a - (minus sign) as an option to be passed through.
- Anything else is considered a file name or pattern and is
- checked against the directory of the appropriate drive.
-
- Therefore you could use:
- A>fls b: c:*.aqm *.aqm -10 stuff.dqc >temp.$$$
- A>usq <temp.$$$
- A>era temp.$$$
- to unsqueeze all files of type aqm on drives C and A and put
- the unsqueezed files on drive B, and then preview the first
- 10 lines of file stuff.dqc.
-
- Here is where the pipe comes in. The above three commands
- can be abbreviated as:
- A>fls b: c:*.aqm *.aqm -10 stuff.dqc |usq
-
- That little "|" is the pipe option and it causes the FLS
- output to be redirected to a temporary file and when that is
- done it actually runs USQ for you with the proper input
- redirection and then erases the temporary file.
-
- If that isn't enough, you can still use the + or >
- redirection option at the end of that line to capture the
- console output from USQ.
- A>fls b: c:*.aqm *.aqm -10 stuff.dqc |usq >out
-
- If you plan your comments carefully you can produce a single
- file containing an abstract of an entire library of squeezed
- files in one step!
- A>fls -25 *.?q? |usq >abstract
-
- One final point. Anywhere you specify a file name you can
- specify a drive in front of it. That applies to redirection
- as well as files to be squeezed and unsqueezed. If a name
- begins with a - (minus sign) it will look like an option to
- FLS unless you put a drive name in front of it (b:-sq.077).
-
- --------------
- USAGE SUMMARY:
- --------------
- The previous section gradually presented the various options
- by example. This section gives a condensed and more abstract
- description and is intended for reference. If you couldn't
- see the forest for the trees, maybe this will give you a
- better view.
-
- The parameter handling of these programs is straightforward.
- Parameters fall into two classes: directed i/o options and
- operational parameters. Note that parameters read from files
- or from the console are not forced to upper case, but the
- internal file handling routines all treat lower case as
- upper case.
-
- When a file to be written already exists, it is quietly
- overwritten.
-
- Directed I/O parameters:
- The first action taken by these programs is to process
- directed i/o parameters from the CP/M command line. These
- parameters are optional and take the forms:
-
- <file read console input from file
- >file send most console output to file
- +file send most console output to file and console
- |pgm ... send most console output to a temporary file
- then run PGM.COM and take console input
- from the temporary file. "..." represents the
- parameters for PGM. This is called "piping".
-
- Only one input and one output redirection can apply to each
- program. After the program has arranged for any directed i/o
- parameters to be obeyed they are deleted from the parameter
- list seen by the rest of the program.
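-
- The separation of directed i/o parameters from the rest can
- be sketched as below, in standard C. This is not the BDS C
- DIO package; struct dio and dio_extract are hypothetical
- names, and treating a second input or output redirection as
- an error is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>

struct dio {
    const char *in_file;     /* name after '<', or NULL */
    const char *out_file;    /* name after '>' or '+', or NULL */
    int tee;                 /* 1 if '+' was used (file and console) */
    const char *pipe_pgm;    /* program name after '|', or NULL */
    const char **pipe_args;  /* parameters that belong to pipe_pgm */
    int pipe_argc;           /* how many of them there are */
};

/* Pull directed i/o parameters out of argv[1..argc) and close up the gaps.
 * Returns the new argument count, or -1 if two redirections collide.
 * Everything after "|pgm" is left for pgm and is not examined here. */
int dio_extract(int argc, const char *argv[], struct dio *d)
{
    int i, kept = 1;

    assert(argv != NULL && d != NULL);
    d->in_file = d->out_file = d->pipe_pgm = NULL;
    d->pipe_args = NULL;
    d->pipe_argc = 0;
    d->tee = 0;

    for (i = 1; i < argc; i++) {
        const char *a = argv[i];

        switch (a[0]) {
        case '<':
            if (d->in_file != NULL)
                return -1;
            d->in_file = a + 1;
            break;
        case '>':
        case '+':
            if (d->out_file != NULL)
                return -1;
            d->out_file = a + 1;
            d->tee = (a[0] == '+');
            break;
        case '|':
            if (d->out_file != NULL)
                return -1;              /* output already spoken for */
            d->pipe_pgm = a + 1;
            d->pipe_args = &argv[i + 1];
            d->pipe_argc = argc - i - 1;
            return kept;
        default:
            argv[kept++] = a;           /* operational parameter: keep it */
            break;
        }
    }
    return kept;
}
```

- With this arrangement a trailing >out after |usq is handed
- on to USQ rather than being applied to the first program.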
-
- Operational parameters:
- The program then checks if there are any remaining
- parameters from the CP/M command line. If there are, they
- are obeyed. If and only if there are no remaining parameters
- on the command line, the program prompts for them at the
- console. If console input has been directed to a file one
- parameter is read and obeyed from each line of the file.
- Otherwise, the user follows each typed parameter with a
- RETURN and an empty command exits the program.
-
- Each operational parameter is obeyed without looking ahead
- to other parameters, so options should precede the file
- names to which they apply.
-
- SQ operational parameters are a list of the following types:
- drive:            set the current destination drive
- filename          file to be squeezed
- drive:filename    file to be squeezed
-
- SQ does not change the files being squeezed. New, squeezed
- files are created on the destination drive (defaults to the
- current drive) with names derived from the original name but
- with the second letter of the file type (extension) changed
- to Q. When there is no type, QQQ is used. The original name
- is saved in the squeezed file.
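-
- The naming rule is simple enough to show in a few lines of
- standard C. The function name squeezed_name is hypothetical,
- and lower case is used only to match the examples in this
- document. A one-letter type simply gains the Q, which agrees
- with the *.cq pattern used for squeezed .C files.

```c
#include <assert.h>
#include <string.h>

/* Write the squeezed-file name for 'name' into 'out'.
 * 'out' must have room for strlen(name) + 5 bytes. */
void squeezed_name(const char *name, char *out)
{
    char *dot;

    assert(name != NULL && out != NULL);
    strcpy(out, name);
    dot = strrchr(out, '.');
    if (dot == NULL) {
        strcat(out, ".qqq");            /* no type at all */
    } else if (dot[1] == '\0') {
        strcat(out, "qqq");             /* "name." with an empty type */
    } else if (dot[2] == '\0') {
        dot[2] = 'q';                   /* one-letter type: "c" -> "cq" */
        dot[3] = '\0';
    } else {
        dot[2] = 'q';                   /* replace the second letter */
    }
}
```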
-
- USQ operational parameters are a list of the following
- types:
- drive:            set the current destination drive
- filename          file to be unsqueezed
- drive:filename    file to be unsqueezed
- -count            Preview (display on the console) the first
-                   "count" lines of each file, where
-                   "count" is a number from 1 to 65535.
-
- If the -count option IS NOT in effect then USQ creates
- unsqueezed versions of the listed files on the destination
- drive, which defaults to the current logged drive. Each
- unsqueezed file is CRC checked against the CRC value of the
- original file, which is part of the squeezed file.
-
- The -count option is for previewing squeezed files. It
- allows you to skim through a group of squeezed files,
- peeking at the first "count" lines in each. The > or +
- output redirection option could be used to capture this
- information in a file, along with the corresponding file
- names, thus forming an abstract of the files on a disk.
-
- When the -count option is used the CRC check is cancelled
- and the output is forced into printable form by stripping
- the parity bit and changing most unprintable characters to
- periods. The exceptions are CR, LF, TAB and FF. The output
- from each file is terminated by an FF. PIP can be used to
- strip FFs and provide formatted printing if desired. "Count"
- defaults to the maximum value, 65,535, in case you want to
- look at a whole file.
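-
- The per-character filtering just described might look like
- this in standard C. The name preview_char is hypothetical,
- and treating DEL (7F hex) as unprintable is an assumption;
- the text above names only the four exceptions.

```c
#include <assert.h>

/* Map one byte of unsqueezed data to what the preview would display. */
int preview_char(int c)
{
    assert(c >= 0 && c <= 255);
    c &= 0x7F;                          /* strip the parity (high) bit */
    if (c == '\r' || c == '\n' || c == '\t' || c == '\f')
        return c;                       /* CR, LF, TAB and FF pass through */
    if (c < 0x20 || c == 0x7F)
        return '.';                     /* other control characters */
    return c;
}
```

- For instance, the CP/M end of file character (1A hex) comes
- out as a period, which is why periods appear at the end of a
- previewed file.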
-
- FLS operational parameters: FLS is a "filter", which means
- it accepts input from the console input or command line and
- transforms the input according to a set of rules to produce
- console output. That's fine for getting familiar with FLS,
- but to make it useful you "pipe" its output to the input of
- SQ or USQ.
-
- Any FLS parameter which is of the form:
- drive:
- or -anything
- is copied to console output unchanged.
-
- Any other FLS operational parameter is treated as a file
- name and is checked against the directory of the appropriate
- drive. If it contains * or ? it is replaced by a list of all
- the files which fit the pattern. If nothing is found in the
- directory an error comment is sent to the console, even if
- normal console output has been redirected to a file.
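-
- The two FLS rules can be sketched in standard C as follows.
- The matching is modelled on the usual CP/M convention in
- which a name is held as 8 name characters plus 3 type
- characters, blank padded, and * stands for ? in every
- remaining position of its field. The function names are
- hypothetical, drive prefixes are assumed to have been
- removed before matching, and none of this is taken from the
- FLS source.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Nonzero if FLS would copy param to its output unchanged:
 * a bare drive name ("c:") or anything starting with '-'. */
int fls_passes_through(const char *param)
{
    assert(param != NULL);
    if (param[0] == '-')
        return 1;
    return isalpha((unsigned char)param[0])
        && param[1] == ':' && param[2] == '\0';
}

/* Expand "name.typ" into 11 blank-padded upper-case characters.
 * A '*' fills the rest of its field with '?'. */
static void to_fcb(const char *s, char fcb[11])
{
    int i = 0;

    assert(s != NULL);
    memset(fcb, ' ', 11);
    for (; *s != '\0' && *s != '.'; s++) {      /* name field, 8 chars */
        if (*s == '*')
            while (i < 8)
                fcb[i++] = '?';
        else if (i < 8)
            fcb[i++] = (char)toupper((unsigned char)*s);
    }
    if (*s == '.')
        s++;
    for (i = 8; *s != '\0'; s++) {              /* type field, 3 chars */
        if (*s == '*')
            while (i < 11)
                fcb[i++] = '?';
        else if (i < 11)
            fcb[i++] = (char)toupper((unsigned char)*s);
    }
}

/* Nonzero if the specific file name fits the ambiguous pattern. */
int cpm_match(const char *pattern, const char *name)
{
    char p[11], n[11];
    int i;

    to_fcb(pattern, p);
    to_fcb(name, n);
    for (i = 0; i < 11; i++)
        if (p[i] != '?' && p[i] != n[i])
            return 0;
    return 1;
}
```

- Because ? also fits a blank position here, the pattern *.?q?
- fits a two-letter type such as cq as well as aqm or dqc.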
-
- IMPORTANT: when using a pipe from FLS or any other input
- redirection to get the file list, etc., on which USQ or SQ
- are to operate you must NOT put any parameters other than
- redirection following the program name. They must be all
- together in the input parameter list. Example:
-
- A>fls -10 b:*.cq |usq +saveout
- is the proper way to preview the top (first 10 lines) of
- each squeezed .C file on the B drive. The -10 is passed
- through FLS to USQ. The results will be displayed on the
- console and saved in file "saveout" on the A drive. The
- saveout file lets you confirm the list of processed files
- even if the display scrolls off the screen while running
- unattended.
-
- In summary, i/o redirection parameters (those prefixed by +,
- <, >, or |) always follow the command to which they apply,
- but operational parameters (destination drive, -options)
- must be with the file name list.
-
- EXAMPLES:
-
- 1. Unsqueeze all squeezed files on the current drive and put
- the resulting unsqueezed files on the same drive.
- A>fls *.?q? |usq
-
- 2. Look at the first 10 lines of every squeezed file on
- drive B.
- A>fls -10 b:*.?Q? |usq
- Note that since the file names for USQ came from FLS, the
- count option had to come from there too.
-
- 4. Squeeze all .ASM files on the B and C drives and put the
- squeezed files on the D drive.
- A>fls d: b:*.asm c:*.asm |sq
- Note that if d: had not been first the squeezed files would
- have gone to the A drive.
-
- 5. Squeeze file xyz.c on the A drive and put the results on
- the A drive.
- A>sq xyz.c
-
- 6. Build a parameter list of all ASM files on drive C in
- file XX.PAR and view it on the console.
- A>fls c:*.asm +xx.par
-
- 7. Use the above list to squeeze the files to the A drive.
- A>sq <xx.par
-
- 8. As above, but results to the B drive.
- A>b:
- B>a:sq <a:xx.par
-
- 9. Squeeze all ASM and C files on the A drive and put the
- results on the B drive. Capture the progress comments in the
- file "out" without displaying them.
- A>fls b: *.asm *.c |sq >out
-
- 10. Preview the first 24 lines of each squeezed ASM file
- THEN unsqueeze them (unless stopped via cntl-C).
- A>fls -24 *.aqm a: *.aqm |usq
- Note that specification of a destination drive cancels
- previewing.
-
- --------------
- RECOMPILATION:
- --------------
- These programs are written in C, for the BDS C compiler, and
- have been adapted for directed I/O, as described in the BDS
- C DIO package. Instructions below are for compiling with
- the BDS C compiler.
-
- The files you will need are:
-
- SQ.C        BDSCIO.H    SQDIO.C
- USQ.C       DIO.H       USQDIO.C
- FLS.C                   DIO2.C
- TR1.C
- TR2.C
- UTR.C
- IO.C
- SQDEBUG.C
- SQ.H
- USQ.H
- SQCOM.H
-
- The files in the first column are included in this public
- domain release.
-
- BDSCIO.H and DIO.H are distributed with the BDS C compiler.
- The BDSCIO.H header file contains information about your
- system, including how much space to reserve for file buffers.
- You should use your own version of this file. The three
- files in the right column are identical; create them by
- copying DIO.C, which is also distributed with the BDS C
- compiler. Three copies with unique names are needed because
- each may be compiled with a different external variable
- option, as will be seen below. Use a version of DIO.C dated
- December, 1981 or later. Earlier versions contained a bug;
- the dioflush function failed to delete the TEMPIN.$$$ file
- before renaming another file to that name. (CP/M is stupid
- enough to make two files of the same name!)
-
- The procedures below indicate the various C language source
- files (file type C) required to recompile. Those files
- contain #include statements, which cause the header files
- (file type H) to be read and compiled.
-
- Each CC command produces a CRL file with specific addresses
- for external variables. If you recompile a file using the
- same value in the -e option, you don't have to recompile the
- other files; just do the desired CC and then repeat the
- entire CLINK.
-
- CLINK's -s option prints statistics. Top of memory means the
- current TPA. Stack space is what's left over. These programs
- require stack space for local variables, including some
- healthy I/O buffers. Also some functions are recursive. If
- SQ doesn't have several K of stack space it will probably go
- crazy and do almost anything.
-
- In the procedures below, the C compiler is named CC.COM;
- yours may be CC1.COM. Also, the files being compiled have
- their types (.c) included; that may not be necessary with
- your compiler. Before compiling, have BDSCIO.H and DIO.H on
- the diskette with your compiler. Place all the other files
- named above on one diskette, which may be your compiler
- diskette; here we have assumed that the compiler diskette is
- in drive A, and that the other files are on a diskette in
- drive b.
-
- To compile SQ (note: not all use -o):
- A>cc b:sq.c -o -e3c00
- A>cc b:sqdio.c -e3c00
- A>cc b:tr1.c -o -e3c00
- A>cc b:tr2.c -o -e3c00
- A>cc b:io.c -o -e3c00
- A>cc b:sqdebug.c -e3c00
- A>clink b:sq sqdio tr2 tr1 io -s
-
- The linker will display some statistics. Check that the
- start address of the external variables (3C00 in this
- example) is higher than the last code address. IF NOT,
- REPEAT THE ABOVE WITH A HIGHER ADDRESS IN THE -e OPTIONS.
-
- To compile USQ (Note: not all use -o):
- A>cc b:usq.c -o -e2d00
- A>cc b:usqdio.c -e2d00
- A>cc b:utr.c -o -e2d00
- A>clink b:usq usqdio utr -s
-
- CHECK THE ADDRESSES as described above.
-
- To compile FLS:
- A>cc b:fls.c
- A>cc b:dio2.c
- A>clink b:fls dio2
-
- That's it. However, you can save a lot of typing (and avoid
- many chances for error) by using the SUBMIT files for
- compiling, instead of entering all the above commands by
- hand. Put the three files SQ.SUB, USQ.SUB, and FLS.SUB on
- drive A, along with your compiler, BDSCIO.H, and DIO.H.
- Put the other files discussed above on one drive (let's say
- drive b). Then, to compile SQ:
-
- A>submit sq b: 3c00
-
- where b is the drive name, and 3c00 is the address for
- externals, as above. The submit file contains the compiler
- and linker commands, and will execute each, in turn. After
- the linkage is complete, check the addresses as described
- above, and increase the externals-address, if necessary.
-
- To compile USQ:
- A>submit usq b: 2d00
-
- Check the addresses as described above.
-
- And to compile FLS:
- A>submit fls b:
-
- Even if you put all the files on your compiler diskette
- for compiling, don't leave the drive name out of the submit
- command, or the system will get confused.
-
- -------------------
- IN CASE OF TROUBLE:
- -------------------
- I welcome suggestions and bug reports, but you must
- understand that some of the ideas I get would involve almost
- as much program development as the original package. I have
- what I want and (I hope) what most users want, so I am not
- motivated to spend many more months creating something
- entirely different which just happens to involve data
- compression. The data compression routines are probably less
- than half of this package, and are designed to operate on
- large blocks of data, such as files.
-
- Dick Greenlaw
- 251 Colony Ct.
- Gahanna, Ohio 43230
-
- ------------------------------------------------------------
- SQUEEZER.DOC revised November 3, 1985.
- Please send bug reports and comments to:
-
- John M. Smith
- Librarian, CUG Utilities IV diskette
- 21505 Evalyn Ave.
- Torrance CA 90503