- Files
-
- A file is a collection of related information. All programs, text,
- and data on your disk reside in files and each file has a unique name.
- You refer to files by their file names.
-
- You create a file each time you enter and save data or text at your
- terminal. Files are also created when you write programs and save them
- on your disks.
-
- The names of the files are kept in directories on a disk. These
- directories also contain information on the size of the files and may
- contain the dates that they were created, updated, and accessed.
-
- If you want to know what files are on your disk, you can use the DIR
- command. This command tells the operating system to display all the
- files in the working directory of a specific disk.
-
-
- File Names
-
- Each CP/M file has a unique name consisting of one to eight
- characters, optionally qualified by the drive and an extension. The
- three parts that make up file names follow:
-
- d:filename.ext
-
- The first part is the drive code and is optional. The drive code is
- a single letter followed by a colon. The drive code specifies the disk
- drive on which the file is currently to be found. CP/M provides up to
- 16 disk drives, named "A" through "P". Some systems, such as CP/M Plus and
- ZCPR, allow you to specify the user area with the drive code. Many
- utility programs also allow the "drive/user" specification. If the
- drive code is not specified, the logged on drive is assumed.
-
- The second part is the actual name of the file. The file name is
- from one to eight characters, usually upper case alphabetic or numeric,
- but some other printable characters can be used. The characters < > . ,
- ; : = ? * [ ] have special meaning and may not be used. The name should
- be an abbreviated, descriptive name of what the file contains.
-
- The third part is the optional file type or file extension. It
- consists of up to three characters and is separated from the file name
- by a period. It is good practice to include a file type even though it
- is optional. Some programs, such as "ASM", require a specific file type.
-
- EXAMPLES:
-
-     A:FILENAME.INF
-     ^ ^        ^
-     | |        |
-     | |        --- FILE TYPE --- optional 1-3 characters.
-     | |
-     | ------------ FILE NAME --- required 1-8 characters.
-     |
-     -------------- DRIVE CODE -- optional 1 character.
-
-
-
- At the operating system prompt (the "A>") and from most programs,
- you can enter the filename in upper or lower case letters. The
- characters will be translated to upper case. The exception to this is
- when you issue the save command from BASIC. Lower case characters will
- actually be saved in the directory.
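-
- The naming rules above can be sketched in a few lines of Python. This
- is only an illustration, not part of CP/M: the function name
- parse_filespec and the default_drive parameter (standing in for the
- logged on drive) are made up for this example.
-
```python
# Characters the text lists as having special meaning in file names.
SPECIAL = set("<>.,;:=?*[]")

def parse_filespec(spec, default_drive="A"):
    """Split 'd:filename.ext' into (drive, name, ext).

    The text is translated to upper case, as the command line does.
    Raises ValueError when a rule from the text is broken.
    """
    spec = spec.strip().upper()
    drive = default_drive
    if len(spec) >= 2 and spec[1] == ":":
        drive, spec = spec[0], spec[2:]
    if not ("A" <= drive <= "P"):
        raise ValueError("drive code must be A through P")
    name, _, ext = spec.partition(".")
    if not 1 <= len(name) <= 8:
        raise ValueError("file name must be 1 to 8 characters")
    if len(ext) > 3:
        raise ValueError("file type must be at most 3 characters")
    for ch in name + ext:
        if ch in SPECIAL or ch.isspace() or not ch.isprintable():
            raise ValueError("character not allowed: %r" % ch)
    return drive, name, ext

print(parse_filespec("a:filename.inf"))
```
-
- Calling parse_filespec("readme") falls back to the default drive and
- returns an empty file type.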
-
- A list of commonly used file type designations on RCP/M systems
- around the country appears under "File Types" below. There is no RULE
- that these have to be used as described, but it is conventional to use
- them that way.
-
- Two special characters (called wildcards) can be used when you are
- searching the files on a disk: the asterisk (*) and the question mark
- (?). The question mark (?) in a file name or extension means that any
- valid character can occupy that position. An asterisk (*) in the file
- name or extension means that any character can occupy that position or
- any of the remaining positions in the file name or extension.
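-
- A rough Python sketch of wildcard matching follows. It assumes that
- names are held as a fixed 8-character name field plus a 3-character type
- field padded with blanks, that "*" is expanded to question marks for the
- rest of its field, and that "?" also matches a padding blank. The
- function names are invented for this example.
-
```python
def expand(part, width):
    """Pad one field with blanks; '*' fills the rest of the field with '?'."""
    if "*" in part:
        part = part[:part.index("*")].ljust(width, "?")
    return part.ljust(width)[:width]

def to_fields(spec):
    """Turn 'NAME.EXT' into an 11-character string: 8 for name, 3 for type."""
    name, _, ext = spec.upper().partition(".")
    return expand(name, 8) + expand(ext, 3)

def matches(pattern, filename):
    """Compare position by position; '?' in the pattern matches anything."""
    pairs = zip(to_fields(pattern), to_fields(filename))
    return all(p in ("?", f) for p, f in pairs)

print(matches("*.COM", "LU310.COM"))
print(matches("LU???.COM", "NULU151.COM"))
```
-
- With these assumptions "*.*" matches every file, which is the same as
- entering eleven question marks.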
-
- ASCII (American Standard Code for Information Interchange) files are
- printable, human-readable files. They consist of letters, numbers, and
- familiar symbols such as periods, commas, !, @, #, $, %, &, and *. An
- ASCII file may be "TYPE"ed, and can be transferred over phone lines
- without error-checking, if desired. An ASCII file should contain no
- word processor specific information; WordStar files should be saved in
- non-document mode. Each line should end with a Carriage Return, Line
- Feed sequence, and EOF characters (hex 1A) should pad the last block of
- data.
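-
- The line-ending and padding rules can be shown with a short Python
- sketch. It assumes the "block" being padded is a 128-byte record; the
- function names are made up for this example.
-
```python
RECORD = 128      # assumed record size in bytes
EOF = b"\x1a"     # the hex 1A end-of-file character

def to_cpm_text(text):
    """End every line with CR LF and pad the last record with hex 1A."""
    body = b"".join(line.encode("ascii") + b"\r\n"
                    for line in text.splitlines())
    pad = -len(body) % RECORD          # bytes needed to fill the record
    return body + EOF * pad

def from_cpm_text(data):
    """Cut the file at the first hex 1A and restore plain newlines."""
    end = data.find(EOF)
    if end != -1:
        data = data[:end]
    return data.decode("ascii").replace("\r\n", "\n")

image = to_cpm_text("HELLO\nWORLD\n")
print(len(image), from_cpm_text(image) == "HELLO\nWORLD\n")
```
-
- A file whose text already ends exactly on a record boundary gets no
- padding in this sketch.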
-
- Many of the popular Public Domain programs and information files
- are distributed in library (LBR) files. Below is a discussion of the
- structure of LBRs and the utilities needed to process them.
-
- A library is a group of files collected together into one file in
- such a way that the individual files may be recovered intact. A library
- file can be identified by "LBR" as the extension of the file name. LU
- is a CP/M utility used to maintain libraries of files. LU does not
- perform any compression. Because of this, most people squeeze or
- crunch files before adding them to a library if they want to save space.
- If you want to remove the component files (members) from a .LBR file,
- you need a copy of LU.COM or another LBR extractor utility. At the
- end of this document is a list of the programs available on many Remote
- CP/M systems and in the CP/M RoundTable Software Libraries of GEnie that
- function with libraries.
-
- A library file usually takes up less space than the total of the
- individual member files which went into it. The reason for this is that
- CP/M allocates disk space in fixed blocks or groups, typically 2k bytes
- each. Any space after the last sector of a file up to the next 2k block
- boundary is wasted. The same files in a library use only the number of
- sectors they actually need, and though the library itself may have a
- partially wasted block at the end, and requires some space for directory
- information at the beginning, the net effect is usually a saving of
- total space. The best results are seen when many small files are
- combined into one library.
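-
- The saving can be worked through with a little arithmetic. The Python
- sketch below assumes 128-byte sectors and 2k blocks as described above;
- the five file sizes and the two sectors set aside for the library
- directory are made-up figures for illustration only.
-
```python
SECTOR = 128     # bytes per sector
BLOCK = 2048     # bytes per allocation block (2k)

def ceil_div(a, b):
    return -(-a // b)

def disk_usage(size):
    """Bytes actually allocated on disk for a file of the given size."""
    return ceil_div(size, BLOCK) * BLOCK

def library_size(sizes, dir_sectors):
    """Size of a library: directory sectors plus whole sectors per member."""
    sectors = dir_sectors + sum(ceil_div(s, SECTOR) for s in sizes)
    return sectors * SECTOR

sizes = [300, 1500, 700, 2100, 90]          # hypothetical member sizes
separate = sum(disk_usage(s) for s in sizes)
library = disk_usage(library_size(sizes, dir_sectors=2))
print(separate, library)                    # prints "12288 6144"
```
-
- Here five small files that occupy six blocks separately fit in three
- blocks as one library.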
-
- A library file makes most efficient use of the CP/M disk directory,
- since it is treated as only one file by CP/M regardless of how many
- members it contains.
-
- Libraries can aid in transferring packages of software from one
- system to another using XMODEM or another file transfer protocol. Only
- one file is transferred, eliminating the need to run the XMODEM transfer
- program several times, the chance of overlooking a needed file, and the
- problems of naming conflicts (such as READ.ME files) among unrelated
- packages.
-
- When members are added to a library, a CRC (Cyclic Redundancy
- Check) value is calculated and stored in the directory of the library.
- When the members are later extracted or the library is reorganized, the
- CRC value is again calculated and checked against the value in the
- directory. If a discrepancy occurs the operator is notified. (Caution:
- This CRC validation does not occur with some public domain file
- extractors and earlier versions of LU and NULU.)
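-
- The check described above can be sketched in Python. These notes do
- not say which CRC the library format uses; the sketch assumes a 16-bit
- CRC with polynomial hex 1021 and a zero starting value, the variant
- used by XMODEM. The function names are invented for this example.
-
```python
def crc16(data, crc=0):
    """16-bit CRC, polynomial 0x1021, processed most significant bit first."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def member_is_intact(data, stored_crc):
    """Recalculate the CRC on extraction and compare with the stored value."""
    return crc16(data) == stored_crc

member = b"123456789"
stored = crc16(member)                    # value saved when member is added
print(hex(stored), member_is_intact(member, stored))
```
-
- Changing even one byte of the member makes member_is_intact return
- False, which is when the operator would be notified.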
-
- Members can be added to, renamed in, and deleted from the library.
- The directory information of the library is contained in the same file
- as the members. The amount of space to be allocated to the directory
- must be specified by the user when a new library is created, but can be
- changed when the file is reorganized.
-
-
- Recently, popular CP/M Public Domain software and information
- files have begun to be distributed as ARCHIVE files. ARChive files are
- similar to library (LBR) files in that they take a logical group of
- files and put them together in a single file. The main difference is
- that the members of the "ARC" file are automatically compressed. The
- compression algorithm chosen for each member is whichever of three
- produces the smallest file.
-
- ARChive files have been available in the MS-DOS and PC-DOS world,
- but have become useful in the CP/M environment only with the
- introduction of the "UNARC" program. The current version is 1.6, and is
- available with assembly language source, extensive documentation, and
- two executable COM files: an 8080/8085 version and a Z80 version. The
- Z80 version takes advantage of the expanded Z80 (and equivalent)
- instruction set for speed and size, and is therefore machine dependent.
-
- A CP/M utility to make an "ARC" file has recently become available.
- However, because of the resources required, it is still impractical to
- make Archives in the CP/M environment. ARChive files will continue to
- be made on systems using other operating systems.
-
- ARChive files are identified by "ARC" as the file extension.
- This is a packaging method that guarantees no growth during storage.
- The files contain a "marker", followed by file information, file data,
- file information, file data, etc. File contents are analyzed before
- storage and stored in one of the following ways:
-
- 1. AS IS (typically files in the 1 to 200 byte range).
- 2. With repeat-compression (same range as above).
- 3. Using Huffman 8-bit encoding.
- 4. Using Lempel-Ziv-Welch encoding (all others).
- 8. Crunched - non-repeat packed (DLE encoded).
- 9. New squashed files created with PKARC.
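-
- Repeat-compression, item 2 above, replaces a run of identical bytes
- with a short code. The Python sketch below shows the general idea only.
- The marker byte (hex 90) and the layout "byte, marker, count" are
- assumptions made for this example and are not taken from the ARC file
- definition.
-
```python
DLE = 0x90   # assumed marker byte

def pack(data):
    """Replace runs of 3 to 255 equal bytes with: byte, DLE, count."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        run = 1
        while i + run < len(data) and data[i + run] == b and run < 255:
            run += 1
        if b == DLE:
            out += bytes([DLE, 0]) * run      # a literal marker is DLE, 0
        elif run >= 3:
            out += bytes([b, DLE, run])       # count includes the first byte
        else:
            out += bytes([b]) * run           # short runs are stored as is
        i += run
    return bytes(out)

def unpack(data):
    """Reverse pack(): expand 'byte, DLE, count' back into a run."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == DLE:
            count = data[i + 1]
            if count == 0:
                out.append(DLE)
            else:
                out += bytes([out[-1]]) * (count - 1)
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)

print(pack(b"AAAAAABC"))
```
-
- A file with no runs is stored unchanged by this scheme, unless it
- contains the marker byte itself.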
-
- The ARChive technique frees the user from worrying about storage
- mechanisms and delivers practically all needed services (extract, store,
- list, type, check, execute and re-compress using "latest" state of
- compression technique). ARC is "downward" compatible. It is currently
- heavily used in the MSDOS/PCDOS world, and usage on RCP/M systems is
- starting with the availability of a fast DE-ARCer.
-
- The MS/PC-DOS ARC utility belongs to the category of "Share-ware"
- or "Free-ware"; it is copyrighted by System Enhancement Associates
- (source-language C, system MSDOS). Phil Katz is the author of PKARC, and
- the current version is 3.5. UNARC was written by Bob Freed for the
- Public Domain (source-language assembler, for CP/M systems).
-
-
- Some files on RCP/M systems and in the CP/M RoundTable Software
- Libraries have been compressed, using one of the standard public domain
- utilities, to minimize download time and save storage space. This topic
- briefly discusses these compression techniques.
-
- Files that have been compressed can be identified by the filetype
- (the last 3 letters of a filename after the ".") that signifies the
- compression. These are:
-
- .?Q? for Squeezed files (middle letter is a Q).
- .?Z? for Crunched files (middle letter is a Z).
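-
- The naming convention can be checked mechanically. In this Python
- sketch the function name compression_of is made up for the example, and
- the test looks only at the second letter of the filetype.
-
```python
def compression_of(filename):
    """Return 'squeezed', 'crunched' or None, judged by the filetype only."""
    _, _, filetype = filename.upper().partition(".")
    if len(filetype) >= 2:
        if filetype[1] == "Q":
            return "squeezed"
        if filetype[1] == "Z":
            return "crunched"
    return None

for name in ("LU300.DQC", "NULU15.WQ", "README.TZT", "LU310.COM"):
    print(name, compression_of(name))
```
-
- This is only a convention. A filetype such as AZM, which is also used
- for Z80MR assembler source, would be reported as crunched by this test.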
-
- USQ120.COM is used to unsqueeze, or expand, files that have a "Q" as
- the middle letter of the filetype. Such files have been squeezed, or
- compressed, with SQ111.COM or a similar utility. These programs use
- Huffman Encoding to reduce the size of the target file. Depending on
- the distribution of data in a file, squeezing can reduce its size by 5%
- to 60%. If you download a file with a filetype indicating that it is
- squeezed, you will need USQ120.COM to expand it before you can use it.
- Other unsqueezers are available that are written in different languages
- or take advantage of special hardware, but USQ is 8080/8085/Z80
- compatible.
-
- Other utilities are available that have the unsqueeze coding
- embedded and function with squeezed or unsqueezed files. These include
- programs that perform file maintenance functions (NSWP), bi-directional
- display utilities (BISHOW), and string search programs (SEARCH and
- FINDU). This method of compressing files has been used for some time
- now, and programs to uncompress the files are available for several
- microprocessors and mainframe computers.
-
-
- CRUNCH uses the Lempel-Ziv-Welch (LZW) technique. This method is
- fast and offers compression ratios around 55%. The highest compression
- is achieved with graphics data, where values of 90% are typical,
- followed by text at 50% and COM files at around 20%. This method is
- relatively new to the CP/M environment. See CRUNCH24.LBR for the Z80
- CRUNCH and UNCRUNCH utilities. FCRNCH11.LBR contains the utilities for
- 8080/8085 compatible processors. CRUNCH v2.0 and higher embodies all of
- the concepts employed in the UNIX COMPRESS / ARC512 algorithm, and is
- additionally enhanced by a "metastatic code reassignment" facility.
- This is one of several concepts the author, Steven Greenberg, is
- developing as part of an effort to advance data compression techniques
- beyond current performance limits. He believes this is the first time
- this principle has been proposed and implemented.
-
- Since this method of file compression is relatively new, only a few
- utilities are available that process a crunched file directly. TYPELZW,
- TYPEQZ, and LT are display utilities, which also display members of
- libraries and squeezed files. SEARCH is a file searching program that
- allows you to search multiple text files for various words or phrases.
- SEARCH can directly search files within libraries, as well as squeezed
- and crunched files. Files may also be processed on other systems not
- using the Z80 processor.
-
- A mini comparison of Huffman Encoding and Lempel-Ziv-Welch (LZW)
- techniques follows.
-
- Huffman Encoding expresses each storage unit as a variable length
- pointer into a frequency-ordered tree. Compression is achieved by
- choosing a "native" storage unit (where repetitions are bound to occur)
- and, on the average, expressing the more frequent storage units with
- shorter pointers (the less used units are represented by longer
- pointers). The Encoding process needs two passes: one reading all
- units (under CP/M and MSDOS, 8 bit bytes) to build the frequency ordered
- tree (also called the "dictionary"), and one translating all units into
- their respective pointer values. The original filename, dictionary, and
- pointer values are stored. By convention, the second character of the
- filename extension is changed to Q as a reminder of a "squeezed" file.
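-
- The two passes can be illustrated with a textbook Huffman coder in
- Python. This is a teaching sketch, not the code used by SQ or USQ: bit
- strings are kept as text for readability, and the function names are
- made up for the example.
-
```python
import heapq
from collections import Counter

def huffman_table(data):
    """Pass 1: count the bytes and build a code table {byte: bit string}."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:                       # one symbol still needs a bit
        return {sym: "0" for sym in freq}
    # Heap items: (weight, tie-breaker, partial table for that subtree).
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, t1 = heapq.heappop(heap)      # two least frequent subtrees
        n2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]

def encode(data, table):
    """Pass 2: translate every byte into its variable length code."""
    return "".join(table[b] for b in data)

def decode(bits, table):
    inverse = {code: sym for sym, code in table.items()}
    out, current = bytearray(), ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return bytes(out)

data = b"AAAAAAAABBBC"
table = huffman_table(data)
bits = encode(data, table)
print(len(bits), "bits instead of", 8 * len(data))
```
-
- The frequent "A" receives a one-bit code, while "B" and "C" receive
- two bits each. The table itself must be stored with the data, which is
- why very small files may not shrink.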
-
- LZW expresses strings of 8-bit bytes by pointers into an "ordered"
- string-table. The rules for "constructing" the table are reversible, so
- that Compressor and De-Compressor can build their table on-the-fly. LZW
- is one-pass, although achieved speed is VERY dependent on language
- implementation and available physical memory (in general more than 90%
- of time spent in hashing and table searching). Although early
- implementations of LZW seemed to need more than 64K of physical memory,
- current enhancements make a maximum of 2**11 table entries sufficient to
- handle all cases. State of the art implementations check the compression
- ratio on the fly and rebuild the table if the ratio falls below a
- minimum or the table overflows.
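-
- A textbook form of LZW in Python shows how both sides build the same
- string table on the fly. It is a sketch only: it has no table size
- limit, no table rebuilding, and returns a list of numbers rather than
- packed bits, so it does not produce CRUNCH or ARC compatible output.
-
```python
def lzw_compress(data):
    """One pass: emit a table index for the longest known string."""
    table = {bytes([i]): i for i in range(256)}
    codes = []
    current = b""
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate
        else:
            codes.append(table[current])
            table[candidate] = len(table)    # new string learned
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes

def lzw_decompress(codes):
    """Rebuild the same table from the codes alone."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    out = bytearray(previous)
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):             # string defined by this code
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid code %d" % code)
        out += entry
        table[len(table)] = previous + entry[:1]
        previous = entry
    return bytes(out)

text = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(text)
print(len(text), "bytes ->", len(codes), "codes")
```
-
- No table is stored with the output; the decompressor learns each new
- string one step behind the compressor.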
-
- Typical Huffman compression ratios hover around 33% (the compressed
- file is about 67% of the original), with text typically compressed a
- little better and executable files less. Typical LZW compression
- ratios average 55%. The highest compression is achieved with
- pixel information, where values of 90% are typical, followed by text at
- 50% and executable files at around 20%. Although the original paper on
- LZW suggested implementation between the CPU and peripheral devices
- (terminal, disk drives, mag tapes), current usage encompasses file
- compression (Unix COMPRESS, MSDOS ARC, CP/M UNARC), high speed
- proprietary MODEM protocols ("LZW in SILICON"), and "picture
- transmission" at 1200 baud.
-
- Thoughts on CP/M and MS-DOS filename compatibility.
-
- Many users now work with both CP/M and MS-DOS systems. Many files
- on the two systems have a compatible structure (ASCII text, WordStar,
- dBase II, Archives, etc.), and multi-format disk utilities are available
- (Media Master, Uniform, etc.). Unfortunately, although the file naming
- conventions for each of the systems are similar, there are some
- differences that demand attention if compatibility is to be assured.
-
- Below is a list of the LEGAL characters common to both CP/M and
- MS-DOS:
-
- A-Z 0-9 ! # $ & ' - @ ^ ` { } ~
-
- In ASCII sorting order (same characters):
-
- ! # $ & ' - 0-9 @ A-Z ^ ` { } ~
-
-
- MS-DOS illegal file names (reserved for device names):
-
- AUX, CON, PRN, NUL, COM1, COM2, LPT1, LPT2, LPT3
-
-
- Computer users who are interested in transferring files could
- standardize on the above characters (while avoiding the reserved names
- when using CP/M). This would provide one more area of compatibility.
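-
- A name can be tested against these suggestions with a few lines of
- Python. The function name is_portable is made up for this sketch, and
- the one to eight character name and three character type limits are
- assumed to apply on both systems.
-
```python
import string

# Characters listed above as legal on both CP/M and MS-DOS.
COMMON = set(string.ascii_uppercase + string.digits + "!#$&'-@^`{}~")

# MS-DOS device names that cannot be used as file names.
RESERVED = {"AUX", "CON", "PRN", "NUL", "COM1", "COM2",
            "LPT1", "LPT2", "LPT3"}

def is_portable(filename):
    """True if the name should be usable on both systems."""
    name, _, filetype = filename.upper().partition(".")
    if not 1 <= len(name) <= 8 or len(filetype) > 3:
        return False
    if name in RESERVED:
        return False
    return all(ch in COMMON for ch in name + filetype)

for candidate in ("README.TXT", "CON.TXT", "NOTES+.TXT"):
    print(candidate, is_portable(candidate))
```
-
- CON.TXT is rejected because MS-DOS treats the name CON as a device
- even when a file type follows it.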
-
- File Types
-
- $$$ -- Temporary file, used by PIP and other copy programs as a work file.
- ACT -- ACT language source file.
- ADD -- Indicating an "addition" or new update.
- ADV -- Adventure game.
- ALG -- ALGOL language source files.
- APL -- APL language.
- ARC -- ARChive files.
- ARK -- ARChive files, used for CP/M files.
- ART -- Article files.
- ASC -- BASIC language source statements.
- ASM -- Assembly Language source code, usually for 8080 assemblers.
- AZM -- Assembly Language source code, used with Z80MR.
- BAD -- Bad sector directory entry file.
- BAK -- Backup file.
- BAS -- BASIC language source statements. Normally saved as ASCII.
- BBS -- Bulletin board system file.
- BHB -- Heath Benton Harbor Basic language.
- BIN -- Binary file. Usually NOT a .COM file renamed.
- BSE -- E BASIC source. See also, "EBA" and "EBS".
- BUG -- Bug data/information file.
- C -- C Language source. Most often BDS C.
- CAL -- Calc or spreadsheet data file.
- CAT -- Catalog of file names.
- CCP -- Console command processor file.
- CHK -- Check file.
- CMD -- Command file CP/M 86.
- COB -- COBOL language source statements.
- COM -- Machine language COMMAND files for CP/M 80.
- CPR -- Compare file.
- CRC -- CRC data file.
- CRL -- C language relocatable/intermediate file.
- DAT -- DATA file.
- DDT -- DDT file.
- DIF -- Difference file.
- DIR -- Directory file.
- DOC -- Documentation file.
- DSK -- Disk data file.
- EBA -- E BASIC source. See also "BSE" and "EBS".
- EBS -- E BASIC source. See also "EBA" and "BSE".
- ENV -- ZCPR3 Environment Descriptor file.
- ERL -- Relocatable Pascal module.
- FCP -- ZCPR3 Flow Command Package.
- FEX -- Felix language source file.
- FIX -- Instructions for correcting program errors.
- FMT -- Format file.
- FOR -- FORTRAN language source statements.
- GMR -- Grammar file.
- H -- C Language "header" source statements.
- HEX -- HEX intermediate file. Most often INTEL format.
- HLP -- File intended for use with the HELP utility.
- IDX -- Index file for data file.
- INF -- Information files.
- INP -- Input file.
- INT -- Intermediate code produced by compilers such as CBASIC.
- INV -- Invoice file.
- IOP -- ZCPR3 Input/Output Package.
- LBR -- Library file. Use NULU, LU, LDIR, LUX, LTYPE to manipulate.
- LIB -- Library file assembly source module.
- LST -- Listing files, intended for printing.
- LTR -- Letter/correspondence file.
- M80 -- Microsoft M80 Macro assembler source.
- MAC -- Macro assembly source file for M80.
- MAG -- Magazine file.
- MAP -- Map data file.
- MEM -- Memory file.
- MNU -- ZCPR3 MENU utility script.
- MOD -- Modification instructions.
- MSG -- Message file. Timely, not of permanent use.
- MSS -- Manuscript documents. Input to word processors.
- MUS -- Music language source file.
- NAM -- Name file.
- NDR -- ZCPR3 Named Directory Package.
- NEW -- Indicates proposed revision to an existing program/release.
- OBJ -- Object file or renamed COM.
- OUT -- Output file.
- OVL -- Overlay command file.
- OVR -- Overlay: a "part" of a multi-part .COM file.
- PAS -- PASCAL language source statements.
- PAT -- Patch for customizing or fixing programs.
- PGM -- Program file.
- PIC -- Picture file.
- PL1 -- PL/1 language source statements.
- PLM -- PLM language source file.
- PLT -- Pilot language source file.
- PRN -- Listing output of assemblers.
- PRT -- Print files, intended for printing.
- PTR -- Printer file.
- PUN -- Punch device file.
- RAT -- Ratfor language source file.
- RCP -- ZCPR3 Resident Command Package.
- REF -- Reference file.
- REL -- Relocatable/intermediate file. Output from.
- ROM -- Read only memory file.
- RPT -- Report file.
- SAM -- SAM language source file.
- SET -- Setup file.
- SIG -- SIG/M information file.
- SRC -- Pascal source file.
- SRT -- Sorted file.
- STC -- STOIC language source file.
- SUB -- File of commands for input to SUBMIT.
- SUB -- Submit command file.
- SYM -- Symbol table file.
- SYS -- System file.
- TEL -- Telephone number file.
- TEX -- Text file.
- TST -- Test file.
- TXT -- Text file.
- TYP -- Type file.
- UTL -- Utility file.
- VAR -- Variable file.
- VMN -- ZCPR3 VMENU utility script.
- WS -- Text document in WordStar format.
- Z3T -- ZCPR3 TCAP entry.
- Z80 -- Assembly Language source code, usually for Z80 assemblers.
- ZEX -- ZCPR3 ZEX utility script file.
- nnn -- Used to indicate "volume serial #".
- xQx -- Squeezed file. Needs to be "unsqueezed" before use.
- xZx -- Crunched file. Needs to be "uncrunched" before use.
-
- File Utilities
-
- File name       K  Description
-
- ARC-FILE.IQF    5  ARC file internal structure defined
- CPMSQV3.LBR    30  SQueeze/UnSQueeze - Turbo Pascal
- CRUNCH20.LBR   52  Data compression with LZW algorithm
- DELBR11.COM    13  LBR file extractor
- DELBR11A.CQ     6  LBR file extractor source code
- DLU12.PQS      11  A library utility in Turbo Pascal
- LBRDSK23.LBR   17  Treat libraries as a logical drive
- LDIR.COM        2  Directory lister for LBR files
- LDIR23.LBR     16  Lists directory of LBR file
- LRUN20.AQM     16  Run .COM files inside LBRs
- LRUN20.COM      2  Run .COM files inside LBRs
- LSTYPE.LBR      7  Print multiple files inside LBRs
- LSWEEP13.LBR   25  Library SWEEP utility extract/view
- LTYPE17.LBR    17  Types text files inside LBRs
- LU300.DQC      22  Documentation for LU
- LU310.COM      21  Library Utility version 3.10
- LU310.HLP       1  Help file for use with LU310
- LU310.UPD       3  Update info on LU310.COM
- LUDEF5.DQC     11  Internal structure of LBR files
- LZW.LBR        52  Compression/decompression Utilities
- NULU15.NOT      2  A note from the author of NULU151
- NULU15.WQ      40  Complete user's guide for NULU151
- NULU151.COM    16  Machine lang. Library Utility pgm
- NULUFIX.ASM     2  Bug fixes for NULU15.COM
- NULUTERM.AQM    2  Terminal configuration for NULU151
- SQ.PQS         13  File SQueezer
- SQ111.COM       6  Machine language SQueezer, very fast
- SQUEEZE.TXT    13  Tutorial on SQueeze/UnSQueeze
- SQUPORT2.LBR   35  Portable SQueeze/UnSQueeze in C lang
- TYPEQZ12       35  Squeezed/Crunched type utility
- UNARC-P1.NQT    2  UNARC12 patch for non-standard CP/M
- UNARC.COM       5  Z80 version of UNARChive utility
- UNARC12.LBR   108  UNARC utility for CP/M
- UNARCA.COM      5  Lists, types, extracts from ARChives for 8080
- UNCR20.COM      4  UNCRunch for CRUNCH20 and prior
- UNCR8080.COM    6  UNCRunch for 8080/8085 CPUs
- USQ.PQS         5  SQueezed file UnSQueezer
- USQ120.COM      2  Dave Rand's machine lang. UnSQueezer
- USQ120.DOC      3  Documentation for Dave Rand's USQ120
- USQFST20.LBR   28  Fast unsqueezer for Z80 computers
-
- November 11, 1987
-
- This text file consists of notes taken at the November meeting of
- D:KUG (The Detroit Metropolitan Kaypro Users Group). The subject of the
- meeting was files, the format of file names, and the public domain
- programs available to process disk files.
-
- B.Duerr