home *** CD-ROM | disk | FTP | other *** search
-
- ARC-FILE.INF, created by Keith Petersen, W8SDZ, 21-Sep-86, extracted
- from UNARC.INF by Robert A. Freed.
-
- From: Robert A. Freed
- Subject: Technical Information for ARC files
- Date: June 24, 1986
-
- Note: In the following discussion, UNARC refers to my CP/M-80 program
- for extracting files from MSDOS ARCs. The definitions of the ARC file
- format are based on MSDOS ARC512.EXE.
-
- ARCHIVE FILE FORMAT
- -------------------
-
- Component files are stored sequentially within an archive. Each entry
- is preceded by a 29-byte header, which contains the directory
- information. There is no wasted space between entries. (This is in
- contrast to the centralized directory used by Novosielski libraries.
- Although random access to subfiles within an archive can be noticeably
- slower than with libraries, archives do have the advantage of not
- requiring pre-allocation of directory space.)
-
- Archive entries are normally maintained in sorted name order. The
- format of the 29-byte archive header is as follows:
-
- Byte 1: 1A Hex.
- This marks the start of an archive header. If this byte is not found
- when expected, UNARC will scan forward in the file (up to 64K bytes)
- in an attempt to find it (followed by a valid compression version).
- If a valid header is found in this manner, a warning message is
- issued and archive file processing continues. Otherwise, the file is
- assumed to be an invalid archive and processing is aborted. (This is
- compatible with MS-DOS ARC version 5.12). Note that a special
- exception is made at the beginning of an archive file, to accomodate
- "self-unpacking" archives (see below).
-
- Byte 2: Compression version, as follows:
-
- 0 = end of file marker (remaining bytes not present)
- 1 = unpacked (obsolete)
- 2 = unpacked
- 3 = packed
- 4 = squeezed (after packing)
- 5 = crunched (obsolete)
- 6 = crunched (after packing) (obsolete)
- 7 = crunched (after packing, using faster hash algorithm) (obsolete)
- 8 = crunched (after packing, using dynamic LZW variations)
-
- Bytes 3-15: ASCII file name, nul-terminated.
-
- (All of the following numeric values are stored low-byte first.)
-
- Bytes 16-19: Compressed file size in bytes.
-
- Bytes 20-21: File date, in 16-bit MS-DOS format:
- Bits 15:9 = year - 1980
- Bits 8:5 = month of year
- Bits 4:0 = day of month
- (All zero means no date.)
-
- Bytes 22-23: File time, in 16-bit MS-DOS format:
- Bits 15:11 = hour (24-hour clock)
- Bits 10:5 = minute
- Bits 4:0 = second/2 (not displayed by UNARC)
-
- Bytes 24-25: Cyclic redundancy check (CRC) value (see below).
-
- Bytes 26-29: Original (uncompressed) file length in bytes.
- (This field is not present for version 1 entries, byte 2 = 1.
- I.e., in this case the header is only 25 bytes long. Because
- version 1 files are uncompressed, the value normally found in
- this field may be obtained from bytes 16-19.)
-
-
- SELF-UNPACKING ARCHIVES
- -----------------------
-
- A "self-unpacking" archive is one which can be renamed to a .COM file
- and executed as a program. An example of such a file is the MS-DOS
- program ARC512.COM, which is a standard archive file preceded by a
- three-byte jump instruction. The first entry in this file is a simple
- "bootstrap" program in uncompressed form, which loads the subfile
- ARC.EXE (also uncompressed) into memory and passes control to it. In
- anticipation of a similar scheme for future distribution of UNARC, the
- program permits up to three bytes to precede the first header in an
- archive file (with no error message).
-
-
- CRC COMPUTATION
- ---------------
-
- Archive files use a 16-bit cyclic redundancy check (CRC) for error
- control. The particular CRC polynomial used is x^16 + x^15 + x^2 + 1,
- which is commonly known as "CRC-16" and is used in many data
- transmission protocols (e.g. DEC DDCMP and IBM BSC), as well as by
- most floppy disk controllers. Note that this differs from the CCITT
- polynomial (x^16 + x^12 + x^5 + 1), which is used by the XMODEM-CRC
- protocol and the public domain CHEK program (although these do not
- adhere strictly to the CCITT standard). The MS-DOS ARC program does
- perform a mathematically sound and accurate CRC calculation. (We
- mention this because it contrasts with some unfortunately popular
- public domain programs we have witnessed, which from time immemorial
- have based their calculation on an obscure magazine article which
- contained a typographical error!)
-
- Additional note (while we are on the subject of CRC's): The validity
- of using a 16-bit CRC for checking an entire file is somewhat
- questionable. Many people quote the statistics related to these
- functions (e.g. "all two-bit errors, all single burst errors of 16 or
- fewer bits, 99.997% of all single 17-bit burst errors, etc."), without
- realizing that these claims are valid only if the total number of bits
- checked is less than 32767 (which is why they are used in small-packet
- data transmission protocols). I.e., for file sizes in excess of about
- 4K bytes, a 16-bit CRC is not really as good as what is often claimed.
- This is not to say that it is bad, but there are more reliable methods
- available (e.g. the 32-bit AUTODIN-II polynomial). (End of lecture!)
-
- Bob Freed
- 62 Miller Road
- Newton Centre, MA 02159
- Telephone (617) 332-3533
-
-
-