home *** CD-ROM | disk | FTP | other *** search
- unit DOUGPACK;
-
-
- {----------------------------------------------------------------------}
- {------ Turbo Pascal DOUGPACK unit written by Douglas Webb --------}
- {----------------------------------------------------------------------}
- {------ DISCLAIMER: There shall be no guarantee of the --------}
- {------ suitability of this software for any purpose. The --------}
- {------ author shall not be liable for any damages arrising --------}
- {------ from the use of this software. --------}
- {----------------------------------------------------------------------}
-
-
- { This unit was written to demonstrate how LZW compression can be used to
- compress files. It's ability to do so depends highly on the type of data
- being compressed. Text files may compress to 30-50% their original size,
- .EXE files to 60-80% of their original size, database files to 20-40% of
- their original size, and unpatterned data may actually increase in size.
-
- This incarnation of the algorithm is optimized for speed, as much as is
- possible in a high level language like pascal, and to a lesser
- degree at this experimental stage, flexibility, not readability.
-
- CRC assembly language routines were furnished by:
- Edwin T. Floyd [76067,747]
-
- This unit allows the user to compress data using a variation on the
- standard LZW compression format, or conversely to decompress data that
- was previously compressed by this unit.
- This unit makes a few assumptions:
- 1) Data being compressed is being sent to a file.
- 2) Data being decompressed is coming from a file.
- There are however a number of options as to where the compressed data
- is coming from, and the decompressed data is going.
-
- In fact it requires that you pass the "Compress" procedure a procedural
- parameter of type 'GetBytesProc' (declared below) which will accept 3
- parameters and act in every way like a 'BlockRead' procedure call.
- Compress will ask for data in chunks of 4K or so at a time. Your
- procedure should return the data to be compressed:
-
- GetBytesProc = PROCEDURE(VAR DTA; NBytes:WORD; VAR Bytes_Got : WORD);
-
- DTA is the start of a memory location where the information returned
- should be. NBytes is the number of bytes requested. The actual number
- of bytes returned must be passed in Bytes_Got (if there is no more data
- then 0 should be returned).
-
- "Decompress" requires a procedural parameter of type 'PutBytesProc'
- which will accept 3 parameters, and must act in every way like a
- 'BlockWrite' procedure call. It must accept the decompressed data
- and do something with it.
-
- GetBytesProc = PROCEDURE(VAR DTA; NBytes:WORD; VAR Bytes_Got : WORD);
-
- Don't forget that as procedural parameters they must be compiled in the
- 'F+' state to avoid a catastrophe.
-
-
- Unpleasant NOTE: My provisions for maintaining a CRC for the compressed
- file seem to get into trouble if you try to compress/decompress multiple
- runs of data successively. You'll get a warning that the CRC is bad
- when in fact results appear to indicate that this is not so. So you may
- have to ignore the CRC unless you can figure out how it's broken.
-
- }
-
- interface
- uses crt,Dos,CRC;
-
- {#T The_LZW_Algorithm }
- { The compression algorithm :
-
- STRING = get input character
- WHILE there are still input characters DO
- CHARACTER = get input chracter
- IF STRING+CHARACTER is in the string table THEN
- STRING = STRING + character
- ELSE
- output the code for string
- add STRING+CHARACTER to the string table
- STRING = CHARACTER
- END of IF
- END of WHILE
- output the code for string
-
- The decompression algorithm:
-
- Read OLD_CODE
- output OLD_CODE
- WHILE there are still input characters DO
- Read New_CODE
- IF NEW_CODE is not in the translation table THEN
- STRING = get translation of OLD_CODE
- STRING = STRING+CHARACTER
- ELSE
- STRING = get translation of NEW_CODE
- END of IF
- output STRING
- CHARACTER = first character in STRING
- add OLD_CODE+CHARACTER to the translation table
- OLD_CODE = NEW_CODE
- END WHILE
-
-
- Wrinkles added to improve compression:
- 1: Sliding dictionary size, Always start with a 9 bit table, then
- when it's full increase the table size to 10 bits, and so on until
- the dictionary is as big as you intend to support (say 12-14 bits).
- 2: Empty the library after it fills up and start again. This is useful
- in files where the repetative elements in them may change positionally,
- such as in picture files of one sort or another, and many .EXE files.
- In some cases this may actually cost you some compression, but not often
- and not very much in any event. Even smarter (but not implemented)
- would be to monitor compression and clear/partial clear if compression
- appears to be dropping.
-
- }
-
-
-
-
- CONST
- Bits = 12; { This constant reflects the number of bits used to
- generate the dictionary to compress the data. Data
- must be decmpressed using the same dictionary size
- as was used when it was compressed. Setting the
- number of bits to 12, 13 or 14 affects several
- constants.
- Larger Files tend to compress better with a larger
- dictionaries.
-
- Memory used by this unit is a function of Bits &
- and the associated constant Table_Size:
- 14 : 110K; 13 : 65K; 12 : 45K
- All but about 5-10K of this is heap and is not
- needed (or in use) when the unit is not actually
- compressing or decompressing data.
-
- If you change this value, change the value of
- Table_Size appropriately.
- }
-
-
-
- TYPE
- PutBytesProc = PROCEDURE(VAR DTA; NBytes:WORD; VAR Bytes_Put : WORD);
- {#X GetBytesProc}
- { "Decompress" requires a procedural parameter of type 'PutBytesProc'
- which will accept 3 parameters, and must act in every way like a
- 'BlockWrite' procedure call. It must accept the decompressed data
- and do something with it (like save it to a file).
-
- Don't forget that as procedural parameters they must be compiled in the
- 'F+' state to avoid a catastrophe. }
-
-
- GetBytesProc = PROCEDURE(VAR DTA; NBytes:WORD; VAR Bytes_Got : WORD);
- {#X PutBytesProc}
- { The "Compress" procedure, requires that it be passed a procedural
- parameter of type 'GetBytesProc' which will accept 3
- parameters and act in every way like a 'BlockRead' procedure call.
- Compress will ask for data in chunks of 4K or so at a time. Your
- procedure should return the data to be compressed.
-
- DTA is the start of a memory location where the information returned
- should be. NBytes is the number of bytes requested. The actual number
- of bytes returned must be passed in Bytes_Got (if there is no more data
- then 0 should be returned).
-
- Don't forget that as procedural parameters they must be compiled in the
- 'F+' state to avoid a catastrophe.
- }
-
-
- Function Compress(VAR OutFile : File; VAR Bytes_Written:LongInt; GetBytes:GetBytesProc): Word;
- {#X Decompress The_LZW_Algorithm}
- { This function uses LZW compression to compress the contents of InFile,
- and write them to OutFile, a CRC value for the original value is returned.
- The size of the compressed output is returned in 'Bytes_Written'. }
-
- Function Decompress(VAR InFile : File; UsedBits: Word; NoBytes : Longint; PutBytes: PutBytesProc): Word;
- {#X Compress The_LZW_Algorithm}
- { This is the decompression routine. It takes a LZW format file, and
- expands it to an output file. The code here should be a fairly close
- match to the algorithm above.
-
- Usedbits - How many bits was the dictionary used during compression
- (this is to make sure decompression is the same.)
- NoBytes - How many bytes are being decompressed
- }
-
-
- implementation
-
- {$R-} { Error checking slows things down by 200% }
-
- CONST
- Hashing_Shift = Bits - 8;
- Max_Value = PRED((1 SHL Bits)); { Code indicating end of data. }
- Max_Code = PRED(Max_Value); { The maximum amount of table entries allowed. }
- Buffer_Size = 4096; { Buffer for file I/O }
- Terminator : Array[10..14] OF WORD = (1023,2047,4095,8191,16383);
-
-
- { IF Bits = 14 then define table size as 18041 } { The string table size }
- { IF Bits = 13 then define table size as 9029 } { must be a prime number }
- { IF Bits = 12 then define table size as 5021 } { about 25% larger than }
- Table_Size = 5021; { 2^Bits. }
-
-
-
-
- TYPE
-
-
- Buffer_Type = Array[1..Buffer_Size] of Byte; { I/O buffers. }
- Buffer_Ptr = ^Buffer_Type;
-
- Stack_Array = Array[1..4000] of Byte; { Decompression stack. }
- Stack_Ptr = ^Stack_Array;
-
- Word_Array = Array[0..Table_Size] OF Integer;
- Word_Ptr = ^Word_Array;
- Char_Array = Array[0..Table_Size] OF BYTE;
- Char_Ptr = ^Char_Array;
-
-
-
-
- VAR
- InBuf,OutBuf : Buffer_Ptr;
- Code_Value,Prefix_Code : Word_Ptr;
- Append_Character : Char_Ptr;
-
- Stack_Position : Word;
- Stringy : Stack_Ptr;
- NumRead : Word;
- BitsC : Word;
-
-
-
-
- Function Input_Code(VAR InFile : File; BitsC : WORD; Resetf : BOOLEAN): WORD;
-
- { This function feeds data to the decompression routine. }
-
-
- CONST
- Input_Bit_Count : Integer = 0;
- Input_Bit_Buffer : Longint = 0;
- IBuffer_Count : Integer = SUCC(Buffer_Size);
-
- VAR
- Return_Value : Word;
- Temp : LongInt;
- Numread : WORD;
-
- BEGIN
- IF Resetf THEN { Reset everything to initial values. }
- BEGIN
- Input_Bit_Count := 0;
- Input_Bit_Buffer := 0;
- IBuffer_Count := SUCC(Buffer_Size);
- END;
- While Input_Bit_Count < 25 DO { Input_Bit_Count <= 24 }
- BEGIN
- IF IBuffer_Count < SUCC(Buffer_Size) THEN
- BEGIN
- Temp := InBuf^[IBuffer_Count];
- INC(IBuffer_Count);
- END
- ELSE
- BEGIN
- BlockRead(InFile,InBuf^,Buffer_Size,NumRead);
- Temp := InBuf^[1];
- IBuffer_Count := 2;
- END;
-
- Input_Bit_Buffer := Input_Bit_Buffer OR (Temp SHL (24-Input_Bit_Count));
- INC(Input_Bit_Count,8);
- END;
- Return_Value := Input_Bit_Buffer SHR (32-BitsC);
- Input_Bit_Buffer := Input_Bit_Buffer SHL BitsC;
- DEC(Input_Bit_Count,BitsC);
- Input_Code := Return_Value;
- END; { end of the compressed data. }
-
-
-
-
-
- Procedure Output_Code(VAR OutFile: File; _Code,BitsC : Word; VAR Bytes_Written: Longint);
-
- { This procedure dumps the output of the compression routine to disk. }
-
- CONST
- Output_Bit_Count : Integer = 0;
- Output_Bit_Buffer : Longint = 0;
- OBuffer_Count : Integer = 1;
-
- VAR
- Code : LongInt;
- temp : LongInt;
- A : Byte;
-
-
- BEGIN
- Code := _Code; { Convert form Word to LONGINT. }
- Output_Bit_Buffer := Output_Bit_Buffer OR (Code SHL (32-BitsC-Output_Bit_Count));
- INC(Output_Bit_Count,BitsC);
- WHILE Output_Bit_Count >= 8 DO
- BEGIN
- OutBuf^[OBuffer_Count] := OutPut_Bit_Buffer SHR 24;
- IF (OBuffer_Count <> Buffer_Size) AND (Code <> Max_Value) THEN
- INC(OBuffer_Count)
- ELSE
- BEGIN
- IF _Code <> Max_Value THEN
- BEGIN
- BlockWrite(OutFile,OutBuf^,Buffer_Size,NumRead);
- OBuffer_Count := 1;
- INC(Bytes_Written,NumRead);
- END
- ELSE
- BEGIN (* Flushing out the last few bytes *)
- WHILE Output_Bit_Count > Bits - BitsC DO
- BEGIN
- DEC(Output_Bit_Count,8);
- INC(OBuffer_Count);
- Output_Bit_Buffer := Output_Bit_Buffer SHL 8;
- OutBuf^[OBuffer_Count] := OutPut_Bit_Buffer SHR 24;
- END;
- BlockWrite(OutFile,OutBuf^,PRED(OBuffer_Count),NumRead);
- INC(Bytes_Written,NumRead);
- Output_Bit_Buffer := 0; { Reset for next time. }
- Output_Bit_Count := 8; { Reset for next time. }
- END;
- OBuffer_Count := 1;
- END;
- Output_Bit_Buffer := Output_Bit_Buffer SHL 8;
- DEC(Output_Bit_Count,8);
- END;
- END;
-
-
-
-
-
- Function Compress(VAR OutFile : File; VAR Bytes_Written:LongInt; GetBytes:GetBytesProc): Word;
-
- { This function uses LZW compression to compress the contents of InFile,
- and write them to OutFile, a CRC value for the original value is returned. }
-
- LABEL
- 1;
-
-
- VAR
- NumRead,String_Code,Next_Code : WORD;
- I,J : INTEGER;
- Character,Temp: Byte;
- IBuffer_Count : WORD;
- X : Longint;
- Index,Offset : Integer;
- CRCVal : WORD;
- NotPacked : BOOLEAN;
-
-
- BEGIN
- New(InBuf); { Create all the structures that will }
- New(OutBuf); { be needed to compress the data. }
- New(Code_Value);
- New(Prefix_Code);
- New(Append_Character);
-
-
- NotPacked := TRUE;
- Bytes_Written := 0;
- BitsC := 9; { Starting size of library. }
- CRCVal := 0; { Initialize the CRC value. }
- Next_Code := 256;
- FOR I := 0 TO Table_Size DO { Clear the string table before starting. }
- Code_Value^[I] := -1;
- GetBytes(Temp,1,Numread); { Get the first Code. }
- GetBytes(InBuf^,1,Numread);
- IBuffer_Count := Numread; { Set Byte buffer empty. }
- CRCVal := UpdateCRCArc(CRCVal,Temp,1); { Update CRC value. }
- CRCVal := UpdateCRCArc(CRCVal,InBuf^,NumRead); { Update CRC value. }
- String_Code := Temp;
-
- { This is the main loop where it all happens. This loop runs until all
- of the input file has been read. Note that it clears the table
- and restarts once all possible codes have been defined. }
-
- While NotPacked DO
- BEGIN
- IF IBuffer_Count <> Numread THEN
- BEGIN
- Character := InBuf^[IBuffer_Count];
- INC(IBuffer_Count);
- END
- ELSE
- BEGIN
- Character := InBuf^[IBuffer_Count];
- GetBytes(InBuf^,Buffer_Size,Numread);
- CRCVal := UpdateCRCArc(CRCVal,InBuf^,NumRead); { Update CRC value. }
- IBuffer_Count := 1;
- If Numread = 0 THEN NotPacked := FALSE; { If there is no more data then stop.}
- END;
-
- { This is the hashing code routine. It tries to find a match for prefix+char
- string in the string table. If it finds it, the index is returned. IF
- the string is not found, the first available index in the string table is
- returned instead.
- }
- Index := (Character SHL Hashing_Shift) XOR String_Code;
- IF Index = 0 THEN Offset := 1
- ELSE Offset := Table_Size - Index;
- WHILE TRUE DO
- BEGIN
- IF Code_Value^[Index] = -1 THEN
- Goto 1;
- IF (Prefix_Code^[Index] = String_Code) AND
- (Append_Character^[Index] = Character) THEN
- Goto 1;
- DEC(Index,Offset);
- IF Index < 0 THEN INC(Index,Table_Size);
- END;
-
- { See if it's already in }
- 1: IF Code_Value^[Index] <> -1 THEN { the table. If it is, }
- String_Code := Code_Value^[Index] { get the code value. If }
- ELSE { the string is not in }
- BEGIN { table try to add it. }
- IF Next_Code < Max_Code THEN { Actually this IF is redundant, will NEVER be false. }
- BEGIN
- Code_Value^[Index] := Next_Code;
- INC(Next_Code);
- Prefix_Code^[Index] := String_Code;
- Append_Character^[Index] := Character;
- Output_Code(OutFile,String_Code,BitsC,Bytes_Written); { When a string is found }
- IF (Next_Code DIV (1 SHL BitsC)) = 1 THEN
- INC(BitsC); { Sliding window. }
- String_Code := Character; { that is not in the table }
- IF Next_Code = Max_Code THEN { Table is full. }
- BEGIN
- BitsC := 9; { Reset the sliding dictionary. }
- Next_Code := 256;
- FOR J := 0 TO Table_Size DO { Clear the string table before }
- Code_Value^[J] := -1; { starting to fill it again. }
- END;
- END;
-
- END; { I output the last string }
- END; { after adding the new one. }
-
- { End of the main loop }
-
- Output_Code(OutFile,String_Code,BitsC,Bytes_Written); { Output the last code. }
- BitsC := Bits;
- Output_Code(OutFile,Max_Value,BitsC,Bytes_Written); { Output the end of buffer code. }
-
- Dispose(OutBuf); { Deallocate all our structures. }
- Dispose(InBuf);
- Dispose(Code_Value);
- Dispose(Prefix_Code);
- Dispose(Append_Character);
-
- Compress := CRCVal;
- END;
-
-
-
-
-
-
-
-
- Function Decompress(VAR InFile : File; UsedBits: Word; NoBytes : LongInt; PutBytes: PutBytesProc): Word;
-
- { This is the decompression routine. It takes a LZW format file, and
- expands it to an output file. The code here should be a fairly close
- match to the algorithm above.
- }
-
-
-
- VAR
- Next_Code,Code,New_Code,Old_Code : WORD;
- I,Character : Integer;
- Temp : Byte;
- OBuffer_Count : WORD;
- CRCVal : WORD;
- CarryBits : Word;
- BytesDeCompressed : LongInt;
-
-
-
-
- BEGIN
- If UsedBits <> Bits THEN { If compressed with different # bits }
- BEGIN { abort now. }
- Decompress := 0;
- Exit;
- END;
-
-
- New(InBuf); { Create all the structure which }
- New(OutBuf); { will allow me to decompress }
- New(Stringy); { the data. }
- New(Code_Value);
- New(Prefix_Code);
- New(Append_Character);
-
- BytesDeCompressed := 0;
- CarryBits := 0;
- BitsC := 9; { Initialize a few variables. }
- CRCVal := 0;
- OBuffer_Count := 1;
- Stack_Position := 1;
- Next_Code := 256; { This is the next available code to define. }
- Old_Code := Input_Code(InFIle,BitsC,TRUE); { Read in the first code, initialize the }
- BytesDeCompressed := BytesDeCompressed + (CarryBits + BitsC) DIV 8;
- CarryBits := (CarryBits + BitsC) MOD 8;
- Character := Old_Code; { character variable, and send the first }
- Temp := Old_Code;
- PutBytes(Temp,1,NumRead); { code to the output file. }
- CRCVal := UpdateCRCArc(CRCVal,Temp,1); { Update CRC value. }
-
-
- { This is the main decompression loop. It read characters from the LZW file
- until it sees the special code used to indicate the end of the data.
- }
-
- New_Code := Input_Code(InFile,BitsC,FALSE);
- BytesDeCompressed := BytesDeCompressed + (CarryBits + BitsC) DIV 8;
- CarryBits := (CarryBits + BitsC) MOD 8;
- While (New_Code <> Terminator[BitsC]) OR (BytesDeCompressed < NoBytes -2) DO
- BEGIN
-
- { This code checks for special STRING+CHARACTER+STRING+CHARACTER+STRING
- case which generates an undefined code. It handles it by decoding
- the last code, adding a single character to the end of the decode string.
- }
- IF New_Code = Next_Code THEN
- BEGIN
- Stringy^[Stack_Position] := Character;
- INC(Stack_Position);
-
- { This routine simply decodes a string from the string table, storing
- it in a buffer. The buffer can then be output in reverse order by the
- expansion routine (below).
- }
- Code := Old_Code;
- While Code > 255 DO
- BEGIN
- Stringy^[Stack_Position] := Append_Character^[Code];
- INC(Stack_Position);
- Code := Prefix_Code^[Code];
- IF Stack_Position >= 4000 THEN
- BEGIN
- Writeln('Fatal Error during code decompression.');
- Halt;
- END;
- END;
- Stringy^[Stack_Position] := Code;
- END
- { Otherwise do a straight decode of the new code. }
- ELSE
- BEGIN
- Code := New_Code;
- While Code > 255 DO
- BEGIN
- Stringy^[Stack_Position] := Append_Character^[Code];
- INC(Stack_Position);
- Code := Prefix_Code^[Code];
- IF Stack_Position >= 4000 THEN
- BEGIN
- Writeln('Fatal Error during code decompression.');
- Halt;
- END;
- END;
- Stringy^[Stack_Position] := Code;
- END;
-
- { Now output the decoded string in reverse order. }
- Character := Stringy^[Stack_Position];
- While (Stack_Position >= 1) DO
- BEGIN
- IF (OBuffer_Count <> Buffer_Size) THEN
- BEGIN
- OutBuf^[OBuffer_Count] := Stringy^[Stack_Position];
- INC(OBuffer_Count);
- END
- ELSE
- BEGIN
- OutBuf^[Buffer_Size] := Stringy^[Stack_Position];
- PutBytes(OutBuf^,Buffer_Size,NumRead);
- CRCVal := UpdateCRCArc(CRCVal,OutBuf^,NumRead); { Update CRC value. }
- OBuffer_Count := 1;
- END;
-
- DEC(Stack_Position);
- END;
- INC(Stack_Position);
-
- { Finally, if possible add a new code to the string table. }
- IF Next_Code < Max_Code-1 THEN
- BEGIN
- Prefix_Code^[Next_Code] := Old_Code;
- Append_Character^[Next_Code] := Character;
- INC(Next_Code);
- IF (SUCC(Next_Code) DIV (1 SHL BitsC)) = 1 THEN
- IF BitsC < Bits THEN INC(BitsC);
- END;
- IF Next_Code = Max_Code-1 THEN { Table is now full. }
- BEGIN
- BitsC := 9; { Reset sliding dictionary. }
- Stack_Position := 1;
- Next_Code := 256; { This is the next available code to define. }
- New_Code := Input_Code(InFile,BitsC,FALSE);
- BytesDeCompressed := BytesDeCompressed + (CarryBits + BitsC) DIV 8;
- CarryBits := (CarryBits + BitsC) MOD 8;
- Character := New_Code; { Reinitialize the decode process. }
- IF (OBuffer_Count <> Buffer_Size) THEN
- BEGIN
- OutBuf^[OBuffer_Count] := New_Code;
- INC(OBuffer_Count);
- END
- ELSE
- BEGIN
- OutBuf^[Buffer_Size] := New_Code;
- CRCVal := UpdateCRCArc(CRCVal,OutBuf^,Buffer_Size); { Update CRC value. }
- PutBytes(OutBuf^,Buffer_Size,NumRead);
- OBuffer_Count := 1;
- END;
- END;
- Old_Code := New_Code;
- New_Code := Input_Code(Infile,BitsC,FALSE);
- BytesDeCompressed := BytesDeCompressed + (CarryBits + BitsC) DIV 8;
- CarryBits := (CarryBits + BitsC) MOD 8;
- END;
- CRCVal := UpdateCRCArc(CRCVal,OutBuf^,PRED(OBuffer_Count)); { Update CRC value. }
- PutBytes(OutBuf^,PRED(OBuffer_Count),NumRead);
-
- Dispose(OutBuf); { Deallocate all our structures. }
- Dispose(InBuf);
- Dispose(Stringy);
- Dispose(Code_Value);
- Dispose(Prefix_Code);
- Dispose(Append_Character);
- Decompress := CRCVal;
- END;
-
-
-
-
- END.