Liren Large Software Subsidy 7

home *** CD-ROM | disk | FTP | other *** search

/ Liren Large Software Subsidy 7 / 07.iso / c / c019 / 5.ddi / LZW.ZIP / DOUGPACK.PAS < prev next >

Wrap

Pascal/Delphi Source File | 1993-05-19 | 23.8 KB | 659 lines

unit DOUGPACK; {----------------------------------------------------------------------} {------ Turbo Pascal DOUGPACK unit written by Douglas Webb --------} {----------------------------------------------------------------------} {------ DISCLAIMER: There shall be no guarantee of the --------} {------ suitability of this software for any purpose. The --------} {------ author shall not be liable for any damages arrising --------} {------ from the use of this software. --------} {----------------------------------------------------------------------} { This unit was written to demonstrate how LZW compression can be used to compress files. It's ability to do so depends highly on the type of data being compressed. Text files may compress to 30-50% their original size, .EXE files to 60-80% of their original size, database files to 20-40% of their original size, and unpatterned data may actually increase in size. This incarnation of the algorithm is optimized for speed, as much as is possible in a high level language like pascal, and to a lesser degree at this experimental stage, flexibility, not readability. CRC assembly language routines were furnished by: Edwin T. Floyd [76067,747] This unit allows the user to compress data using a variation on the standard LZW compression format, or conversely to decompress data that was previously compressed by this unit. This unit makes a few assumptions: 1) Data being compressed is being sent to a file. 2) Data being decompressed is coming from a file. There are however a number of options as to where the compressed data is coming from, and the decompressed data is going. In fact it requires that you pass the "Compress" procedure a procedural parameter of type 'GetBytesProc' (declared below) which will accept 3 parameters and act in every way like a 'BlockRead' procedure call. Compress will ask for data in chunks of 4K or so at a time. Your procedure should return the data to be compressed: GetBytesProc = PROCEDURE(VAR DTA; NBytes:WORD; VAR Bytes_Got : WORD); DTA is the start of a memory location where the information returned should be. NBytes is the number of bytes requested. The actual number of bytes returned must be passed in Bytes_Got (if there is no more data then 0 should be returned). "Decompress" requires a procedural parameter of type 'PutBytesProc' which will accept 3 parameters, and must act in every way like a 'BlockWrite' procedure call. It must accept the decompressed data and do something with it. GetBytesProc = PROCEDURE(VAR DTA; NBytes:WORD; VAR Bytes_Got : WORD); Don't forget that as procedural parameters they must be compiled in the 'F+' state to avoid a catastrophe. Unpleasant NOTE: My provisions for maintaining a CRC for the compressed file seem to get into trouble if you try to compress/decompress multiple runs of data successively. You'll get a warning that the CRC is bad when in fact results appear to indicate that this is not so. So you may have to ignore the CRC unless you can figure out how it's broken. } interface uses crt,Dos,CRC; {#T The_LZW_Algorithm } { The compression algorithm : STRING = get input character WHILE there are still input characters DO CHARACTER = get input chracter IF STRING+CHARACTER is in the string table THEN STRING = STRING + character ELSE output the code for string add STRING+CHARACTER to the string table STRING = CHARACTER END of IF END of WHILE output the code for string The decompression algorithm: Read OLD_CODE output OLD_CODE WHILE there are still input characters DO Read New_CODE IF NEW_CODE is not in the translation table THEN STRING = get translation of OLD_CODE STRING = STRING+CHARACTER ELSE STRING = get translation of NEW_CODE END of IF output STRING CHARACTER = first character in STRING add OLD_CODE+CHARACTER to the translation table OLD_CODE = NEW_CODE END WHILE Wrinkles added to improve compression: 1: Sliding dictionary size, Always start with a 9 bit table, then when it's full increase the table size to 10 bits, and so on until the dictionary is as big as you intend to support (say 12-14 bits). 2: Empty the library after it fills up and start again. This is useful in files where the repetative elements in them may change positionally, such as in picture files of one sort or another, and many .EXE files. In some cases this may actually cost you some compression, but not often and not very much in any event. Even smarter (but not implemented) would be to monitor compression and clear/partial clear if compression appears to be dropping. } CONST Bits = 12; { This constant reflects the number of bits used to generate the dictionary to compress the data. Data must be decmpressed using the same dictionary size as was used when it was compressed. Setting the number of bits to 12, 13 or 14 affects several constants. Larger Files tend to compress better with a larger dictionaries. Memory used by this unit is a function of Bits & and the associated constant Table_Size: 14 : 110K; 13 : 65K; 12 : 45K All but about 5-10K of this is heap and is not needed (or in use) when the unit is not actually compressing or decompressing data. If you change this value, change the value of Table_Size appropriately. } TYPE PutBytesProc = PROCEDURE(VAR DTA; NBytes:WORD; VAR Bytes_Put : WORD); {#X GetBytesProc} { "Decompress" requires a procedural parameter of type 'PutBytesProc' which will accept 3 parameters, and must act in every way like a 'BlockWrite' procedure call. It must accept the decompressed data and do something with it (like save it to a file). Don't forget that as procedural parameters they must be compiled in the 'F+' state to avoid a catastrophe. } GetBytesProc = PROCEDURE(VAR DTA; NBytes:WORD; VAR Bytes_Got : WORD); {#X PutBytesProc} { The "Compress" procedure, requires that it be passed a procedural parameter of type 'GetBytesProc' which will accept 3 parameters and act in every way like a 'BlockRead' procedure call. Compress will ask for data in chunks of 4K or so at a time. Your procedure should return the data to be compressed. DTA is the start of a memory location where the information returned should be. NBytes is the number of bytes requested. The actual number of bytes returned must be passed in Bytes_Got (if there is no more data then 0 should be returned). Don't forget that as procedural parameters they must be compiled in the 'F+' state to avoid a catastrophe. } Function Compress(VAR OutFile : File; VAR Bytes_Written:LongInt; GetBytes:GetBytesProc): Word; {#X Decompress The_LZW_Algorithm} { This function uses LZW compression to compress the contents of InFile, and write them to OutFile, a CRC value for the original value is returned. The size of the compressed output is returned in 'Bytes_Written'. } Function Decompress(VAR InFile : File; UsedBits: Word; NoBytes : Longint; PutBytes: PutBytesProc): Word; {#X Compress The_LZW_Algorithm} { This is the decompression routine. It takes a LZW format file, and expands it to an output file. The code here should be a fairly close match to the algorithm above. Usedbits - How many bits was the dictionary used during compression (this is to make sure decompression is the same.) NoBytes - How many bytes are being decompressed } implementation {$R-} { Error checking slows things down by 200% } CONST Hashing_Shift = Bits - 8; Max_Value = PRED((1 SHL Bits)); { Code indicating end of data. } Max_Code = PRED(Max_Value); { The maximum amount of table entries allowed. } Buffer_Size = 4096; { Buffer for file I/O } Terminator : Array[10..14] OF WORD = (1023,2047,4095,8191,16383); { IF Bits = 14 then define table size as 18041 } { The string table size } { IF Bits = 13 then define table size as 9029 } { must be a prime number } { IF Bits = 12 then define table size as 5021 } { about 25% larger than } Table_Size = 5021; { 2^Bits. } TYPE Buffer_Type = Array[1..Buffer_Size] of Byte; { I/O buffers. } Buffer_Ptr = ^Buffer_Type; Stack_Array = Array[1..4000] of Byte; { Decompression stack. } Stack_Ptr = ^Stack_Array; Word_Array = Array[0..Table_Size] OF Integer; Word_Ptr = ^Word_Array; Char_Array = Array[0..Table_Size] OF BYTE; Char_Ptr = ^Char_Array; VAR InBuf,OutBuf : Buffer_Ptr; Code_Value,Prefix_Code : Word_Ptr; Append_Character : Char_Ptr; Stack_Position : Word; Stringy : Stack_Ptr; NumRead : Word; BitsC : Word; Function Input_Code(VAR InFile : File; BitsC : WORD; Resetf : BOOLEAN): WORD; { This function feeds data to the decompression routine. } CONST Input_Bit_Count : Integer = 0; Input_Bit_Buffer : Longint = 0; IBuffer_Count : Integer = SUCC(Buffer_Size); VAR Return_Value : Word; Temp : LongInt; Numread : WORD; BEGIN IF Resetf THEN { Reset everything to initial values. } BEGIN Input_Bit_Count := 0; Input_Bit_Buffer := 0; IBuffer_Count := SUCC(Buffer_Size); END; While Input_Bit_Count < 25 DO { Input_Bit_Count <= 24 } BEGIN IF IBuffer_Count < SUCC(Buffer_Size) THEN BEGIN Temp := InBuf^[IBuffer_Count]; INC(IBuffer_Count); END ELSE BEGIN BlockRead(InFile,InBuf^,Buffer_Size,NumRead); Temp := InBuf^[1]; IBuffer_Count := 2; END; Input_Bit_Buffer := Input_Bit_Buffer OR (Temp SHL (24-Input_Bit_Count)); INC(Input_Bit_Count,8); END; Return_Value := Input_Bit_Buffer SHR (32-BitsC); Input_Bit_Buffer := Input_Bit_Buffer SHL BitsC; DEC(Input_Bit_Count,BitsC); Input_Code := Return_Value; END; { end of the compressed data. } Procedure Output_Code(VAR OutFile: File; _Code,BitsC : Word; VAR Bytes_Written: Longint); { This procedure dumps the output of the compression routine to disk. } CONST Output_Bit_Count : Integer = 0; Output_Bit_Buffer : Longint = 0; OBuffer_Count : Integer = 1; VAR Code : LongInt; temp : LongInt; A : Byte; BEGIN Code := _Code; { Convert form Word to LONGINT. } Output_Bit_Buffer := Output_Bit_Buffer OR (Code SHL (32-BitsC-Output_Bit_Count)); INC(Output_Bit_Count,BitsC); WHILE Output_Bit_Count >= 8 DO BEGIN OutBuf^[OBuffer_Count] := OutPut_Bit_Buffer SHR 24; IF (OBuffer_Count <> Buffer_Size) AND (Code <> Max_Value) THEN INC(OBuffer_Count) ELSE BEGIN IF _Code <> Max_Value THEN BEGIN BlockWrite(OutFile,OutBuf^,Buffer_Size,NumRead); OBuffer_Count := 1; INC(Bytes_Written,NumRead); END ELSE BEGIN (* Flushing out the last few bytes *) WHILE Output_Bit_Count > Bits - BitsC DO BEGIN DEC(Output_Bit_Count,8); INC(OBuffer_Count); Output_Bit_Buffer := Output_Bit_Buffer SHL 8; OutBuf^[OBuffer_Count] := OutPut_Bit_Buffer SHR 24; END; BlockWrite(OutFile,OutBuf^,PRED(OBuffer_Count),NumRead); INC(Bytes_Written,NumRead); Output_Bit_Buffer := 0; { Reset for next time. } Output_Bit_Count := 8; { Reset for next time. } END; OBuffer_Count := 1; END; Output_Bit_Buffer := Output_Bit_Buffer SHL 8; DEC(Output_Bit_Count,8); END; END; Function Compress(VAR OutFile : File; VAR Bytes_Written:LongInt; GetBytes:GetBytesProc): Word; { This function uses LZW compression to compress the contents of InFile, and write them to OutFile, a CRC value for the original value is returned. } LABEL 1; VAR NumRead,String_Code,Next_Code : WORD; I,J : INTEGER; Character,Temp: Byte; IBuffer_Count : WORD; X : Longint; Index,Offset : Integer; CRCVal : WORD; NotPacked : BOOLEAN; BEGIN New(InBuf); { Create all the structures that will } New(OutBuf); { be needed to compress the data. } New(Code_Value); New(Prefix_Code); New(Append_Character); NotPacked := TRUE; Bytes_Written := 0; BitsC := 9; { Starting size of library. } CRCVal := 0; { Initialize the CRC value. } Next_Code := 256; FOR I := 0 TO Table_Size DO { Clear the string table before starting. } Code_Value^[I] := -1; GetBytes(Temp,1,Numread); { Get the first Code. } GetBytes(InBuf^,1,Numread); IBuffer_Count := Numread; { Set Byte buffer empty. } CRCVal := UpdateCRCArc(CRCVal,Temp,1); { Update CRC value. } CRCVal := UpdateCRCArc(CRCVal,InBuf^,NumRead); { Update CRC value. } String_Code := Temp; { This is the main loop where it all happens. This loop runs until all of the input file has been read. Note that it clears the table and restarts once all possible codes have been defined. } While NotPacked DO BEGIN IF IBuffer_Count <> Numread THEN BEGIN Character := InBuf^[IBuffer_Count]; INC(IBuffer_Count); END ELSE BEGIN Character := InBuf^[IBuffer_Count]; GetBytes(InBuf^,Buffer_Size,Numread); CRCVal := UpdateCRCArc(CRCVal,InBuf^,NumRead); { Update CRC value. } IBuffer_Count := 1; If Numread = 0 THEN NotPacked := FALSE; { If there is no more data then stop.} END; { This is the hashing code routine. It tries to find a match for prefix+char string in the string table. If it finds it, the index is returned. IF the string is not found, the first available index in the string table is returned instead. } Index := (Character SHL Hashing_Shift) XOR String_Code; IF Index = 0 THEN Offset := 1 ELSE Offset := Table_Size - Index; WHILE TRUE DO BEGIN IF Code_Value^[Index] = -1 THEN Goto 1; IF (Prefix_Code^[Index] = String_Code) AND (Append_Character^[Index] = Character) THEN Goto 1; DEC(Index,Offset); IF Index < 0 THEN INC(Index,Table_Size); END; { See if it's already in } 1: IF Code_Value^[Index] <> -1 THEN { the table. If it is, } String_Code := Code_Value^[Index] { get the code value. If } ELSE { the string is not in } BEGIN { table try to add it. } IF Next_Code < Max_Code THEN { Actually this IF is redundant, will NEVER be false. } BEGIN Code_Value^[Index] := Next_Code; INC(Next_Code); Prefix_Code^[Index] := String_Code; Append_Character^[Index] := Character; Output_Code(OutFile,String_Code,BitsC,Bytes_Written); { When a string is found } IF (Next_Code DIV (1 SHL BitsC)) = 1 THEN INC(BitsC); { Sliding window. } String_Code := Character; { that is not in the table } IF Next_Code = Max_Code THEN { Table is full. } BEGIN BitsC := 9; { Reset the sliding dictionary. } Next_Code := 256; FOR J := 0 TO Table_Size DO { Clear the string table before } Code_Value^[J] := -1; { starting to fill it again. } END; END; END; { I output the last string } END; { after adding the new one. } { End of the main loop } Output_Code(OutFile,String_Code,BitsC,Bytes_Written); { Output the last code. } BitsC := Bits; Output_Code(OutFile,Max_Value,BitsC,Bytes_Written); { Output the end of buffer code. } Dispose(OutBuf); { Deallocate all our structures. } Dispose(InBuf); Dispose(Code_Value); Dispose(Prefix_Code); Dispose(Append_Character); Compress := CRCVal; END; Function Decompress(VAR InFile : File; UsedBits: Word; NoBytes : LongInt; PutBytes: PutBytesProc): Word; { This is the decompression routine. It takes a LZW format file, and expands it to an output file. The code here should be a fairly close match to the algorithm above. } VAR Next_Code,Code,New_Code,Old_Code : WORD; I,Character : Integer; Temp : Byte; OBuffer_Count : WORD; CRCVal : WORD; CarryBits : Word; BytesDeCompressed : LongInt; BEGIN If UsedBits <> Bits THEN { If compressed with different # bits } BEGIN { abort now. } Decompress := 0; Exit; END; New(InBuf); { Create all the structure which } New(OutBuf); { will allow me to decompress } New(Stringy); { the data. } New(Code_Value); New(Prefix_Code); New(Append_Character); BytesDeCompressed := 0; CarryBits := 0; BitsC := 9; { Initialize a few variables. } CRCVal := 0; OBuffer_Count := 1; Stack_Position := 1; Next_Code := 256; { This is the next available code to define. } Old_Code := Input_Code(InFIle,BitsC,TRUE); { Read in the first code, initialize the } BytesDeCompressed := BytesDeCompressed + (CarryBits + BitsC) DIV 8; CarryBits := (CarryBits + BitsC) MOD 8; Character := Old_Code; { character variable, and send the first } Temp := Old_Code; PutBytes(Temp,1,NumRead); { code to the output file. } CRCVal := UpdateCRCArc(CRCVal,Temp,1); { Update CRC value. } { This is the main decompression loop. It read characters from the LZW file until it sees the special code used to indicate the end of the data. } New_Code := Input_Code(InFile,BitsC,FALSE); BytesDeCompressed := BytesDeCompressed + (CarryBits + BitsC) DIV 8; CarryBits := (CarryBits + BitsC) MOD 8; While (New_Code <> Terminator[BitsC]) OR (BytesDeCompressed < NoBytes -2) DO BEGIN { This code checks for special STRING+CHARACTER+STRING+CHARACTER+STRING case which generates an undefined code. It handles it by decoding the last code, adding a single character to the end of the decode string. } IF New_Code = Next_Code THEN BEGIN Stringy^[Stack_Position] := Character; INC(Stack_Position); { This routine simply decodes a string from the string table, storing it in a buffer. The buffer can then be output in reverse order by the expansion routine (below). } Code := Old_Code; While Code > 255 DO BEGIN Stringy^[Stack_Position] := Append_Character^[Code]; INC(Stack_Position); Code := Prefix_Code^[Code]; IF Stack_Position >= 4000 THEN BEGIN Writeln('Fatal Error during code decompression.'); Halt; END; END; Stringy^[Stack_Position] := Code; END { Otherwise do a straight decode of the new code. } ELSE BEGIN Code := New_Code; While Code > 255 DO BEGIN Stringy^[Stack_Position] := Append_Character^[Code]; INC(Stack_Position); Code := Prefix_Code^[Code]; IF Stack_Position >= 4000 THEN BEGIN Writeln('Fatal Error during code decompression.'); Halt; END; END; Stringy^[Stack_Position] := Code; END; { Now output the decoded string in reverse order. } Character := Stringy^[Stack_Position]; While (Stack_Position >= 1) DO BEGIN IF (OBuffer_Count <> Buffer_Size) THEN BEGIN OutBuf^[OBuffer_Count] := Stringy^[Stack_Position]; INC(OBuffer_Count); END ELSE BEGIN OutBuf^[Buffer_Size] := Stringy^[Stack_Position]; PutBytes(OutBuf^,Buffer_Size,NumRead); CRCVal := UpdateCRCArc(CRCVal,OutBuf^,NumRead); { Update CRC value. } OBuffer_Count := 1; END; DEC(Stack_Position); END; INC(Stack_Position); { Finally, if possible add a new code to the string table. } IF Next_Code < Max_Code-1 THEN BEGIN Prefix_Code^[Next_Code] := Old_Code; Append_Character^[Next_Code] := Character; INC(Next_Code); IF (SUCC(Next_Code) DIV (1 SHL BitsC)) = 1 THEN IF BitsC < Bits THEN INC(BitsC); END; IF Next_Code = Max_Code-1 THEN { Table is now full. } BEGIN BitsC := 9; { Reset sliding dictionary. } Stack_Position := 1; Next_Code := 256; { This is the next available code to define. } New_Code := Input_Code(InFile,BitsC,FALSE); BytesDeCompressed := BytesDeCompressed + (CarryBits + BitsC) DIV 8; CarryBits := (CarryBits + BitsC) MOD 8; Character := New_Code; { Reinitialize the decode process. } IF (OBuffer_Count <> Buffer_Size) THEN BEGIN OutBuf^[OBuffer_Count] := New_Code; INC(OBuffer_Count); END ELSE BEGIN OutBuf^[Buffer_Size] := New_Code; CRCVal := UpdateCRCArc(CRCVal,OutBuf^,Buffer_Size); { Update CRC value. } PutBytes(OutBuf^,Buffer_Size,NumRead); OBuffer_Count := 1; END; END; Old_Code := New_Code; New_Code := Input_Code(Infile,BitsC,FALSE); BytesDeCompressed := BytesDeCompressed + (CarryBits + BitsC) DIV 8; CarryBits := (CarryBits + BitsC) MOD 8; END; CRCVal := UpdateCRCArc(CRCVal,OutBuf^,PRED(OBuffer_Count)); { Update CRC value. } PutBytes(OutBuf^,PRED(OBuffer_Count),NumRead); Dispose(OutBuf); { Deallocate all our structures. } Dispose(InBuf); Dispose(Stringy); Dispose(Code_Value); Dispose(Prefix_Code); Dispose(Append_Character); Decompress := CRCVal; END; END.