home *** CD-ROM | disk | FTP | other *** search
-
- Run Length Encoding compressor program 16 bit header version
-
- Written by Shaun Case 1991 in Borland C++ 2.0
- with sizeof (int) == 2
-
- This program and its source code are Public Domain.
- This program should be portable to any machine with
- 2 byte short ints and 8 bit bytes.
-
-
- What is run length encoding?
-
- Run Length Encoding, also known as RLE, is a method of compressing data
- that has a lot of "runs" of bytes (or bits) in it. A "run" is a series
- of bytes that are all the same. For instance, the string "THIS IS A
- VEEEEEEEEEEEEEEEEEEEEEEEERY INTERESTING SENTENCE" has a run of 23 'E's
- in it. This could be compressed in the following manner:
-
- THIS IS A V23ERY INTERESTING SENTENCE
-
- resulting in a savings of 20 characters. A further savings of one
- character can be realized if the sequence "23" is replaced by a single
- byte with the value 23.
-
- However, if the text to be encoded is arbitrary, then it may contain
- numbers as well as letters, and bytes of all possible values. For this
- reason, there must be some way to let the decoder know when a compressed
- run is encountered, and when a sequence to be passed straight through is
- encountered. For this reason, the following file format was used:
-
-
- ========= tech info =========
-
- 16 bit header version.
-
- File format:
-
- 13 bytes : original filename, followed by:
-
- [ 16 bit header + data ][ 16 bit header + data ][16 bit header + data ]
- etc..
-
- header:
- [lo byte][hi byte] ==> turn into 16 bit int ==>
-
- bit 15 : 1 if following byte is a run
- bit 14 - 0 : length of run (max 32767, min 4)
-
- data: 1 byte : which character run consists of
-
- *** OR ***
-
- header:
- [lo byte][hi byte] ==> turn into 16 bit int ==>
-
- bit 15 : 0 if following bytes are sequence
- bit 14 - 0 : length of sequence (max 32767)
-
- data: (header & 0x7FFF) bytes of data
- : data bytes copied to output stream unchanged
-
- ===============================
-
- bugs:
-
- None known
-
-
- Nasty features :
-
- 1) In 8 bit version, when encoder reaches max run length, it is
- written out correctly, but is followed by a 1 length run of
- the next byte. Odd. Reason unknown. This version
- probably does it too, but I haven't tested it.
-
- 2) Better compression could be achieved by having min
- compression length and sequence length understood to be 4.
- This would allow an "understood" multiplication of the
- seq_len or run_len by 4, since 1 is never (should never be)
- used, allowing sequences of 131068 bytes.
-
- Implementing this requires fixing 1 above, too.
-
- 4) This 16 bit "improvement" is very unlikley to give better
- compresssion than the 8 bit version for data consisting of
- executable code, English or or other natural language text,
- netnews, BBS messages, or just about anything else other
- than large simple graphics images or other types of data
- that consist almost entirely of very long runs of
- characters. In most cases, this program will actually
- INCREASE the size of the data file slightly, due to the
- overhead needed for sequences between runs. It also runs
- about 10% slower than the 8 bit version due to the use of
- shifts and logical operators. Mainly it was an experiment
- that yielded disappointing results. However, here it is,
- for you to use and abuse.
-
- 5) Portability -- this program is "pretty portable." The stuff
- dealing with dos filenames should be removed, but that should
- be about it, unless you want to make the datafiles portable
- too, which is another story.
-
- 6) There is a lack of error checking when bytes are written out
- to the output file. Bytes are written in lots of places,
- and I didn't get around to doing error checking on any of them,
- since my application was going to run only in memory, and
- all the fputc() calls were going to be changed to pokes or
- pointer-directed stuffing. If you are going to use this program
- in a disk based application, you should really put EOF checks
- after each fputc().
-
- Author: atman%ecst.csuchico.edu@RELAY.CS.NET (internet)
- 1@9651 (WWIVnet)
- atman of 1:119/666.0 (fidonet)
-
-
- Tell me hi if you use this program!