In <1992Nov13.120505.29654@spectrum.xerox.com> richard@garfield.noname (richard_landells.sbd-e@rx.xerox.com) writes:
>I have an application that generates binary output. The output is relatively random, but there are approximately twice as many off bits as on bits. My objective is to compress this as much as possible.
>I have tried several 'standard' compressors, arj 2.2, lharc, pkzip 1.1, and have only managed to achieve very minimal compression in the order of 4% at best (on a 40K file). Now I know that a truly random binary datastream cannot be compressed, but I was kind of hoping for better than 4%. Am I missing something fundamental, or is this really the best that can be achieved?
>If there is a technique to compress this type of data, I would appreciate some pointers to some source code that implements it.
If the data is random apart from having twice as many 0's as 1's, use arithmetic
coding. That will get you better compression than the 4% you mentioned.
If there is some "logic" in the data (repeating patterns etc.) you might
consider e.g. a higher-order arithmetic coder.
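As a rough sanity check (my own back-of-the-envelope figure, not something
from the original post): with P(0) = 2/3 and P(1) = 1/3 the per-bit entropy
is about 0.918 bit, so an order-0 model can save at most roughly 8%. A few
lines of C confirm the numbers:

#include <math.h>
#include <stdio.h>

/* Order-0 entropy of a binary source with twice as many 0's as 1's.
 * This bounds what any memoryless coder (e.g. an order-0 arithmetic
 * coder driven by these probabilities) can achieve. */
int main(void)
{
    double p0 = 2.0 / 3.0, p1 = 1.0 / 3.0;
    double h  = -(p0 * log(p0) + p1 * log(p1)) / log(2.0); /* bits per bit */

    printf("entropy      : %.3f bits per input bit\n", h);
    printf("best savings : %.1f%%\n", (1.0 - h) * 100.0);
    return 0;
}

So the 4% you get now is not that far off the limit; to do much better than
about 8% you need the higher-order modelling mentioned above.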
Something which might also work is converting bits to bytes. This makes the
file 8 times larger, but it lets ARJ and PKZIP do their job, since both are
byte oriented. The resulting compressed file might well end up smaller than
the compressed original.
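For the bit-to-byte conversion, an untested sketch in ANSI C (the program
name bit2byte and the file arguments are just my choice):

#include <stdio.h>

/* Expand every bit of the input file into one byte (value 0 or 1) in
 * the output file, so that byte-oriented packers like ARJ or PKZIP
 * can see the repeating patterns.  Usage: bit2byte infile outfile */
int main(int argc, char *argv[])
{
    FILE *in, *out;
    int   c, i;

    if (argc != 3) {
        fprintf(stderr, "usage: bit2byte infile outfile\n");
        return 1;
    }
    in  = fopen(argv[1], "rb");
    out = fopen(argv[2], "wb");
    if (in == NULL || out == NULL) {
        fprintf(stderr, "bit2byte: cannot open file\n");
        return 1;
    }
    while ((c = getc(in)) != EOF)
        for (i = 7; i >= 0; i--)
            putc((c >> i) & 1, out);
    fclose(in);
    fclose(out);
    return 0;
}

Remember to compare the packed size of the expanded file against the packed
size of the original file, and that you need a reverse program to get your
data back.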
>Richard Landells (landells.sbd-e@rx.xerox.com)
>Rank Xerox System Centre
Nico E. de Vries (nevries@cc.ruu.nl) |------------------* AA III PPP
_ This text is supplied AS IS, no warranties of any kind | A A I P P
| apply. No rights can be derived from this text. This | AAAA I PPP
| text is likely to contain spelling and grammar errors. | A A I P
*---------------------------( Donate to GreenPeace! )----* A A III P
"The IBM PC is still waiting for a version of the CP/M OS.", G.M. Vose, 1982.