In article <1992Nov13.120505.29654@spectrum.xerox.com> landells.sbd-e@rx.xerox.com writes:
>I have an application that generates binary output. The output is relatively random, but there are approximately twice as many off bits as on bits. My objective is to compress this as much as possible.
>
>I have tried several 'standard' compressors, arj 2.2, lharc, pkzip 1.1, and have only managed to achieve very minimal compression in the order of 4% at best (on a 40K file). Now I know that a truly random binary datastream cannot be compressed, but I was kind of hoping for better than 4%. Am I missing something fundamental, or is this really the best that can be achieved?
>
>If there is a technique to compress this type of data, I would appreciate some pointers to some source code that implements it.
>
>
>Richard Landells (landells.sbd-e@rx.xerox.com)
>Rank Xerox System Centre
You might try a variation of arithmetic coding referred to as the BAC
(Binary Arithmetic Code). It is described in the paper
Glen G. Langdon, Jr. and Jorma Rissanen, "A Simple General Binary Source
Code," _IEEE Transactions on Information Theory_, Vol. IT-28, No. 5,
Sept. 1982, pp. 800-803.
I implemented this once, and I remember that it did OK on binary data
streams where one bit was substantially more frequent than the other
(and I would classify 2 to 1 as substantially more frequent), but I
don't remember just what compression ratio "OK" worked out to.
It's a pretty simple algorithm, although the article is written in
math instead of English B^(. A more straightforward explanation (in
near-English) is given in
Glen G. Langdon, Jr., "An Introduction to Arithmetic Coding," _IBM
Journal of Research and Development_, Vol. 28, No. 2, March 1984,
pp. 135-149.
The Langdon article also has some other references that might help, although I