Path: sparky!uunet!know!mips2!news.bbn.com!noc.near.net!news.Brown.EDU!qt.cs.utexas.edu!cs.utexas.edu!zaphod.mps.ohio-state.edu!uwm.edu!ogicse!das-news.harvard.edu!cantaloupe.srv.cs.cmu.edu!tgl
From: tgl+@cs.cmu.edu (Tom Lane)
Newsgroups: comp.compression
Subject: Re: Need a compressor for sparse bit datastream
Keywords: compression technique
Message-ID: <Bxu5t8.D1B.2@cs.cmu.edu>
Date: 17 Nov 92 01:07:53 GMT
References: <1992Nov13.120505.29654@spectrum.xerox.com>
Sender: news@cs.cmu.edu (Usenet News System)
Organization: School of Computer Science, Carnegie Mellon
Lines: 36
Nntp-Posting-Host: g.gp.cs.cmu.edu

richard@garfield.noname (richard_landells.sbd-e@rx.xerox.com) writes:
> I have an application that generates binary output. The output is
> relatively random, but there are approximately twice as many off bits as on
> bits. My objective is to compress this as much as possible.
>
> I have tried several 'standard' compressors (arj 2.2, lharc, pkzip 1.1) and
> have only managed to achieve very minimal compression, on the order of 4% at
> best (on a 40K file). Now I know that a truly random binary datastream
> cannot be compressed, but I was kind of hoping for better than 4%. Am I
> missing something fundamental, or is this really the best that can be
> achieved?

Most 'standard' compressors work by looking for repeated byte strings.
If your source is really random, this won't be very effective, since
you won't have much repetition at the byte level.
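
You can see this concretely with a quick experiment (a sketch in Python;
zlib's deflate stands in for the LZ-based archivers named above; the 2:1
bias and the 40K size come from the post, everything else is illustrative):

    import random, zlib

    random.seed(42)                          # any seed; just make it repeatable
    n_bytes = 40 * 1024                      # roughly the poster's 40K file
    data = bytearray()
    for _ in range(n_bytes):
        byte = 0
        for _ in range(8):                   # pack 8 biased bits per byte
            bit = 0 if random.random() < 2 / 3 else 1   # P(0)=2/3, P(1)=1/3
            byte = (byte << 1) | bit
        data.append(byte)

    packed = zlib.compress(bytes(data), 9)
    saved = 1 - len(packed) / len(data)
    print(len(data), "->", len(packed), "bytes, %.1f%% saved" % (100 * saved))

The LZ stage finds almost nothing to reuse; what little deflate does save
comes from its Huffman stage exploiting the skewed byte distribution, in
the same ballpark as the few percent reported above.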

However, you should be able to exploit the 2:1 ratio of 0s to 1s to get
better than 4% compression. Doing that will require arithmetic coding.
Huffman coding would work if the symbols were bigger than a bit, but since
a Huffman code word can't be shorter than one bit, you can't get any
compression with single-bit input symbols. Arithmetic coding wins because
it can encode source symbols into "fractional bits". Arithmetic coding
source code is easy to come by; see the comp.compression FAQ. (But if you
want to use this for any commercial purpose, beware of patents.)
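
For reference, here is a minimal sketch of such a coder (Python, in the
style of the textbook CACM-87 integer coder, hard-wired to a static 2:1
model; the 32-bit precision and all the names are my own choices, not
anything from the FAQ):

    # Sketch: static binary arithmetic coder. Symbol 0 gets 2/3 of the
    # current interval and symbol 1 gets 1/3, matching the 2:1 bias.
    CUM, TOTAL = [0, 2, 3], 3        # cumulative counts for symbols 0 and 1
    BITS = 32
    FULL, HALF, QTR = (1 << BITS) - 1, 1 << (BITS - 1), 1 << (BITS - 2)

    def encode(bits):
        low, high, pending, out = 0, FULL, 0, []
        def emit(b):
            nonlocal pending
            out.append(b)
            out.extend([b ^ 1] * pending)    # flush deferred opposite bits
            pending = 0
        for s in bits:
            span = high - low + 1            # narrow to symbol s's slice
            high = low + span * CUM[s + 1] // TOTAL - 1
            low  = low + span * CUM[s] // TOTAL
            while True:                      # renormalize, emitting settled bits
                if high < HALF:
                    emit(0)
                elif low >= HALF:
                    emit(1); low -= HALF; high -= HALF
                elif low >= QTR and high < HALF + QTR:
                    pending += 1; low -= QTR; high -= QTR
                else:
                    break
                low, high = low << 1, (high << 1) | 1
        pending += 1                         # terminate: pin a point in range
        emit(0 if low < QTR else 1)
        return out

    def decode(code, n):
        pos, low, high, value, out = 0, 0, FULL, 0, []
        def next_bit():
            nonlocal pos
            pos += 1
            return code[pos - 1] if pos <= len(code) else 0
        for _ in range(BITS):
            value = (value << 1) | next_bit()
        for _ in range(n):
            span = high - low + 1
            target = ((value - low + 1) * TOTAL - 1) // span
            s = 0 if target < CUM[1] else 1  # which slice holds the value?
            out.append(s)
            high = low + span * CUM[s + 1] // TOTAL - 1
            low  = low + span * CUM[s] // TOTAL
            while True:                      # mirror the encoder's renormalization
                if high < HALF:
                    pass
                elif low >= HALF:
                    low -= HALF; high -= HALF; value -= HALF
                elif low >= QTR and high < HALF + QTR:
                    low -= QTR; high -= QTR; value -= QTR
                else:
                    break
                low, high = low << 1, (high << 1) | 1
                value = (value << 1) | next_bit()
        return out

On a long stream with the 2:1 bias this emits about 0.918 output bits per
input bit, which is exactly the entropy limit discussed next. Note the
decoder is told the symbol count n; a real program would transmit the
length or reserve an end-of-stream symbol.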

But wait... if all we know about your source is that 0's have probability
.67 and 1's have probability .33, then the entropy is
        -(.67*log2(.67) + .33*log2(.33))
or about 0.92 bits per input bit, which means you could theoretically
achieve at most 8% compression this way. I'd say implementing arithmetic
coding on individual bits is probably not worth the trouble. Are you sure
the 0's and 1's are uncorrelated? If not, higher-order arithmetic coding,
which conditions each bit's probability on the bits that precede it, might
work. The fact that standard compressors can do as well as 4% suggests to
me that there is more order in the input bitstream than you've let on.
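
Both claims are easy to check numerically. Here is a back-of-the-envelope
sketch (Python; the helper names are mine): it reproduces the 0.92 figure
and estimates the empirical entropy of each bit given its k predecessors.
If the conditional figure comes out well below 0.92, the bits are
correlated and a higher-order model will buy real compression.

    import math
    from collections import Counter

    def entropy(probs):
        # Shannon entropy in bits per symbol
        return -sum(p * math.log2(p) for p in probs if p > 0)

    print(entropy([0.67, 0.33]))   # ~0.92 bits/bit -> at most ~8% savings

    def unpack(data):
        # bytes -> list of 0/1 ints, most significant bit first
        return [(byte >> i) & 1 for byte in data for i in range(7, -1, -1)]

    def cond_entropy(bits, k):
        # Empirical entropy of each bit given the previous k bits:
        # H(X|C) = -sum over (c, x) of p(c, x) * log2 p(x|c)
        ctx, joint = Counter(), Counter()
        for i in range(k, len(bits)):
            c = tuple(bits[i - k:i])
            ctx[c] += 1
            joint[c, bits[i]] += 1
        n = len(bits) - k
        return -sum(m / n * math.log2(m / ctx[c])
                    for (c, _), m in joint.items())

    # e.g.: bits = unpack(open("data.bin", "rb").read())
    #       print(cond_entropy(bits, 8))   # compare against 0.92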

regards, tom lane