In article <1992Nov13.120505.29654@spectrum.xerox.com> landells.sbd-e@rx.xerox.com writes:
>I have an application that generates binary output. The output is relatively random, but there are approximately twice as many off bits as on bits. My objective is to compress this as much as possible.
>
>I have tried several 'standard' compressors, arj 2.2, lharc, pkzip 1.1, and have only managed to achieve very minimal compression in the order of 4% at best (on a 40K file). Now I know that a truly random binary datastream cannot be compressed, but I was kind of hoping for better than 4%. Am I missing something fundamental, or is this really the best that can be achieved?
>
>If there is a technique to compress this type of data, I would appreciate some pointers to some source code that implements it.
>
>
>Richard Landells (landells.sbd-e@rx.xerox.com)
>Rank Xerox System Centre
You might try a variation of arithmetic coding referred to as the BAC
(Binary Arithmetic Code). It is described in the paper
Glen G. Langdon, Jr. and Jorma Rissanen, "A Simple General Binary Source
Code," _IEEE Transactions on Information Theory_, Vol. IT-28, No. 5,
Sept. 1982, pp. 800-803.
I implemented this once, and I remember that it did OK on binary data
streams where one bit was substantially more frequent than the other
(and I would classify 2 to 1 as substantially more frequent), but I
don't remember just what compression ratio "OK" worked out to.
It's a pretty simple algorithm, although the article is written in
math instead of English B^(. A more straightforward explanation (in
near-English) is given in
Glen G. Langdon, Jr., "An Introduction to Arithmetic Coding," _IBM
Journal of Research and Development_, Vol. 28, No. 2, March 1984,
pp. 135-149.
The Langdon article also has some other references that might help, although I