home *** CD-ROM | disk | FTP | other *** search
- Newsgroups: comp.compression
- Path: sparky!uunet!cs.utexas.edu!swrinde!sdd.hp.com!think.com!spool.mu.edu!darwin.sura.net!Sirius.dfn.de!Urmel.Informatik.RWTH-Aachen.DE!tabaqui!dak
- From: dak@tabaqui.informatik.rwth-aachen.de (David Kastrup)
- Subject: Re: Q: How well do random files crunch? A: Not very.
- Message-ID: <dak.722386743@tabaqui>
- Sender: news@Urmel.Informatik.RWTH-Aachen.DE (Newsfiles Owner)
- Nntp-Posting-Host: tabaqui
- Organization: Rechnerbetrieb Informatik / RWTH Aachen
- References: <1992Nov13.135437.19044@decuac.dec.com> <83939@ut-emx.uucp>
- Date: 21 Nov 92 22:59:03 GMT
- Lines: 35
-
- dougmc@ccwf.cc.utexas.edu (Doug McLaren) writes:
-
- >In article <1992Nov13.135437.19044@decuac.dec.com> bell@ufp.enet.dec.com () writes:
-
- >>Ok, let's talk about random files. Say I produce a file filled with
- >>random bytes, based on a fairly equal-distribution function.
-
- >Well, the way I understand it, if your file is completely random, you
- >will not be able to compress it at all.
-
- >Basically, entropy must always increase. A random file already has the maximum
- >entropy for a file of that length, so you can't compress it. (compression
- >increases entropy while decreasing (or increasing -- sometimes a compression
- >scheme will increase the size of the file, but that's irrelavent here) the
- >size.
-
- >Now, if your file isn't completely random, then it's certainly possible
- >that it could be compressed
-
- >Of course, it's possible to have a 100,000 byte file thats completely
- >'0's come up randomly, but the odds against this are so astronomical ...
- >(the infinite number of monkeys idea is nice, but has nothing to do with the
- >real world. One million monkeys typing 100 wps for the total age of the
- >universe aren't likely to come up with even ONE work of Shakespear.)
-
- But iff such a file cropped up, it would compress better. The thing to note
- is, you cannot decrease the MEAN size. As soon as some of the n-byte
- patterns a n-byte file may assume 256^{8n} has different probabilities,
- a compression might increase entropy, decreasing mean size.
-
- Now you could just add, say, one bit to tell whether you tried compression
- or not on the remainder of the file, and compress only where you might gain.
- Information theory tells you that having random files, ANY compression
- will not provide a sufficient gain even that often that the prolongation
- by one bit would be compensated in the long run.
-