Loadstar 242

home *** CD-ROM | disk | FTP | other *** search

/ Loadstar 242 / 242.d81 / t.pack about < prev next >

Wrap

Text File | 2004-01-01 | 12.6 KB | 478 lines

u S T A R P A C K E R 1 . 2 by Lee Novak [ALL SECRETS REVEALED] [{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}] This article is for anyone who is interested in how this packer works. While the actual code was very difficult to write, the ideas behind it are rather simple. You might think: "How can a file be compressed? How can you construct a file without having all the data?" It doesn't seem possible. The idea I got from LS #135's article on LZS COMPRESSION was that any duplicated data can be packed. What's needed is a way to signal when something is to repeated. You also need to tell the unpacker where the original data is, and how much of it you need to repeat. How you go about this doesn't matter, just as long as it works! The LZS article talks of an unpacker that reads from a stream of bits only. This means that specific information (where repeat data is, etc.) might be split up between two or more bytes. I'm guessing that this method of varying bit-lengths provides the best compression, and maybe that's why the BIT IMPLODER is so good. But it sure sounded difficult. I didn't consider using this method for long. [THE THEORY OF RARES] [{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}] I knew this packer was going to use simpler byte-sized pieces of data. Every byte the unpacker would read in could have any value 0-255. At least one value must be reserved to tell the unpacker that compression information is to follow. What value to use? Zero? No way! There are lots of zeroes in almost every file! What about a number that is hardly ever used? Better. But some files would use this number more than others. What was needed was a custom value for each file. How about counting the number of occurrences for each byte? Aha! The least common byte, the RARE byte, is to signal when compression is used. The unpacker would read in the byte following the RARE and know it has special meaning. But there are so many possible ways that information might be packed. How would you tell the unpacker to duplicate five bytes (matches) located 117 bytes back? If two bytes followed the RARE, you COULD represent this information. But then, how would you tell the unpacker to copy 37 bytes located 9155 bytes back? We surely don't want to pass up the savings there! Hmm, maybe we need three bytes to follow the RARE. One byte would represent the number of matches, and the next two tell how far back to get them. So now, we would need FOUR bytes to signify even the shortest packing: one RARE then three bytes. So, what advantage is there in "compressing" a string of four matches? None at all. But there's bound to be lots and lots of four-byte matches in a file! Can't we use ANY of them? Hey, what if we use another RARE? The unpacker could check if a byte is either of these RARES, and follow a different unpacking formula for each. Why not three RARES? Okay! Why not four, or five, or even twenty RARES? Two problems with this. One: the unpacker code grows with each RARE you decide to add, because each RARE represents a compression all its own. And two: all normal, uncompressed bytes are just written to memory as- is. But sometimes, an uncompressed RARE is part of the file. On these occasions, TWO bytes are needed to represent that one byte. A RARE to say "Hey, here comes a rare!" and one byte to say "We want this one!". The second RARE is slightly less "rare" than the first. The third is even less so. Eventually, the bytes may become so common that it costs MORE to represent them as two bytes each than the potential savings using another type of compression may add. [WHAT EACH RARE DOES] [{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}] I did a lot of experimentation using different numbers of RARES and ways of packing the information. The most favorable all-purpose method of packing turned out to be one that uses the top eight RARES. Pretend you're an unpacker. You read along, splatting uncompressed bytes down without thought. Then you come across one of these RARE bytes. It's time to take the next byte, or bytes, more seriously. Here's EXACTLY what to do in these situations! RARE1, BYTE1: % 0000 0000 M{SHIFT-*}{SHIFT-*}{SHIFT-*}R{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT--} 3 matches, 3-258 bytes away Every possible value for BYTE1 is useful. ONE byte is saved for each 3-byte match. There were so many of these that a second RARE was added: RARE2, BYTE1: % 0000 0000 M{SHIFT-*}{SHIFT-*}{SHIFT-*}R{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT--} 3 matches, 259-514 bytes away Gathering 3-byte matches wasn't profitable after that, so all others are ignored. RARE3, BYTE1: % 0000 0000 M{SHIFT-*}{SHIFT-*}{SHIFT-*}R{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT--} 4 matches, 4-259 bytes away Each one of these saves TWO bytes. There are still larger matches close to the target string, but they are becoming less common. Time to break down the data byte into bits: RARE4, BYTE1: P{SHIFT-*} 0=5 matches % 0000 0000 1=6 matches M{SHIFT-*}{SHIFT-*}R{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT--} 5-132 bytes away or 6-133 bytes away RARE5, BYTE1: P{SHIFT-*} 0=7 matches % 0000 0000 1=8 matches M{SHIFT-*}{SHIFT-*}R{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT--} 7-134 bytes away or 8-135 bytes away These 2-byte compressions save between 3 and 6 bytes apiece. Things are thinning out even more now, but we still want anything that will make a profit. Let's add a data byte: RARE6, BYTE1, BYTE2: % 0000 0000 % 0000 0000 MR{SHIFT-*}{SHIFT--} MR{SHIFT-*}{SHIFT--} M{SHIFT-*}{SHIFT-*}{SHIFT-*}R{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT--} {SHIFT--} high low {SHIFT--} nybble byte {SHIFT--} M{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}R{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT--} {SHIFT--} 0-4095 bytes away {SHIFT--} 4-19 matches RARE7, BYTE1, BYTE2: % 0000 0000 % 0000 0000 MR{SHIFT-*}{SHIFT--} MR{SHIFT-*}{SHIFT--} M{SHIFT-*}{SHIFT-*}{SHIFT-*}R{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT--} {SHIFT--} high low {SHIFT--} nybble byte {SHIFT--} M{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}R{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT--} {SHIFT--} 4096-8191 bytes away {SHIFT--} 4-19 matches These 3-byte compressions save between 1 and 16 bytes apiece. We still need one more RARE to handle the rest of the possibilities: RARE8, BYTE1 (0): End of compression. RUN program. RARE8, BYTE1 (1-8): Add a specific uncompressed RARE. RARE8, BYTE1 (9), BYTE2: This unusual type of compression is for font data ONLY. If the exact REVERSE of the target string can be found exactly 1K earlier in the file, just like reversed font data, then we go back to copy 4-255 REVERSE bytes as specified in BYTE2. This EOR compression is probably only useful to STAR LINKED files. The cost of the related unpacking code is 34 bytes. This cost is recovered with the first 5 characters that can be packed this way! RARE8, BYTE1 (10-127), BYTE2, BYTE3: %00000000 %00000000 %00000000 {SHIFT--}M{SHIFT-*}{SHIFT-*}R{SHIFT-*}{SHIFT-*}{SHIFT--} M{SHIFT-*}{SHIFT-*}R{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT--} M{SHIFT-*}{SHIFT-*}R{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT--} {SHIFT--} 5-122 high byte low byte {SHIFT--} matches M{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}R{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT--} {SHIFT--} 0-65535 bytes away Malways 0 All remaining profitable matches are compressed this way. Since using this method costs 4 bytes each time, at least 5 characters must be packed with it to save at least one byte. RARE8, BYTE1 (128-255), BYTE2: %00000000 %00000000 {SHIFT--}M{SHIFT-*}{SHIFT-*}R{SHIFT-*}{SHIFT-*}{SHIFT--} M{SHIFT-*}{SHIFT-*}R{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT--} {SHIFT--} 4-131 byte to repeat {SHIFT--} repeats {SHIFT--} Malways 1 This last one is known as RLE, or Run-Length-Encoding. It can be pretty useful for packing fonts, sprites, or anything which may have 4 or more identical bytes in a row. Now you know what does what, you get a better idea of the information on the packing screen. Here are the RARES one last time, with the categories they fall into: RARE1 - RARE5 = "short" RARE6 - RARE7 = "medium" RARE8 with 10-127 = "long" RARE8 with 128-255 = "repeat" RARE8 with 9 = "rvs font" The number of uncompressed RARES in the file isn't shown anywhere. [TOIL AND TROUBLE] [{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}] Why is this packer so slow? Actually it's not. You won't BELIEVE how much work is going on internally! The packer begins at the end of a file and works backwards. It searches for the MOST MATCHES as CLOSE TO the target string as possible. The packer begins this search for each 3-byte string a certain distance back from the string. How far back it goes depends on the CRUNCH MODE: Crunch Mode How Far Back {SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*} {SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*} 1 1 page (256 bytes) 2 2 pages (512 bytes) 3 4 pages (1K) 4 8 pages (2K) 5 16 pages (4K) 6 32 pages (8K) 7 64 pages (16K) 8 128 pages (32K) 9 255 pages (64K) These distances are strictly used unless it overshoots the beginning of the file. This is why crunch mode "9" is the best setting. It ensures that, no matter how large the file, each target string is hunted for starting right from the beginning of the file. At the same time, STAR PACKER keeps its eyes open for opportunities to use RLE or EOR compression. If the target string cannot be packed, the 3rd byte is deemed "uncompressable" and left alone. If it's a RARE byte, the packer must then move memory to make room for the extra byte needed to signal this. That's why the screen seems to freeze up every now and then. If you watch, you'll see the NEW SIZE bytes increase by one. This is about ALL you'll see when you try to pack an already-packed file! When an uncompressable byte is found, searching begins anew for the next 3-byte target string, with the byte one down from the old string. This process continues until the beginning of the file is reached! If the target string CAN be compressed, the packer finds the best method its RARES will allow to give the best savings. For example, a 6-byte match 185 bytes away could be packed using RARE8 (saving 2 bytes), or RARE6 (saving 3 bytes). RARE4 is just out of range and can't be used. This also applies to RLE and EOR packing. If there's better savings to be had by packing them as a matching sequence that appears somewhere else in the file, that's what will happen. The program spends LOTS of time comparing memory. The main packing loop contains sleek, streamlined code for maximum speed. Every cycle shaved off this critical code represented SECONDS in the real world, because of just how OFTEN this code executes. I had STAR PACKER pack files of 39, 120, 136, 199, and 156 blocks, keeping track of how many comparisons it did during the main packing loop ALONE for crunch modes 1 and 9. This gives you an idea of how much work your little Commodore is doing! -- NUMBER OF COMPARES -- File Name CrunchMode1 CrunchMode9 {SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*} {SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*} {SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*} TEXT CHARTER 1,088,380 17,446,587 MUSEUM 5,123,930 169,539,568 QUICKSMITH 5,872,597 337,251,893 MOUSE MAKER 9,267,042 428,827,274 B.RAPTOR.MOV 10,006,987 773,521,079 Your little Commodore is working its tail off, comparing about 90,000 bytes per second! A SuperCPU does about 2 MILLION compares per second! So please, show a little patience. [FINAL TECHNICAL NOTES] [{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}] The file doesn't actually get smaller until after it is all packed. It leaves "defragment" tokens all over the place and, when packing is complete, scans over the file one last time and collects the pieces. An error you probably will never experience is the BUFFER OVERFLOW. For this to happen, the file needs to temporarily expand 500 bytes PAST the 51K size limit. An uncompressale RARE will expand the file only ONE byte. Most files have only 10 to 75 uncompressable RARES. The ones with more are likely to packed, anyway. I had to feed STAR PACKER a bogus file of nearly 51K that I knew it would choke on just to induce this error. The usual trick LOADSTAR employs to check if a file already exists is to try to rename the file as itself. The disk error channel returns either error #63 (file exists) or error #62 (file not found). If the file cannot be found, that name is safe to use! But, what if the user tries to save the file to another PARTITION? He or she might use something like "4:my program" for a filename. By tacking on an "R0:", we accidentally force the drive to find and rename "my program" as "4:my program"! There needs to be a different way to check if the file already exists. STAR PACKER tries to OPEN the file for READing instead! 10 open2,dv,2,name$+",s,r":close2 Any partition prefix will be automatically obeyed. This tactic results in three likely responses from the error channel: Error #62 (file not found) means it's safe to go ahead and SAVE the file as name$. Error #64 (file type mismatch) means the file DOES exist, but not as a SEQuential file. The user will have to select another name. Error #0 (ok) means the file DOES exist, and was SEQuential, and opened properly without incident. The user will have to select another name. Any other error should be treated as such - tell the user immediately. A disk drive that is OFF is another matter altogether! [FENDER'S POSTMUMBLE:] This program is so great I'm mumbleless.