home *** CD-ROM | disk | FTP | other *** search
- u
- S T A R P A C K E R 1 . 2
- by Lee Novak
-
-
- [ALL SECRETS REVEALED]
- [{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}]
-
- This article is for anyone who is
- interested in how this packer works.
- While the actual code was very
- difficult to write, the ideas behind
- it are rather simple.
-
- You might think: "How can a file
- be compressed? How can you construct
- a file without having all the data?"
- It doesn't seem possible.
-
- The idea I got from LS #135's
- article on LZS COMPRESSION was that
- any duplicated data can be packed.
- What's needed is a way to signal when
- something is to repeated. You also
- need to tell the unpacker where the
- original data is, and how much of it
- you need to repeat.
-
- How you go about this doesn't
- matter, just as long as it works! The
- LZS article talks of an unpacker that
- reads from a stream of bits only.
- This means that specific information
- (where repeat data is, etc.) might be
- split up between two or more bytes.
-
- I'm guessing that this method of
- varying bit-lengths provides the best
- compression, and maybe that's why the
- BIT IMPLODER is so good. But it sure
- sounded difficult. I didn't consider
- using this method for long.
-
-
- [THE THEORY OF RARES]
- [{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}]
-
- I knew this packer was going to
- use simpler byte-sized pieces of
- data. Every byte the unpacker would
- read in could have any value 0-255.
- At least one value must be reserved
- to tell the unpacker that compression
- information is to follow.
-
- What value to use? Zero? No way!
- There are lots of zeroes in almost
- every file! What about a number that
- is hardly ever used? Better. But some
- files would use this number more than
- others. What was needed was a custom
- value for each file.
-
- How about counting the number of
- occurrences for each byte? Aha! The
- least common byte, the RARE byte, is
- to signal when compression is used.
-
- The unpacker would read in the
- byte following the RARE and know it
- has special meaning. But there are so
- many possible ways that information
- might be packed. How would you tell
- the unpacker to duplicate five bytes
- (matches) located 117 bytes back?
-
- If two bytes followed the RARE,
- you COULD represent this information.
- But then, how would you tell the
- unpacker to copy 37 bytes located
- 9155 bytes back? We surely don't want
- to pass up the savings there! Hmm,
- maybe we need three bytes to follow
- the RARE. One byte would represent
- the number of matches, and the next
- two tell how far back to get them.
-
- So now, we would need FOUR bytes
- to signify even the shortest packing:
- one RARE then three bytes. So, what
- advantage is there in "compressing" a
- string of four matches? None at all.
- But there's bound to be lots and lots
- of four-byte matches in a file! Can't
- we use ANY of them?
-
- Hey, what if we use another RARE?
- The unpacker could check if a byte is
- either of these RARES, and follow a
- different unpacking formula for each.
- Why not three RARES? Okay! Why not
- four, or five, or even twenty RARES?
-
- Two problems with this. One: the
- unpacker code grows with each RARE
- you decide to add, because each RARE
- represents a compression all its own.
-
- And two: all normal, uncompressed
- bytes are just written to memory as-
- is. But sometimes, an uncompressed
- RARE is part of the file. On these
- occasions, TWO bytes are needed to
- represent that one byte. A RARE to
- say "Hey, here comes a rare!" and one
- byte to say "We want this one!".
-
- The second RARE is slightly less
- "rare" than the first. The third is
- even less so. Eventually, the bytes
- may become so common that it costs
- MORE to represent them as two bytes
- each than the potential savings using
- another type of compression may add.
-
-
- [WHAT EACH RARE DOES]
- [{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}]
-
- I did a lot of experimentation
- using different numbers of RARES and
- ways of packing the information. The
- most favorable all-purpose method of
- packing turned out to be one that
- uses the top eight RARES.
-
- Pretend you're an unpacker. You
- read along, splatting uncompressed
- bytes down without thought. Then you
- come across one of these RARE bytes.
- It's time to take the next byte, or
- bytes, more seriously. Here's EXACTLY
- what to do in these situations!
-
- RARE1, BYTE1:
-
- % 0000 0000
- M{SHIFT-*}{SHIFT-*}{SHIFT-*}R{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT--}
- 3 matches, 3-258 bytes away
-
- Every possible value for BYTE1 is
- useful. ONE byte is saved for each
- 3-byte match. There were so many of
- these that a second RARE was added:
-
- RARE2, BYTE1:
-
- % 0000 0000
- M{SHIFT-*}{SHIFT-*}{SHIFT-*}R{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT--}
- 3 matches, 259-514 bytes away
-
- Gathering 3-byte matches wasn't
- profitable after that, so all others
- are ignored.
-
- RARE3, BYTE1:
-
- % 0000 0000
- M{SHIFT-*}{SHIFT-*}{SHIFT-*}R{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT--}
- 4 matches, 4-259 bytes away
-
- Each one of these saves TWO
- bytes. There are still larger matches
- close to the target string, but they
- are becoming less common. Time to
- break down the data byte into bits:
-
- RARE4, BYTE1:
- P{SHIFT-*} 0=5 matches
- % 0000 0000 1=6 matches
- M{SHIFT-*}{SHIFT-*}R{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT--}
- 5-132 bytes away
- or 6-133 bytes away
-
- RARE5, BYTE1:
- P{SHIFT-*} 0=7 matches
- % 0000 0000 1=8 matches
- M{SHIFT-*}{SHIFT-*}R{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT--}
- 7-134 bytes away
- or 8-135 bytes away
-
- These 2-byte compressions save
- between 3 and 6 bytes apiece. Things
- are thinning out even more now, but
- we still want anything that will make
- a profit. Let's add a data byte:
-
- RARE6, BYTE1, BYTE2:
-
- % 0000 0000 % 0000 0000
- MR{SHIFT-*}{SHIFT--} MR{SHIFT-*}{SHIFT--} M{SHIFT-*}{SHIFT-*}{SHIFT-*}R{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT--}
- {SHIFT--} high low
- {SHIFT--} nybble byte
- {SHIFT--} M{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}R{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT--}
- {SHIFT--} 0-4095 bytes away
- {SHIFT--}
- 4-19 matches
-
- RARE7, BYTE1, BYTE2:
-
- % 0000 0000 % 0000 0000
- MR{SHIFT-*}{SHIFT--} MR{SHIFT-*}{SHIFT--} M{SHIFT-*}{SHIFT-*}{SHIFT-*}R{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT--}
- {SHIFT--} high low
- {SHIFT--} nybble byte
- {SHIFT--} M{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}R{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT--}
- {SHIFT--} 4096-8191 bytes away
- {SHIFT--}
- 4-19 matches
-
- These 3-byte compressions save
- between 1 and 16 bytes apiece. We
- still need one more RARE to handle
- the rest of the possibilities:
-
- RARE8, BYTE1 (0):
-
- End of compression. RUN program.
-
- RARE8, BYTE1 (1-8):
-
- Add a specific uncompressed RARE.
-
- RARE8, BYTE1 (9), BYTE2:
-
- This unusual type of compression
- is for font data ONLY. If the exact
- REVERSE of the target string can be
- found exactly 1K earlier in the file,
- just like reversed font data, then we
- go back to copy 4-255 REVERSE bytes
- as specified in BYTE2.
-
- This EOR compression is probably
- only useful to STAR LINKED files. The
- cost of the related unpacking code is
- 34 bytes. This cost is recovered with
- the first 5 characters that can be
- packed this way!
-
- RARE8, BYTE1 (10-127), BYTE2, BYTE3:
-
- %00000000 %00000000 %00000000
- {SHIFT--}M{SHIFT-*}{SHIFT-*}R{SHIFT-*}{SHIFT-*}{SHIFT--} M{SHIFT-*}{SHIFT-*}R{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT--} M{SHIFT-*}{SHIFT-*}R{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT--}
- {SHIFT--} 5-122 high byte low byte
- {SHIFT--} matches M{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}R{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT--}
- {SHIFT--} 0-65535 bytes away
- Malways 0
-
- All remaining profitable matches
- are compressed this way. Since using
- this method costs 4 bytes each time,
- at least 5 characters must be packed
- with it to save at least one byte.
-
- RARE8, BYTE1 (128-255), BYTE2:
-
- %00000000 %00000000
- {SHIFT--}M{SHIFT-*}{SHIFT-*}R{SHIFT-*}{SHIFT-*}{SHIFT--} M{SHIFT-*}{SHIFT-*}R{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT--}
- {SHIFT--} 4-131 byte to repeat
- {SHIFT--} repeats
- {SHIFT--}
- Malways 1
-
- This last one is known as RLE, or
- Run-Length-Encoding. It can be pretty
- useful for packing fonts, sprites, or
- anything which may have 4 or more
- identical bytes in a row.
-
- Now you know what does what, you
- get a better idea of the information
- on the packing screen. Here are the
- RARES one last time, with the
- categories they fall into:
-
- RARE1 - RARE5 = "short"
- RARE6 - RARE7 = "medium"
- RARE8 with 10-127 = "long"
- RARE8 with 128-255 = "repeat"
- RARE8 with 9 = "rvs font"
-
- The number of uncompressed RARES
- in the file isn't shown anywhere.
-
-
- [TOIL AND TROUBLE]
- [{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}]
-
- Why is this packer so slow?
- Actually it's not. You won't BELIEVE
- how much work is going on internally!
- The packer begins at the end of a
- file and works backwards. It searches
- for the MOST MATCHES as CLOSE TO the
- target string as possible.
-
- The packer begins this search for
- each 3-byte string a certain distance
- back from the string. How far back it
- goes depends on the CRUNCH MODE:
-
- Crunch Mode How Far Back
- {SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*} {SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}
- 1 1 page (256 bytes)
- 2 2 pages (512 bytes)
- 3 4 pages (1K)
- 4 8 pages (2K)
- 5 16 pages (4K)
- 6 32 pages (8K)
- 7 64 pages (16K)
- 8 128 pages (32K)
- 9 255 pages (64K)
-
- These distances are strictly used
- unless it overshoots the beginning of
- the file. This is why crunch mode "9"
- is the best setting. It ensures that,
- no matter how large the file, each
- target string is hunted for starting
- right from the beginning of the file.
-
- At the same time, STAR PACKER
- keeps its eyes open for opportunities
- to use RLE or EOR compression.
-
- If the target string cannot be
- packed, the 3rd byte is deemed
- "uncompressable" and left alone. If
- it's a RARE byte, the packer must
- then move memory to make room for the
- extra byte needed to signal this.
-
- That's why the screen seems to
- freeze up every now and then. If you
- watch, you'll see the NEW SIZE bytes
- increase by one. This is about ALL
- you'll see when you try to pack an
- already-packed file!
-
- When an uncompressable byte is
- found, searching begins anew for the
- next 3-byte target string, with the
- byte one down from the old string.
- This process continues until the
- beginning of the file is reached!
-
- If the target string CAN be
- compressed, the packer finds the best
- method its RARES will allow to give
- the best savings. For example, a
- 6-byte match 185 bytes away could be
- packed using RARE8 (saving 2 bytes),
- or RARE6 (saving 3 bytes). RARE4 is
- just out of range and can't be used.
-
- This also applies to RLE and EOR
- packing. If there's better savings to
- be had by packing them as a matching
- sequence that appears somewhere else
- in the file, that's what will happen.
-
- The program spends LOTS of time
- comparing memory. The main packing
- loop contains sleek, streamlined code
- for maximum speed. Every cycle shaved
- off this critical code represented
- SECONDS in the real world, because of
- just how OFTEN this code executes.
-
- I had STAR PACKER pack files of
- 39, 120, 136, 199, and 156 blocks,
- keeping track of how many comparisons
- it did during the main packing loop
- ALONE for crunch modes 1 and 9. This
- gives you an idea of how much work
- your little Commodore is doing!
-
- -- NUMBER OF COMPARES --
- File Name CrunchMode1 CrunchMode9
- {SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*} {SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*} {SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}
- TEXT CHARTER 1,088,380 17,446,587
- MUSEUM 5,123,930 169,539,568
- QUICKSMITH 5,872,597 337,251,893
- MOUSE MAKER 9,267,042 428,827,274
- B.RAPTOR.MOV 10,006,987 773,521,079
-
- Your little Commodore is working
- its tail off, comparing about 90,000
- bytes per second! A SuperCPU does
- about 2 MILLION compares per second!
- So please, show a little patience.
-
-
- [FINAL TECHNICAL NOTES]
- [{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}{SHIFT-*}]
-
- The file doesn't actually get
- smaller until after it is all packed.
- It leaves "defragment" tokens all
- over the place and, when packing is
- complete, scans over the file one
- last time and collects the pieces.
-
- An error you probably will never
- experience is the BUFFER OVERFLOW.
- For this to happen, the file needs to
- temporarily expand 500 bytes PAST the
- 51K size limit. An uncompressale RARE
- will expand the file only ONE byte.
-
- Most files have only 10 to 75
- uncompressable RARES. The ones with
- more are likely to packed, anyway. I
- had to feed STAR PACKER a bogus file
- of nearly 51K that I knew it would
- choke on just to induce this error.
-
- The usual trick LOADSTAR employs
- to check if a file already exists is
- to try to rename the file as itself.
- The disk error channel returns either
- error #63 (file exists) or error #62
- (file not found). If the file cannot
- be found, that name is safe to use!
-
- But, what if the user tries to
- save the file to another PARTITION?
- He or she might use something like
- "4:my program" for a filename. By
- tacking on an "R0:", we accidentally
- force the drive to find and rename
- "my program" as "4:my program"!
-
- There needs to be a different way
- to check if the file already exists.
- STAR PACKER tries to OPEN the file
- for READing instead!
-
- 10 open2,dv,2,name$+",s,r":close2
-
- Any partition prefix will be
- automatically obeyed. This tactic
- results in three likely responses
- from the error channel:
-
- Error #62 (file not found) means
- it's safe to go ahead and SAVE the
- file as name$.
-
- Error #64 (file type mismatch)
- means the file DOES exist, but not as
- a SEQuential file. The user will have
- to select another name.
-
- Error #0 (ok) means the file DOES
- exist, and was SEQuential, and opened
- properly without incident. The user
- will have to select another name.
-
- Any other error should be treated
- as such - tell the user immediately.
- A disk drive that is OFF is another
- matter altogether!
-
-
- [FENDER'S POSTMUMBLE:] This program
- is so great I'm mumbleless.
-
-