I'm hoping someone here can shed some light on the PKWare compression/decompression format. Back in 2001, Ben Rudiak-Gould posted an outline of how PKWare decompression works on the comp.compression newsgroup (available through Google Groups). In 2003, Mark Adler wrote a simple decompressor based on Ben's post, which included a correction to Ben's outline.
First off, let me say that this is well beyond my level of understanding, but I hope any replies will help me inch along in making sense of it. Below is a small portion of Ben's original post, with the correction applied:
Here's a sample compressed stream and how it would be decoded.
The stream is 00 04 82 24 25 8f 80 7f.
The first byte of the header is 0, so the fixed-width representation
is used for literal bytes.
The second byte is 4, so the dictionary is 1K in size.
The bitstream portion breaks down as follows:
0 10000010          literal byte 41 (ASCII 'A')
0 10010010          literal byte 49 (ASCII 'I')
1 001001 111000     copy 11 bytes starting at dictionary byte 1
                    (counting from the end starting with 0)
1 000000011111111   end of stream
0                   padding to multiple of 8 bits (ignored)
This stream would decompress to "AIAIAIAIAIAIA" (without the quotes).
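One way to check the breakdown above against the hex bytes is to read each byte's bits starting from the least significant bit, and to treat the first bit read of a literal as that literal's lowest bit. The Python below is only a rough sketch under that assumption. The function names are made up for illustration, it handles only the fixed-width literal case, and it stops at the first copy or end-of-stream flag rather than decoding it.

```python
def read_bits_lsb_first(data: bytes):
    """Yield the bits of data one at a time, lowest bit of each byte first."""
    for byte in data:
        for i in range(8):
            yield (byte >> i) & 1


def take(bits, count: int) -> int:
    """Consume count bits; the first bit read becomes the lowest bit of the result."""
    value = 0
    for i in range(count):
        value |= next(bits) << i
    return value


def decode_leading_literals(stream: bytes) -> bytes:
    """Decode literals up to the first non-literal flag (hypothetical helper)."""
    if stream[0] != 0:
        raise ValueError("this sketch only handles fixed-width literals")
    bits = read_bits_lsb_first(stream[2:])  # skip the two header bytes
    out = bytearray()
    while take(bits, 1) == 0:  # a 0 flag means an 8-bit literal follows
        out.append(take(bits, 8))
    return bytes(out)


sample = bytes.fromhex("00 04 82 24 25 8f 80 7f")
print(decode_leading_literals(sample))  # b'AI'
```

Read this way, 0x82 is 10000010 in binary, so its bits come out as 0,1,0,0,0,0,0,1. The leading 0 is the literal flag, the remaining seven bits plus the lowest bit of 0x24 form the literal, and reassembling those eight bits lowest-first gives 0x41.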
What I don't understand is the correlation between the hex digits in the bit stream and the ASCII characters. How does hex 82 become ASCII 'A' in the data part of the example stream? Just out of curiosity, would anyone know what the bit stream would look like for "ABCD" (without the quotes), given that they're all literal bytes with no length/distance pair?
Thanks.
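For the "ABCD" case, here is a minimal sketch of a literals-only encoder. It assumes the same layout the sample stream exhibits: bits packed into each byte from the least significant end, each literal written as a 0 flag followed by its eight bits lowest-first, and the end-of-stream code copied verbatim from the breakdown above. The name `encode_literals` and the `dict_byte` parameter are hypothetical. The header value 4 is simply reused from the sample, and it should not change the bit layout when there are no length/distance pairs.

```python
# End-of-stream bits in stream order, copied from the sample breakdown.
END_OF_STREAM = "1" + "000000011111111"


def encode_literals(text: bytes, dict_byte: int = 4) -> bytes:
    """Encode text as fixed-width literals only (hypothetical helper)."""
    bits = []
    for byte in text:
        bits.append(0)  # flag: a literal follows
        bits.extend((byte >> i) & 1 for i in range(8))  # lowest bit first
    bits.extend(int(b) for b in END_OF_STREAM)
    while len(bits) % 8:  # pad with zeros to a whole number of bytes
        bits.append(0)

    body = bytearray()
    for start in range(0, len(bits), 8):
        chunk = bits[start:start + 8]
        # The first bit in stream order lands in the lowest bit of the byte.
        body.append(sum(bit << i for i, bit in enumerate(chunk)))

    # Header: 0 selects fixed-width literals, then the dictionary-size byte.
    return bytes([0, dict_byte]) + bytes(body)


print(encode_literals(b"ABCD").hex(" "))  # 00 04 82 08 19 42 14 f0 0f
```

Under these assumptions the stream for "ABCD" would be 00 04 82 08 19 42 14 f0 0f. Whether an actual PKWare compressor would emit exactly these bytes is not something this sketch can confirm.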