# compression

• 01-09-2005
X PaYnE X
compression
You've probably noticed the post I made about 'beginner array question'; well, here's why:

I'm working on a program that takes a black-and-white image, saves it as an array of 0s and 1s, and then compresses that array as small as it can go.

I thought of different ways to compress it, but since this is my first attempt at compression, I tried a simple method.

Since my array was only 0s and 1s, I thought about converting each character into a bit, so I could reduce the size of the file by a factor of 8. Anyway, that went successfully, but I was wondering: is there any other method I can use to compress the string even further?
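For readers following along, the char-to-bit packing described above can be sketched like this (a minimal Python illustration of the idea, assuming the image arrives as a string of '0'/'1' characters; the function names are mine, not the poster's):

```python
def pack_bits(pixels: str) -> bytes:
    """Pack a string of '0'/'1' characters into bytes, 8 pixels per byte."""
    out = bytearray()
    for i in range(0, len(pixels), 8):
        chunk = pixels[i:i + 8].ljust(8, '0')  # pad the final byte with zeros
        out.append(int(chunk, 2))
    return bytes(out)

def unpack_bits(data: bytes, n_pixels: int) -> str:
    """Reverse the packing; n_pixels tells us how much padding to drop."""
    bits = ''.join(f'{b:08b}' for b in data)
    return bits[:n_pixels]

packed = pack_bits('110000011')
print(len(packed))              # 2 bytes instead of 9 characters
print(unpack_bits(packed, 9))   # '110000011'
```

Note that the decoder needs the original pixel count, since the last byte may carry padding bits.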

I know that my method is simple, because the compression ratio is just pathetic compared to the RAR or ZIP compression ratio, so I did a little digging on Google. I found out they use the LZW method (I think?) to compress their data to such small sizes. I did even more digging and found out it uses some sort of dictionary, and that it's a lossless compressor. Anyway, I looked up a few implementations of the LZW algorithm, and I wanted to ask this before I went any further:
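For context: textbook LZW builds its dictionary on the fly, starting from all 256 single-byte strings, so no dictionary ever needs to be transmitted. A bare-bones sketch (output codes are left as a Python list of integers; a real implementation would also pack them into 9-16-bit fields):

```python
def lzw_encode(data: bytes) -> list[int]:
    table = {bytes([i]): i for i in range(256)}  # initial single-byte entries
    w, out = b'', []
    for c in data:
        wc = w + bytes([c])
        if wc in table:
            w = wc                    # extend the current match
        else:
            out.append(table[w])      # emit code for the longest match
            table[wc] = len(table)    # learn the new string
            w = bytes([c])
    if w:
        out.append(table[w])
    return out

def lzw_decode(codes: list[int]) -> bytes:
    table = {i: bytes([i]) for i in range(256)}
    w = table[codes[0]]
    out = bytearray(w)
    for k in codes[1:]:
        # k may reference the entry we are about to create (the kSkS case)
        entry = table[k] if k in table else w + w[:1]
        out += entry
        table[len(table)] = w + entry[:1]
        w = entry
    return bytes(out)
```

On repetitive input the code stream comes out well shorter than the input, which is the effect the ZIP-family tools exploit.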

Should I use both my method and the LZW algorithm? And in which order? (i.e., divide by 8 then compress with LZW, or vice versa?)

Or is there another compression algorithm that does this already?

If someone could point me in the direction of any such algorithm, it would be much appreciated.
• 01-09-2005
Prelude
>Should I use both my method and the LZW algorithm?
Does your method compress well enough for your needs? Why do more work than you have to?
• 01-09-2005
X PaYnE X
Well, it doesn't compress well enough for what I'm trying to do. At the moment, it takes a 1024x768 image and compresses it down to about 96 KB, and that's black and white. I know for a fact that JPEG or PNG can compress that image to well below 50 KB and still keep its color.

Also, I need it to go down by at least 10-20 times instead of the 8 I get at the moment, if possible.
• 01-09-2005
Prelude
Okay, now the question is: Is this an exercise or do you really need the functionality for real work? If it's the former then go ahead and research compression algorithms and see what you can squeeze out of your data. If it's the latter then why bother when there are plenty of excellent compression tools out there already? Unless you're extremely confident in your programming ability, you probably won't do better than any of the good ones (and if you were confident, you wouldn't be asking for help here). So it makes sense to use an external tool to help you out.
• 01-09-2005
X PaYnE X
It is an exercise, more or less. I'm working on improving my compression/encryption skills, and I'm hoping that if I can achieve a good enough compression ratio, I can use this algorithm for all my future projects that involve compression.

I know I can get a better ratio, but the problem is that I'm not sure how to go about it. I'll try looking on Google for compression algorithms and see where it gets me.

By the way, the reason I know I can get more out of the compressed data is that when I open it in a hex editor, I find that there are a lot of repeated values. Like if the picture had 100 black pixels, then 1 white dot, and 200 black pixels again, I would end up with a lot of 00s for the black pixels, and I just feel that's a waste of space.
• 01-09-2005
Prelude
>and I just feel that's a waste of space
Try a run-length encoding (RLE) algorithm and see if it does better than your char-to-bit algorithm. If it doesn't, then you could still probably work out an RLE scheme for the compressed file. I recommend starting easy and moving up, and RLE is relatively easy to implement.
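The simplest RLE scheme of the kind Prelude suggests stores each run as a (count, value) byte pair. A minimal sketch (runs are capped at 255 so the count fits in one byte; real schemes usually add an escape mechanism so non-repetitive data doesn't grow):

```python
def rle_encode(data: bytes) -> bytes:
    """Encode as (count, value) byte pairs, runs capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes([run, data[i]])
        i += run
    return bytes(out)

def rle_decode(data: bytes) -> bytes:
    out = bytearray()
    for i in range(0, len(data), 2):
        out += bytes([data[i + 1]]) * data[i]  # repeat value count times
    return bytes(out)

# the "100 black, 1 white, 200 black" example from the thread:
raw = bytes([0]) * 100 + bytes([255]) + bytes([0]) * 200
print(len(rle_encode(raw)))  # 6 bytes for 301 bytes of input
```

On data without runs this naive pairing doubles the size, so it only pays off when long runs are common.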
• 01-09-2005
Jez
Yes. You could use RLE, then see if Huffman coding would help. Dynamic Huffman codes are kind of fun. I'm considering a strategy like this for my "Game of Life" code (for saving large files of single bits, essentially).
• 01-09-2005
X PaYnE X
I looked up RLE, and it was easy to implement, like you said, but it only decreased the file size by under 10%. I also found a modified LZW algorithm that decreased the size by around 60%; I think I'll implement that if I can't find anything better. I read about Huffman, but I couldn't find any implementations. Maybe I just didn't look hard enough...

Anyway, I figured I could cut down even more space if I could find some way of saving only the white pixels. I was thinking of maybe converting the shapes in the image into vectors, or using points to identify where the white pixels are. I'm still working on that; so far it's not as good as the char-to-bit compression, but I'm hoping it will be better.
• 01-09-2005
Sebastiani
RLE is good for repetitive chunks of data, but Huffman gives the most even compression for just about anything (though it usually requires about 1/2 KB per message to accommodate the lookup table, unless an adaptive algorithm is used). A combination of the two would probably produce results comparable to LZW.
• 01-09-2005
X PaYnE X
Do you know where I can find an example that uses both algorithms? Or an implementation of Huffman? I can't seem to find an implementation of that anywhere. Every time I search for it, I end up with modified LZW algorithms...

By the way: what do you mean by adaptive Huffman? Is that just a modified Huffman, or a whole different algorithm?
• 01-09-2005
Sebastiani
>> Do you know where I can find an example that uses both algorithms?

Actually, both are simple to construct from a decent description of their theories. RLE is just a matter of counting repetitive chunks that are either contiguous or at random locations. Huffman generates variable bit-length codewords based on 'letter' frequency, and the binary tree that stores the codewords is implemented as an array. Looking up 'letters' in one table emits encoded data; shifting codewords 'through' the tree produces decoded data. That's basically it.
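The codeword-generation step described above can be sketched like this (a minimal illustration using nested tuples for the tree and a heap for the two-lightest merge, rather than the array layout Sebastiani mentions):

```python
import heapq
from collections import Counter

def huffman_codes(data: bytes) -> dict:
    """Build variable bit-length codewords from symbol frequencies."""
    # heap holds (frequency, tie-breaker, node); leaves are ints,
    # internal nodes are (left, right) tuples.
    heap = [(f, i, sym) for i, (sym, f) in enumerate(Counter(data).items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # merge the two lightest nodes
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next_id, (left, right)))
        next_id += 1
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):         # internal node: recurse
            walk(node[0], prefix + '0')
            walk(node[1], prefix + '1')
        else:                               # leaf: record the codeword
            codes[node] = prefix or '0'     # degenerate one-symbol input
    walk(heap[0][2], '')
    return codes

codes = huffman_codes(b'aaaabbc')
# the frequent 'a' gets a shorter codeword than the rare 'c'
print(codes[ord('a')], codes[ord('c')])
```

Encoding is then a table lookup per symbol; decoding walks the tree bit by bit, exactly as described in the post.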

>> What do you mean by adaptive Huffman?

I have it backwards, actually. A fixed Huffman frequency table would take up less space than an adaptive one, since the table for the former is predetermined and sent just once. Sorry about that.
• 01-10-2005
Jez
LZW should give better compression than RLE + Huffman, since it also looks for repeating blocks of data.

Whether dynamic/adaptive huffman codes are better than "straight" huffman code for your application depends on the nature of the data to be compressed. If the probabilities of the run lengths are reasonably constant, then the normal approach might be best. Otherwise use the dynamic type.

The nice thing about dynamic codes is that you don't need to make a table.

You might find this relevant:

http://en.wikipedia.org/wiki/Huffman_coding

BTW, I forgot to mention in my last post that fax machines use RLE with a tabled Huffman code.
• 01-10-2005
madcow
A scheme I think works well is this:

1. Run-length encoder
2. Burrows-Wheeler Transform
3. Run-length encoder
4. Entropy coder
5. LZW

If I'm not mistaken, this should beat ZIP and RAR, but at the sacrifice of speed. Taking away the BWT and the second RLE should make it fast enough, but would take away its compression advantage over ZIP and RAR.
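The BWT step in this pipeline doesn't compress anything by itself; it permutes the input so that equal bytes cluster together, which is what makes the second RLE pass effective. A naive sketch (O(n² log n) using full rotations; real implementations like bzip2 use suffix sorting, and this one assumes the input contains no zero byte, which it borrows as a sentinel):

```python
def bwt(data: bytes) -> bytes:
    """Forward transform: last column of the sorted rotation matrix."""
    s = data + b'\x00'  # unique sentinel marks the original row
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return bytes(rot[-1] for rot in rotations)

def ibwt(data: bytes) -> bytes:
    """Inverse transform: rebuild the rotation table column by column."""
    table = [b''] * len(data)
    for _ in range(len(data)):
        table = sorted(bytes([data[i]]) + table[i] for i in range(len(data)))
    row = next(r for r in table if r.endswith(b'\x00'))
    return row[:-1]

print(bwt(b'banana'))  # equal characters end up grouped together
```

The transform is fully reversible, so it costs nothing in information while setting up long runs for the RLE stage that follows it.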

>> i know for a fact that JPEG... can compress
Comparing your compression ratio to JPEG's is futile. For most (all?) pictures, it is mathematically impossible to losslessly beat JPEG.
• 01-11-2005
X PaYnE X
Well, I did quite a lot of research on this. I found a lot of explanations for the different algorithms, and one of the sites had a benchmark of some of the famous ones; apparently entropy came up number 1. I looked it up, and it seems to be a formula by Shannon? I'm not sure if it's a formula or an algorithm. Anyway, I downloaded a few demos using different compression algorithms, and so far I've achieved the best compression with RAR, compared to bz2, LZW, RLE, and Huffman.

I'll try what you suggested, madcow. Is it in the right order, or does the order not matter?
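On the 'entropy' question above: Shannon entropy is a formula, not an algorithm. It gives the theoretical minimum average bits per symbol for a given symbol distribution, and an 'entropy coder' (Huffman, arithmetic coding) is any algorithm that tries to approach that bound:

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Bits per symbol: H = -sum(p * log2(p)) over symbol probabilities."""
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# A 50/50 mix of two symbols carries exactly 1 bit per symbol:
print(shannon_entropy(b'01' * 100))    # 1.0
# A solid-black scanline carries essentially none:
print(shannon_entropy(b'\x00' * 100))  # -0.0, i.e. zero
```

No lossless coder can average fewer bits per symbol than H on data with that distribution, which is presumably why the benchmark ranked 'entropy' at the top.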