Thread: compression

  1. #1
    Registered User
    Join Date
    Aug 2003
    Posts
    288

    compression

    You've probably noticed the post I made about 'beginner array question'; well, here's why:

    I'm working on a program that takes a black and white image, saves it as an array of 0s and 1s, and then compresses that array as small as it can go.

    I thought of different ways to compress it, but since this is my first attempt at compression, I tried a simple method.

    Since my array was only 0s and 1s, I thought about converting each character into a bit, so I could reduce the size of the file by a factor of 8. Anyway, that went successfully, but I was wondering: is there any other method I can use to compress the data even further?

    I know my method is simple because the compression ratio is just pathetic compared to the RAR or ZIP compression ratio, so I did a little digging on Google. I found out they use LZW (I'm thinking that's the method?) to compress their data to such small sizes. I did even more digging and found out it uses some sort of dictionary, and that it's a lossless compressor. Anyway, I looked up a few implementations of the LZW algorithm, and I wanted to ask this before I went any further:

    Should I use both my method and the LZW algorithm? And in which order? (i.e. divide by 8 and then compress with LZW, or vice versa?)

    Or is there another compression algorithm that does this already?

    If someone could point me in the direction of any such algorithm, it would be much appreciated.
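
    For readers following along, a minimal sketch of the char-to-bit packing described above (the function name and buffer layout are illustrative, not from the poster's program):

    Code:
    #include <cstddef>
    #include <vector>

    // Pack an array of 0/1 bytes into bits, 8 pixels per output byte.
    std::vector<unsigned char> pack_bits(const std::vector<unsigned char>& pixels)
    {
        std::vector<unsigned char> packed((pixels.size() + 7) / 8, 0);
        for (std::size_t i = 0; i < pixels.size(); ++i)
            if (pixels[i])
                packed[i / 8] |= static_cast<unsigned char>(1u << (i % 8));
        return packed;
    }

    A 1024x768 image is 786,432 pixels, which packs to exactly 98,304 bytes (96 KB), matching the figure quoted later in the thread.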

  2. #2
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,897
    >should I use both my method and the LZW algorithm?
    Does your method compress well enough for your needs? Why do more work than you have to?
    My best code is written with the delete key.

  3. #3
    Registered User
    Join Date
    Aug 2003
    Posts
    288
    Well, it doesn't compress well enough for what I'm trying to do. At the moment, it takes a 1024x768 image and compresses it down to about 96 KB, and that's black and white. I know for a fact that JPEG or PNG can compress that image well below 50 KB and still keep its color.

    Also, I need it to go down by at least 10-20 times instead of the factor of 8 I get at the moment, if possible.

  4. #4
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,897
    Okay, now the question is: Is this an exercise or do you really need the functionality for real work? If it's the former then go ahead and research compression algorithms and see what you can squeeze out of your data. If it's the latter then why bother when there are plenty of excellent compression tools out there already? Unless you're extremely confident in your programming ability, you probably won't do better than any of the good ones (and if you were confident, you wouldn't be asking for help here). So it makes sense to use an external tool to help you out.
    My best code is written with the delete key.

  5. #5
    Registered User
    Join Date
    Aug 2003
    Posts
    288
    It is an exercise, more or less. I'm working on improving my compression/encryption skills, and I'm hoping that if I can achieve a good enough compression ratio, I can use this algorithm for all my future projects that involve compression.

    I know I can get a better ratio, but the only problem is I'm not sure how to go about doing it. I'll try looking on Google for compression algorithms and see where that gets me.

    By the way, the reason I know I can get more out of the compressed data is that when I open it in a hex editor, I find a lot of repeated values. Say the picture had 100 black pixels, then 1 white dot, then 200 black pixels again: I would end up with a lot of 00s for the black pixels, and I just feel that's a waste of space.

  6. #6
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,897
    >and I just feel that's a waste of space
    Try a Run Length Encoding algorithm and see if it does better than your char-to-bit algorithm. If it doesn't, you could still probably work out an RLE scheme for the compressed file. I recommend starting easy and moving up, and RLE is relatively easy to implement.
    My best code is written with the delete key.
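
    A minimal run-length encoding sketch along the lines Prelude suggests, assuming the input is a flat byte buffer (the (count, value) scheme here is just one common variant):

    Code:
    #include <cstddef>
    #include <vector>

    // Encode a byte buffer as (count, value) pairs; runs longer than
    // 255 bytes are split. Note this expands data with few repeats.
    std::vector<unsigned char> rle_encode(const std::vector<unsigned char>& in)
    {
        std::vector<unsigned char> out;
        std::size_t i = 0;
        while (i < in.size()) {
            unsigned char value = in[i];
            std::size_t run = 1;
            while (i + run < in.size() && in[i + run] == value && run < 255)
                ++run;
            out.push_back(static_cast<unsigned char>(run));
            out.push_back(value);
            i += run;
        }
        return out;
    }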

  7. #7
    The C-er
    Join Date
    Mar 2004
    Posts
    192
    Yes. You could use RLE, then see if Huffman coding would help. Dynamic Huffman codes are kind of fun. I'm considering a strategy like this for my "Game of Life" code (for saving large files of single bits, essentially).

  8. #8
    Registered User
    Join Date
    Aug 2003
    Posts
    288
    I looked up RLE, and it was easy to implement, like you said, but it only decreased the file size by under 10%. I also found a modified LZW algorithm that decreased the size by around 60%; I think I'll implement that if I can't find anything better. I read about Huffman, but I couldn't find any implementations. Maybe I just didn't look hard enough.

    Anyway, I figured I could cut down even more space if I could find some way of saving only the white pixels. I was thinking of maybe converting the shapes in the image into vectors, or using points to identify where the white pixels are. I'm still working on that; so far it's not as good as the char-to-bit compression, but I'm hoping it will be better.
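
    A quick sketch of the 'store only the white pixels' idea, using (x, y) coordinate pairs (purely illustrative; this representation only wins while white pixels are rare):

    Code:
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    struct Point { std::uint16_t x, y; };  // 4 bytes per white pixel

    // Collect the coordinates of white pixels from a width*height 0/1 image.
    std::vector<Point> white_points(const std::vector<unsigned char>& img,
                                    std::uint16_t width, std::uint16_t height)
    {
        std::vector<Point> pts;
        for (std::uint16_t y = 0; y < height; ++y)
            for (std::uint16_t x = 0; x < width; ++x)
                if (img[static_cast<std::size_t>(y) * width + x])
                    pts.push_back(Point{x, y});
        return pts;
    }

    At 4 bytes per point, this beats the 96 KB bit-packed form only while fewer than about 24,576 of the 786,432 pixels are white, so it degrades quickly on busy images.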

  9. #9
    Guest Sebastiani's Avatar
    Join Date
    Aug 2001
    Location
    Waterloo, Texas
    Posts
    5,708
    RLE is good for repetitive chunks of data, but Huffman gives the most even compression for just about anything (though it usually requires about 1/2 KB per message to accommodate the lookup table, unless an adaptive algorithm is used). A combination of the two would probably produce results comparable to LZW.
    Code:
    #include <cmath>
    #include <complex>
    bool euler_flip(bool value)
    {
        return std::pow
        (
            std::complex<float>(std::exp(1.0)), 
            std::complex<float>(0, 1) 
            * std::complex<float>(std::atan(1.0)
            *(1 << (value + 2)))
        ).real() < 0;
    }

  10. #10
    Registered User
    Join Date
    Aug 2003
    Posts
    288
    Do you know where I can find an example that uses both algorithms? Or an implementation of Huffman? I can't seem to find an implementation of that anywhere; every time I search for it, I end up with modified LZW algorithms.

    By the way: what do you mean by adaptive Huffman? Is that just a modified Huffman, or a whole different algorithm?

  11. #11
    Guest Sebastiani's Avatar
    Join Date
    Aug 2001
    Location
    Waterloo, Texas
    Posts
    5,708
    >> Do you know where I can find an example that uses both algorithms?

    Actually, both are simple to construct from a decent description of their theories. RLE is just a matter of counting repetitive chunks that are either contiguous or at random locations. Huffman generates variable bit-length codewords based on 'letter' frequency, and the binary tree that stores the codewords is implemented as an array. Looking up 'letters' in one table emits encoded data; shifting codewords 'through' the tree produces decoded data. That's basically it.

    >> What do you mean by adaptive Huffman?

    I have it backwards, actually. A fixed Huffman frequency table would take up less space than an adaptive one, since the table for the former is predetermined and sent just once. Sorry about that.
    Code:
    #include <cmath>
    #include <complex>
    bool euler_flip(bool value)
    {
        return std::pow
        (
            std::complex<float>(std::exp(1.0)), 
            std::complex<float>(0, 1) 
            * std::complex<float>(std::atan(1.0)
            *(1 << (value + 2)))
        ).real() < 0;
    }
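
    Since implementations were apparently hard to find, here is a compact sketch of the construction Sebastiani describes: count symbol frequencies, repeatedly merge the two lightest trees, then read codewords off the tree. (This version uses pointers rather than the array layout he mentions, and it leaks the tree nodes for brevity.)

    Code:
    #include <cstddef>
    #include <queue>
    #include <string>
    #include <vector>

    struct Node {
        std::size_t freq;
        int symbol;        // -1 for internal nodes
        Node* left;
        Node* right;
    };

    struct ByFreq {
        bool operator()(const Node* a, const Node* b) const { return a->freq > b->freq; }
    };

    // Walk the finished tree, assigning '0' for left and '1' for right.
    void assign_codes(const Node* n, const std::string& prefix,
                      std::vector<std::string>& codes)
    {
        if (!n) return;
        if (n->symbol >= 0) { codes[n->symbol] = prefix.empty() ? "0" : prefix; return; }
        assign_codes(n->left,  prefix + '0', codes);
        assign_codes(n->right, prefix + '1', codes);
    }

    // Build codewords for all 256 byte values from their frequencies.
    std::vector<std::string> huffman_codes(const std::vector<unsigned char>& data)
    {
        std::size_t freq[256] = {};
        for (unsigned char c : data) ++freq[c];

        std::priority_queue<Node*, std::vector<Node*>, ByFreq> pq;
        for (int s = 0; s < 256; ++s)
            if (freq[s]) pq.push(new Node{freq[s], s, nullptr, nullptr});

        while (pq.size() > 1) {              // merge the two lightest trees
            Node* a = pq.top(); pq.pop();
            Node* b = pq.top(); pq.pop();
            pq.push(new Node{a->freq + b->freq, -1, a, b});
        }

        std::vector<std::string> codes(256);
        if (!pq.empty()) assign_codes(pq.top(), "", codes);
        return codes;
    }

    Encoding is then a table lookup per input byte, and decoding walks the tree bit by bit, exactly as described above.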

  12. #12
    The C-er
    Join Date
    Mar 2004
    Posts
    192
    LZW should give better compression than RLE + Huffman, since it also looks for repeating blocks of data.

    Whether dynamic/adaptive Huffman codes are better than "straight" Huffman code for your application depends on the nature of the data to be compressed. If the probabilities of the run lengths are reasonably constant, then the normal approach might be best. Otherwise use the dynamic type.

    The nice thing about dynamic codes is that you don't need to make a table.

    You might find this relevant:

    http://en.wikipedia.org/wiki/Huffman_coding

    BTW, I forgot to mention in my last post that fax machines use RLE with a tabled Huffman code.
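
    For comparison with the RLE and Huffman sketches above, a bare-bones LZW compressor (16-bit output codes, no dictionary reset; illustrative only, not the "modified LZW" mentioned earlier):

    Code:
    #include <cstdint>
    #include <map>
    #include <string>
    #include <vector>

    // Classic LZW: grow a dictionary of seen strings, emitting the code
    // of the longest known prefix whenever an unknown extension appears.
    std::vector<std::uint16_t> lzw_compress(const std::string& input)
    {
        std::map<std::string, std::uint16_t> dict;
        for (int c = 0; c < 256; ++c)
            dict[std::string(1, static_cast<char>(c))] = static_cast<std::uint16_t>(c);

        std::vector<std::uint16_t> out;
        std::string current;
        std::uint16_t next_code = 256;

        for (char c : input) {
            std::string extended = current + c;
            if (dict.count(extended)) {
                current = extended;            // keep growing the match
            } else {
                out.push_back(dict[current]);  // emit longest known prefix
                if (next_code != 0)            // stop once the 16-bit code space wraps
                    dict[extended] = next_code++;
                current = std::string(1, c);
            }
        }
        if (!current.empty()) out.push_back(dict[current]);
        return out;
    }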

  13. #13
    Slave MadCow257's Avatar
    Join Date
    Jan 2005
    Posts
    735
    A scheme I think works well is this:

    Run length encoder
    Burrows Wheeler Transform
    Run length encoder
    Entropy coder
    LZW

    If I'm not mistaken, this should beat ZIP and RAR, but at the sacrifice of speed. Taking away the BWT and the second RLE should make it fast enough, but would take away its compression advantage over ZIP and RAR.

    >> I know for a fact that JPEG... can compress
    Comparing your compression ratio to JPEG is futile. For most (all?) pictures it is mathematically impossible to losslessly beat JPEG.
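
    The Burrows-Wheeler Transform is the one stage in this pipeline that hasn't come up yet in the thread. A naive sketch (sorting all rotations is O(n^2 log n), fine for experimenting but not for large files):

    Code:
    #include <algorithm>
    #include <cstddef>
    #include <string>
    #include <utility>
    #include <vector>

    // Naive BWT: sort all rotations of the input and output the last
    // column, plus the row index of the original string (needed to invert).
    std::pair<std::string, std::size_t> bwt(const std::string& s)
    {
        const std::size_t n = s.size();
        std::vector<std::size_t> rot(n);
        for (std::size_t i = 0; i < n; ++i) rot[i] = i;

        std::sort(rot.begin(), rot.end(),
                  [&](std::size_t a, std::size_t b) {
                      for (std::size_t k = 0; k < n; ++k) {
                          char ca = s[(a + k) % n], cb = s[(b + k) % n];
                          if (ca != cb) return ca < cb;
                      }
                      return false;
                  });

        std::string last(n, '\0');
        std::size_t primary = 0;
        for (std::size_t i = 0; i < n; ++i) {
            if (rot[i] == 0) primary = i;       // row holding the original
            last[i] = s[(rot[i] + n - 1) % n];  // last column character
        }
        return {last, primary};
    }

    The transform doesn't compress anything by itself; it groups similar contexts together so the RLE and entropy-coding stages after it see much longer runs (the same idea bzip2 is built on).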

  14. #14
    Registered User
    Join Date
    Aug 2003
    Posts
    288
    Well, I did quite a lot of research on this. I found a lot of explanations for the different algorithms, and one of the sites had a benchmark of some of the famous ones; apparently 'entropy' came up number 1. I looked it up, and it seems to be a formula by Shannon? I'm not sure if it's a formula or an algorithm. Anyway, I downloaded a few demos using different compression algorithms, and so far I've achieved the best compression with RAR, compared to bz2, LZW, RLE, and Huffman.

    I'll try what you suggested, MadCow. Is it in the right order, or does the order not matter?
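
    On the Shannon question: entropy is a formula, not an algorithm. H = -sum(p(x) * log2 p(x)) gives the theoretical minimum bits per symbol for a given symbol distribution. A quick way to measure it for a byte buffer (an illustrative helper, not from any of the tools mentioned):

    Code:
    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Shannon entropy in bits per byte: a lower bound on what any
    // order-0 coder (e.g. Huffman or arithmetic coding) can achieve.
    double entropy_bits_per_byte(const std::vector<unsigned char>& data)
    {
        if (data.empty()) return 0.0;
        std::size_t freq[256] = {};
        for (unsigned char c : data) ++freq[c];

        double h = 0.0;
        for (std::size_t f : freq) {
            if (f == 0) continue;
            double p = static_cast<double>(f) / data.size();
            h -= p * std::log2(p);
        }
        return h;
    }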

  15. #15
    Slave MadCow257's Avatar
    Join Date
    Jan 2005
    Posts
    735
    >> or does the order not matter?
    The ordering does matter, and I believe that is the optimal one.

    >> I looked it up, and it seems to be a formula by Shannon?
    I meant to say arithmetic coder rather than entropy coder. A tutorial on it, along with a few others, can be found here:
    http://www.arturocampos.com/abstracts.html
