Thread: Compression Technique

  1. #1
    Registered User
    Join Date
    Oct 2007
    Posts
    6

Compression Technique

    Hi,

I just want to know: what is the best compression available between a C server and a Java client, so that C can compress the data and Java can in turn decompress it?

  2. #2
Woof, woof! zacs7
    Join Date
    Mar 2007
    Location
    Australia
    Posts
    3,459
Something like gzip, which the HTTP protocol can use: http://www.websiteoptimization.com/s...weak/compress/

While it may not be the 'best', you can certainly get gzip libraries for C, and the same goes for the Java client.

  3. #3
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Define "best"...

Best as in "highest compression ratio", "compresses in the least amount of time", "runs on the largest number of architectures", or some other measure, perhaps?

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  4. #4
    Registered User
    Join Date
    Oct 2007
    Posts
    6
Compression in terms of the least amount of time.

  5. #5
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
zip is probably not your best method here, because it spends a lot of computation to achieve the best compression ratio.

I'm not sure what the best-performing compression method is; perhaps something like byte-pair encoding would be a candidate.

    --
    Mats

  6. #6
    Registered User
    Join Date
    Sep 2006
    Posts
    230
Just curious, but how do compression algorithms work? I once tried searching on Google and found a few algorithms, but I couldn't understand them. How can you make 10 bytes into 7? I really don't even get the basic concept.

  7. #7
Hurry Slowly vart
    Join Date
    Oct 2006
    Location
    Rishon LeZion, Israel
    Posts
    6,788
Quote Originally Posted by Abda92
Just curious, but how do compression algorithms work? I once tried searching on Google and found a few algorithms, but I couldn't understand them. How can you make 10 bytes into 7? I really don't even get the basic concept.
For example, the string
"1111111111"
could be compacted as 10x1 (4 bytes instead of 10).
    All problems in computer science can be solved by another level of indirection,
    except for the problem of too many layers of indirection.
    – David J. Wheeler

  8. #8
Registered User hk_mp5kpdw
    Join Date
    Jan 2002
    Location
    Northern Virginia/Washington DC Metropolitan Area
    Posts
    3,817
    In a real, real, real basic explanation:

Take a typical paragraph of written words. A single byte can represent 256 distinct characters/values. In that paragraph, however, there is unlikely to be an instance of every single available character. What if you only used 16 or so distinct characters/values out of the 256 that a single byte can represent? You are wasting all the unused bit values possible within a byte. Those 16 characters can be represented using only 4 of the usual 8 bits found in a byte. So you can define a new mapping (in essence) of your used character values to those 4 bits and save about 50% of the space by storing 2 characters in the space typically used by 1, and a 500 byte file can suddenly become 250 bytes (you have to store the new mapping somehow so the data can be extracted, and that takes some additional space).

Additionally, you can look for patterns in the data and replace them with shorter sequences. Say you've determined that a large file contains a small number of repeated bit patterns, and that those patterns can be quite long. Again, you can define a mapping of sorts that lets you substitute those long bit patterns with much shorter ones. If you have a theoretical character string containing the letters ABBCDDABABBCDDAB, for example, the repeated patterns are AB and BCDD. So we've got, in this example, two groups of data that are repeated a number of times throughout the whole set. We can define a mapping that says bit 0 is the string AB and bit 1 is the string BCDD. In this extreme example, the above 16 bytes can then be represented as the bit sequence 010010, which is less than a single byte, a huge savings in size of around 96% (again, the mapping must be represented/saved somehow, so this eats up some space). On the decompression side, we'd read the mapping and then go bit by bit and replace the short codes with the longer patterns: every 0 bit with the 16 bits representing the string AB and every 1 bit with the 32 bits representing the string BCDD.

There are many different methods of compressing data. Text tends to be very compressible, while many binary data files (movies/music) are not, because in many cases they are already compressed to a certain degree. Compression tends to remove the patterning that makes the initial round successful, so further applications of the same method become less and less useful (you can compress something once, but trying to compress it again with the same algorithm does not usually produce meaningful results).

Some compression is lossless, which means that the original data can be recreated from the compressed data exactly (100%). This tends to be critical/typical with text. Some algorithms are lossy, however (JPG image compression, for example), in which case the algorithm irrevocably alters the data in such a way that the original can never be recreated exactly. This tends to be typical of many sound/image compression algorithms. Repeatedly compressing/saving a JPG can drastically reduce the size each time it is done, but the image quality rapidly deteriorates.
    "Owners of dogs will have noticed that, if you provide them with food and water and shelter and affection, they will think you are god. Whereas owners of cats are compelled to realize that, if you provide them with food and water and shelter and affection, they draw the conclusion that they are gods."
    -Christopher Hitchens

  9. #9
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    There are three fundamental principles (that are often used in combinations):
    * Replacement of long items with shorter items.
If a text contains "Adam" many times and no "x", we can replace every "Adam" with "x". If "Eve" occurs a number of times but there is no "X", then we can replace "Eve" with "X". Etc.

    * Removing "empty space".
If we use only English ASCII text, numbers, and a few other symbols, the characters need [at most] 7 bits to represent the full alphabet. But usually text is stored as one byte (8 bits) per character, because that makes it very easy for the computer to access a character at a time. If we want to save some space, we can remove that unnecessary zero bit at the top of each byte. Of course, it's then no longer easy to extract a character from a long stream of characters, because each byte consists of parts of two characters [7 bits + 1 bit of the next, 6 bits + 2 of the next, 5 + 3, 4 + 4, 3 + 5, etc.]

    * Repeat reduction [run length encoding].
    As vart described. If something is repeated, then reduce it to a "this is x number of y".

There is also "destructive compression". In destructive compression, the uncompressed data you get back isn't exactly identical to the original data before compression. It is a bit like crumpling a coke can to make it smaller and then "uncrumpling" it again: it will not look exactly the same, but it takes up much less space in the meantime. JPEG images are "destructively compressed" [along with a number of the above techniques for storing data in less space than the original].

    --
    Mats

  10. #10
    Registered User
    Join Date
    Sep 2006
    Posts
    230
    Thanks, I understand now. But it sure sounds hard to make a good compression algorithm.

