Thread: Compression Technique

  1. #1
    Registered User
    Join Date
    Oct 2007
    Posts
    6

Compression Technique

    Hi,

I just want to know: what is the best compression available between a C server and a Java client, so that C can compress the data and Java can in turn decompress it?

  2. #2
Woof, woof! zacs7
    Join Date
    Mar 2007
    Location
    Australia
    Posts
    3,459
Something like gzip, which the HTTP protocol can use: http://www.websiteoptimization.com/s...weak/compress/

While it may not be the 'best', you can certainly get gzip libraries for C, and the same goes for the Java client.

  3. #3
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Define "best"...

Best as in "highest compression ratio", "compresses in the least amount of time", "runs on the largest number of architectures", or some other measure, perhaps?

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  4. #4
    Registered User
    Join Date
    Oct 2007
    Posts
    6
Compression in terms of the least amount of time.

  5. #5
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
zip is probably not your best method here, because it spends a lot of computation to achieve the best compression ratio.

I'm not sure what the best-performing compression method is; perhaps something like byte-pair encoding would be a candidate.

    --
    Mats

  6. #6
    Registered User
    Join Date
    Sep 2006
    Posts
    230
Just curious, but how do compression algorithms work? I once tried searching on Google and found a few algorithms, but I couldn't understand them. How can you make 10 bytes into 7? I really don't even get the basic concept.

  7. #7
Hurry Slowly vart
    Join Date
    Oct 2006
    Location
    Rishon LeZion, Israel
    Posts
    6,788
Quote Originally Posted by Abda92
Just curious, but how do compression algorithms work? I once tried searching on Google and found a few algorithms, but I couldn't understand them. How can you make 10 bytes into 7? I really don't even get the basic concept.
For example, the string
"1111111111"
could be compacted as 10x1 (4 bytes instead of 10).
    All problems in computer science can be solved by another level of indirection,
    except for the problem of too many layers of indirection.
    – David J. Wheeler

  8. #8
Registered User hk_mp5kpdw
    Join Date
    Jan 2002
    Location
    Northern Virginia/Washington DC Metropolitan Area
    Posts
    3,817
    In a real, real, real basic explanation:

Take a typical paragraph of written words. A single byte can represent 256 distinct characters/values. In that paragraph, however, there is unlikely to be an instance of every single available character. What if you only used 16 or so distinct characters/values out of the 256 that a single byte can represent? You are wasting all the unused bit values possible within a byte. Those 16 characters can be represented using only 4 of the usual 8 bits found in a byte. So you can define a new mapping (in essence) of your used character values to those 4 bits and save about 50% of the space by storing 2 characters in the space typically used by 1, and a 500 byte file can suddenly become 250 bytes (you have to store the new mapping somehow so the data can be extracted, and that takes some additional space).

Additionally, you can look for patterns in the data and replace them with shorter sequences. Say you've determined that a large file contains a small number of repeated bit patterns, and that those patterns can be quite long. Again, you can define a mapping of sorts that lets you substitute those long bit patterns with much shorter ones. If you have a theoretical character string containing the letters ABBCDDABABBCDDAB, for example, the repeated patterns are AB and BCDD. So we've got, in this example, two groups of data that are repeated a number of times throughout the whole set. We can define a mapping that says bit 0 is the string AB and bit 1 is the string BCDD. In this extreme example, the above 16 bytes can then be represented as the bit sequence 010010, which is less than a single byte, a huge savings in size of around 96% (again, the mapping must be represented/saved somehow, so this eats up some space). On the decompression side, we'd read the mapping and then go bit by bit and replace the short codes with the longer patterns: every 0 bit with the 16 bits representing the string AB and every 1 bit with the 32 bits representing the string BCDD.

There are many different methods of compressing data. Text tends to be very compressible, while many binary data files (movies/music) are not, because in many cases they are already compressed to a certain degree. Compression tends to remove the patterning that makes the initial round successful, so further applications of the same method become less and less useful (you can compress something once, but trying to compress it again with the same algorithm does not usually produce meaningful results).

Some compression is lossless, which means that the original data can be recreated from the compressed data exactly (100%). This tends to be critical/typical with text. Some algorithms are lossy, however (JPG image compression, for example), in which case the algorithm irrevocably alters the data in such a way that the original can never be recreated exactly. This tends to be typical of many sound/image compression algorithms. Repeatedly compressing/saving a JPG can drastically reduce the size each time it is done, but the image quality rapidly deteriorates.
    "Owners of dogs will have noticed that, if you provide them with food and water and shelter and affection, they will think you are god. Whereas owners of cats are compelled to realize that, if you provide them with food and water and shelter and affection, they draw the conclusion that they are gods."
    -Christopher Hitchens

  9. #9
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    There are three fundamental principles (that are often used in combinations):
    * Replacement of long items with shorter items.
If a text contains "Adam" many times and no "x", we can replace every "Adam" with "x". If "Eve" occurs a number of times but there is no "X", then we can replace "Eve" with "X". Etc.

    * Removing "empty space".
If we use only English ASCII text, numbers, and a few other symbols, the characters need [at most] 7 bits to represent the full alphabet. But usually text is stored as one byte (8 bits) per character, because that makes it very easy for the computer to access a character at a time. If we want to save some space, we can remove that unnecessary zero bit at the top of each byte. Of course, it's then no longer easy to extract a character from a long stream of characters, because each byte consists of parts of two characters [7 bits + 1 bit of the next, 6 bits + 2 of the next, 5 + 3, 4 + 4, 3 + 5, etc.]

    * Repeat reduction [run length encoding].
    As vart described. If something is repeated, then reduce it to a "this is x number of y".

There is also "destructive compression". In destructive compression, the uncompressed data you get back isn't exactly identical to the original data before compression. It is a bit like crumpling a coke can to make it smaller and then "uncrumpling" it again: it will not look exactly the same, but it takes up much less space in the meantime. JPEG images are "destructively compressed" [along with a number of the above techniques for storing data in less space than the original].

    --
    Mats

  10. #10
    Registered User
    Join Date
    Sep 2006
    Posts
    230
    Thanks, I understand now. But it sure sounds hard to make a good compression algorithm.

