# what's the best compression algorithm made so far?


• 05-12-2002
toaster
what's the best compression algorithm made so far?
see topic.
• 05-12-2002
Prelude
What's the best sorting method? There's not a "best" algorithm, simply because no algorithm can meet the criteria for being the best in every situation.

-Prelude
• 05-12-2002
ygfperson
what about for maximum compressibility?
• 05-12-2002
Prelude
>what about for maximum compressibility?
What about data integrity and lost information? I could write a routine that could compress anything you gave me to almost nothing, but it would be so lossy that it would be worthless. You have to find the best algorithm for the data you wish to compress. You would compress an application for storage differently than streamed video, because the video can lose data without a noticeable change, while the application would be destroyed if it lost just about anything.

Compression is very much like sorting when it comes to which algorithm is better; it really depends on what you are compressing, and for what purpose.

-Prelude
• 05-12-2002
ygfperson
let's say... lossless
you're compressing a non-text file, probably a computer binary, but it could be a movie file or a JPEG (both of which have already been compressed with lossy methods).
• 05-12-2002
Prelude
I'd probably go with the Deflate algorithm. It's essentially a combination of LZ77 and Huffman coding. A good write-up will make more sense than I could by butchering an explanation. ;)

-Prelude
• 05-13-2002
toaster
Quote:

I could write a routine that could compress anything you gave me to almost nothing, ...
-Prelude
I'd been trying to come up with some solution that does this but was unsuccessful. Can you show a sample algorithm that does this?
• 05-13-2002
Prelude
>Can you show a sample algorithm that does this?
Take the first bit of each byte and add the integer value (0 or 1) to a running total. Instant super-compression that is totally worthless. ;)
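For what it's worth, the joke fits in a few lines of C. This is a sketch of one reading of it (I'm taking "first bit" to mean the most significant bit, and `worthless_compress` is my name for it):

```c
#include <stddef.h>

/* Prelude's deliberately worthless "compressor": keep only a running
 * total of each byte's high bit.  Everything else is thrown away, so
 * there is no way to decompress -- which is the point. */
unsigned long worthless_compress(const unsigned char *data, size_t len)
{
    unsigned long total = 0;
    for (size_t i = 0; i < len; i++)
        total += (data[i] >> 7) & 1;  /* first (most significant) bit */
    return total;
}
```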

-Prelude
• 05-13-2002
Unregistered
Quote:

Originally posted by toaster

I'd been trying to come up with some solution that does this but was unsuccessful. Can you show a sample algorithm that does this?

Think of it this way: you have a text file filled with one googol of the letter "g". I write my program so it counts how many g's there are, then deletes everything and streams to the file 100000000g... Then, when you want to go the other way, you just erase everything and stream g's until they reach a googol.

:P There you go, I now have a super compression algo that isn't worth 2 cents, even if it did compress 10 gigs to 1 KB <- approximate :P
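The scheme described here is basically run-length encoding collapsed to a single run. A minimal C sketch of both directions (the function names are mine; the encoder only handles the degenerate input it was invented for, a file that is one repeated character):

```c
#include <stdio.h>
#include <string.h>

/* Encode a string that is one long run of a single character,
 * e.g. "ggggg" -> "5g".  Returns -1 if the input is not one run. */
int encode_run(const char *in, char *out, size_t outsize)
{
    size_t n = strlen(in);
    if (n == 0) return -1;
    for (size_t i = 1; i < n; i++)
        if (in[i] != in[0]) return -1;   /* not a single run */
    return snprintf(out, outsize, "%zu%c", n, in[0]);
}

/* Decompress: stream the character back out, count times. */
void decode_run(const char *in, char *out)
{
    size_t count = 0;
    char c = 0;
    sscanf(in, "%zu%c", &count, &c);
    for (size_t i = 0; i < count; i++)
        out[i] = c;
    out[count] = '\0';
}
```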
• 05-14-2002
toaster
Quote:

Originally posted by Prelude
>Can you show a sample algorithm that does this?
Take the first bit of each byte and add the integer value (0 or 1) to a running total. Instant super-compression that is totally worthless. ;)

-Prelude

But when you do that, I don't think the result can be decompressed.
:(
• 05-14-2002
Prelude
>but when you do that, I don't think I can decompress that.
Of course not; if you read back a bit you'll see why. I was trying to make a point.
Quote:

What about data integrity and lost information? I could write a routine that could compress anything you gave me to almost nothing, but it would be so lossy that it would be worthless.
-Prelude
• 05-14-2002
toaster
I'm sure there's got to be a good lossless compression algorithm that doesn't use sorting or binary trees. All the compression algorithms I've seen require them.
:mad:
• 05-15-2002
QuestionC
There are two parts to lossless compression. First there is the sampling: deciding what all the different occurrences of each character are. This is a tricky bit; different sampling methods work well for different kinds of files. I believe compression programs generally check what type of file they are compressing before choosing their sampling method. I do not, however, know of any sampling method where a tree is involved. Basically, you're just counting the number of times each character occurs within the file (or within a window of recently accessed characters).
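The simplest form of that sampling step is just a byte-frequency count, along these lines:

```c
#include <stddef.h>

/* The "sampling" step in its simplest form: count how many times each
 * byte value occurs in the input.  These counts are what the encoding
 * stage (Huffman or arithmetic) works from. */
void count_frequencies(const unsigned char *data, size_t len,
                       unsigned long freq[256])
{
    for (int i = 0; i < 256; i++)
        freq[i] = 0;
    for (size_t i = 0; i < len; i++)
        freq[data[i]]++;
}
```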

Next is the encoding scheme, and to the best of my knowledge this is a choice between Huffman encoding and arithmetic encoding. Arithmetic encoding is flat-out superior in terms of compression, though it requires more processing.

Huffman encoding uses a binary tree, but Arithmetic encoding does not. Basically, the 'theory' behind arithmetic encoding is this...

Let's say I want to compress with a frequency of 50% As, 25% Bs, and 25% Cs. Let's also assume that these frequencies do not change as we read the file.

Using these frequencies, let's say I want to compress the string...
AABCBBCAA

We let A correspond to the range 0 - .5 (bottom half)
B = .5 - .75 (middle fourth)
C = .75 - 1 (top fourth)

Compressing this string would go like so...

Read the first A; our number will be from 0 - .5 (split the interval 0 - 1 into its bottom half)
Read the second A; our number will be from 0 - .25 (split the interval 0 - .5 into its bottom half)
Read the B; our number will be from .125 - .1875 (split the interval 0 - .25 into its middle fourth)
Read the C; our number will be from .171875 - .1875 (split the interval .125 - .1875 into its top fourth)
And so on...
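The interval-narrowing steps above can be sketched directly in C. This uses doubles, which is fine for a four-character example but not for real input, for exactly the precision reason discussed below (real coders use fixed-point integers with renormalization); `narrow` and `model` are my own names:

```c
/* Interval narrowing for the A/B/C example: each symbol shrinks the
 * current interval to its assigned sub-range. */
struct range { double low, high; };

/* Symbol ranges: A = [0, .5), B = [.5, .75), C = [.75, 1) */
static const struct range model[3] = {
    { 0.00, 0.50 },  /* A */
    { 0.50, 0.75 },  /* B */
    { 0.75, 1.00 },  /* C */
};

void narrow(struct range *r, int sym)
{
    double width = r->high - r->low;
    r->high = r->low + width * model[sym].high;
    r->low  = r->low + width * model[sym].low;
}
```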

Two things are clearly occurring here. First, the farther we go, the closer we get to an actual number. After just four characters, with a very broad range of frequencies, we are already at six decimal places, so clearly a float is not the answer. Instead, you kind of have to brew your own way of representing exact decimals (floats are just approximations) using number streams that can take up to a kilobyte to store.

Also, the further we go, the less we have to worry about previous decimal places. Notice how in the example above the initial decimal is clearly going to be .1. In binary, the effect is more noticeable. So we don't really have to keep track of the entire number; we just have to keep an eye on a window of recent decimals.

Suppose you have a file that consists of, say, 1000 As, 1 B, and nothing else (it's an extreme example, I know). The best you could do with Huffman encoding is one bit per character, so 1001 bits. Arithmetic encoding actually approaches the theoretical limit, so it would only take something like 11 bits (this only counts the compressed data, not the storage required to keep track of sampling information, which shouldn't vary with the encoding algorithm).

If you want to learn this stuff, check your local (university) library for "The Data Compression Book". Even if you don't get through the code (I couldn't, personally), it does a good job of explaining the ideas behind compression.
• 05-15-2002
sean
Well Prelude, I can tell you there is a WORST in every situation. If you look at the code (wotsit.org or .com or something), in most photographic situations, bitmaps will actually expand the data of a picture.