View Full Version : what's the best compression algorithm made so far?

toaster

05-12-2002, 04:34 PM

see topic.

Prelude

05-12-2002, 04:53 PM

What's the best sorting method? There's no "best" algorithm, simply because no algorithm can meet the criteria for being the best in every situation.

-Prelude

ygfperson

05-12-2002, 08:12 PM

what about for maximum compressability?

Prelude

05-12-2002, 09:00 PM

>what about for maximum compressability?

What about data integrity and lost information? I could write a routine that compresses anything you gave me to almost nothing, but it would be so lossy that it would be worthless. You have to find the best algorithm for the data you wish to compress: one would compress an application for storage differently than streamed video, because the video can lose data without a noticeable change, but the application would be destroyed if it lost just about anything.

Compression is very much like sorting when it comes to which algorithm is better: it really depends on what you are compressing, and for what purpose.

-Prelude

ygfperson

05-12-2002, 09:35 PM

let's say... lossless

you're compressing a non-text file, probably a computer binary, but it could be a movie file or a jpeg. (both of which have been compressed with lossy methods already).

Prelude

05-12-2002, 10:11 PM

I'd probably go with the Deflate algorithm. It's something like a mix between LZ77 and Huffman, but not. This (http://www.gzip.org/zlib/rfc-deflate.html) will make more sense than I could by butchering an explanation. ;)

-Prelude

toaster

05-13-2002, 06:29 PM

I could write a routine that could compress anything you gave me to almost nothing, ...

-Prelude

I've been trying to come up with a solution that does this but have been unsuccessful. Can you show a sample algorithm that does this?

Prelude

05-13-2002, 07:22 PM

>Can you show a sample algorithm that does this?

Take the first bit of each byte and add the integer value (0 or 1) to a running total. Instant super-compression that is totally worthless. ;)

-Prelude

Unregistered

05-13-2002, 09:13 PM

Originally posted by toaster

I'd been trying to come up with some solution that does this but was unsuccessful. Can you show a sample algorithm that does this?

Think of it this way: you have a text file filled with a googol of the letter "g". I write my program so it counts how many g's there are, then deletes everything and streams "100000000g" to the file. To go the other way, you just erase everything and stream g's until they reach a googol.

:P There you go, I now have a super compression algorithm that isn't worth two cents, even if it did compress 10 gigs to 1 KB (approximately). :P

toaster

05-14-2002, 05:29 PM

Originally posted by Prelude

>Can you show a sample algorithm that does this?

Take the first bit of each byte and add the integer value (0 or 1) to a running total. Instant super-compression that is totally worthless. ;)

-Prelude

But when you do that, there's no way to decompress it.

:(

Prelude

05-14-2002, 05:37 PM

>but when you do that, I don't think I can decompress that.

Of course not; if you read back up a bit you'll see that. I was trying to make a point.

What about data integrity and lost information? I could write a routine that could compress anything you gave me to almost nothing, but it would be so lossy that it would be worthless.

-Prelude

toaster

05-14-2002, 06:39 PM

I'm sure there's got to be a good lossless compression algorithm that doesn't use sorting or binary trees. All the compression algorithms I've seen require them.

:mad:

QuestionC

05-15-2002, 10:47 AM

There are two parts to lossless compression. First there is the sampling: deciding what the frequencies of all the different characters are. This is a tricky bit; different sampling methods work well for different kinds of files. I believe that compression programs generally check what type of file they are compressing before choosing their sampling method. I do not, however, know of any sampling method where a tree is involved. Basically, you're just counting the number of times each character occurs within the file (or within a window of recently accessed characters).

Next is the encoding scheme, and to the best of my knowledge this is a choice between Huffman encoding and arithmetic encoding. Arithmetic encoding is flat out superior in terms of compression, though it requires more processing.

Huffman encoding uses a binary tree, but Arithmetic encoding does not. Basically, the 'theory' behind arithmetic encoding is this...

Let's say I want to compress with a frequency of 50% As, 25%Bs, and 25%Cs. Let's also assume that these frequencies do not change as we read the file.

Using these frequencies, let's say I want to compress the string...

AABCBBCAA

We let A correspond to the range 0 - .5 (bottom half)

B = .5 - .75 (middle fourth)

C = .75 - 1 (top fourth)

Compressing this string would go like so...

Read the first A, our number will be from 0 - .5 (split the interval 0 - 1 into its bottom half)

Read the second A, our number will be from 0 - .25 (split the interval 0 - .5 into its bottom half)

Read the B, our number will be from .125 - .1875 (split the interval 0 - .25 into its middle fourth)

Read the C, our number will be from .171875 - .1875 (split the interval .125 - .1875 into its top fourth)

And so on...

Two things are clearly occurring here. First, the farther we go, the closer we get to an actual number. After just four characters, with a very broad range of frequencies, we have already gotten to six decimal places, so clearly a float is not the answer. Instead, you kind of have to brew your own way of making exact decimals (floats are just approximations) using number streams that can take up to a kilobyte to store.

Also, the further we go, the less we have to worry about previous decimal places. Notice how in the example above, the first decimal is clearly going to be .1; in binary, the effect is more noticeable. So we don't really have to keep track of the entire number, we just have to keep an eye on a window of recent decimals.

Suppose you have a file that consists of, say, 1000 As, 1 B, and nothing else (it's an extreme example, I know). The best you could do with Huffman encoding is one bit per character, so 1001 bits. Arithmetic encoding approaches the theoretical limit, so it would only take something like 11 bits (this is only counting the compressed data, not the storage required to keep track of sampling information, which shouldn't vary with the encoding algorithm).

If you want to learn this stuff, check your local (university) library for "The Data Compression Book". Even if you don't get through the code (I couldn't personally), it does a good job at explaining the ideas behind compression.

sean

05-15-2002, 03:20 PM

Well Prelude, I can tell you there is a WORST in every situation. If you look at the format specs (wotsit.org or .com or something), in most photographic situations, compressed bitmaps will actually expand a picture's data.

Powered by vBulletin® Version 4.2.5 Copyright © 2019 vBulletin Solutions Inc. All rights reserved.