see topic.
This is a discussion on what's the best compression algorithm made so far? within the A Brief History of Cprogramming.com forums, part of the Community Boards category; see topic....
see topic.
think only with code.
write only with source.
What's the best sorting method? There's not a "best" algorithm simply because no algorithm can can meet the criteria for being the best in every situation.
-Prelude
My best code is written with the delete key.
>what about for maximum compressability?
What about data integrity and lost information? I could write a routine that could compress anything you gave me to almost nothing, but it would be so lossy that it would be worthless. You have to find the best algorithm for the data you wish to compress, one would compress an application for storage differently than streamed video because the video can lose data without a noticeable change, but the application would be destroyed if it lost just about anything.
Compression is very much like sorting in which algorithm is better, it really depends on what you are compressing for what purpose.
-Prelude
My best code is written with the delete key.
let's say... lossless
you're compressing a non-text file, probably a computer binary, but it could be a movie file or a jpeg. (both of which have been compressed with lossy methods already).
I'd probably go with the Deflate algorithm. It's something like a mix between LZ77 and Huffman, but not. This will make more sense than I could by butchering an explanation.
-Prelude
My best code is written with the delete key.
I'd been trying to come up with some solution that does this but was unsuccessful. Can you show a sample algorithm that does this?I could write a routine that could compress anything you gave me to almost nothing, ...
-Prelude
think only with code.
write only with source.
>Can you show a sample algorithm that does this?
Take the first bit of each byte and add the integer value (0 or 1) to a running total. Instant super-compression that is totally worthless.
-Prelude
My best code is written with the delete key.
think of it this way you have a text file filled with one google of the letter "g" i write my program so i count how many g's there are then i delete everything and stream to the file 100000000g... then when you want to go the other way you just erase everything and stream g's utntil they reach a googleOriginally posted by toaster
I'd been trying to come up with some solution that does this but was unsuccessful. Can you show a sample algorithm that does this?
:P there you go i know have a super compression algo that isnt worth 2 cents even if it did compress 10gigs to 1kb <- aproximative :P
but when you do that, I don't think I can decompress that.Originally posted by Prelude
>Can you show a sample algorithm that does this?
Take the first bit of each byte and add the integer value (0 or 1) to a running total. Instant super-compression that is totally worthless.
-Prelude
think only with code.
write only with source.
>but when you do that, I don't think I can decompress that.
Of course not, if you'll read up a bit you'll see this. I was trying to make a point.
-PreludeWhat about data integrity and lost information? I could write a routine that could compress anything you gave me to almost nothing, but it would be so lossy that it would be worthless.
My best code is written with the delete key.
I'm sure there's got to be a way there is a good lossless compression algorithm without the use of sorting including use of binary trees. All the compression algorithms I've seen require this.
think only with code.
write only with source.
There are two parts to lossless compression. First there is the sampling, deciding what all the different occourances of each character are. This is a tricky bit, different sampling methods work well for different kinds of files. I believe that compression programs generally check what type of file they are compressing before choosing their sampling method. I do not, however, know of any sampling method where a tree is involved. Basically, you're just counting the number of times each character occours within the file (or within a window of recently accessed characters).
Next is the encoding scheme, and to the best of my knowledge, this is a choice between Huffman encoding, or Arithmetic encoding. Arithmetic encoding is flat out superiour (in terms of compression, it requires more processing though).
Huffman encoding uses a binary tree, but Arithmetic encoding does not. Basically, the 'theory' behind arithmetic encoding is this...
Let's say I want to compress with a frequency of 50% As, 25%Bs, and 25%Cs. Let's also assume that these frequencies do not change as we read the file.
Using these frequencies, let's say I want to compress the string...
AABCBBCAA
We let A correspond to the range 0 - .5 (bottom half)
B = .5 - .75 (middle fourth)
C = .75 - 1 (top fourth)
Compressing this string would go like so...
Read the first A, our number will be from 0 - .5 (Split the interval 0 - 1 into it's bottom half)
Read the second A, our number will be from 0 - .25 (Split the interval 0 - .5 into it's bottom half)
Read the B, our number will be from .125 - .1857 (split the interval 0 - .5 into it's middle fourth)
Read the C, our number will be from .170525 - .1857 (Split the interval .125 - .1857 into it's top fourth)
And so on...
Two things are clearly occouring here. First, the farther we go, the closer we get to an actuall number. After just four characters, with a very broad range of frequencies, we have gotten to four 6 decimal places, so clearly a float is not the answer. Instead, you kinda have to brew your own way of making exact decimals (floats are just approximations) using number streams that take up to a kilobyte to store.
Also, the further we go, we don't have to worry about previous decimal places. Notice how in the example above, the initial decimal is clearly going to be .1 In binary, the effect is more noticable. So, we don't have to keep track of the entire number really, we just have to keep an eye on a window of previous decimals.
Suppose you have a file that consists of say, 1000 As, 1 B, and nothing else (it's an extreme example, I know). The best you could do with Huffmann encoding is one bit per character, so 1001 bits. Arithmetic encoding actually achieves the theoretical limit, so it would only take something like 10 bits (This is only taking into account the compressed data, not the storage required to keep track of sampling information, which shouldn't vary with the encoding algorithm).
If you want to learn this stuff, check your local (university) library for "The Data Compression Book". Even if you don't get through the code (I couldn't personally), it does a good job at explaining the ideas behind compression.
Callou collei we'll code the way
Of prime numbers and pings!
Well prelude, I can tell you their is a WORST in every situation. If you look at the code (wotsit.org or .com or seomthing), in most photographic situations, bitmaps will actually expand the information of a pciture.