Thread: what's the best compression algorithm made so far?

  1. #1
    Registered User toaster's Avatar
    Join Date
    Apr 2002
    Posts
    161

    what's the best compression algorithm made so far?

    see topic.
    think only with code.
    write only with source.

  2. #2
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,897
    What's the best sorting method? There's not a "best" algorithm, simply because no algorithm can meet the criteria for being the best in every situation.

    -Prelude
    My best code is written with the delete key.

  3. #3
    Just because ygfperson's Avatar
    Join Date
    Jan 2002
    Posts
    2,490
    what about for maximum compressibility?

  4. #4
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,897
    >what about for maximum compressibility?
    What about data integrity and lost information? I could write a routine that could compress anything you gave me to almost nothing, but it would be so lossy that it would be worthless. You have to find the best algorithm for the data you wish to compress: one would compress an application for storage differently than streamed video, because the video can lose data without a noticeable change, while the application would be destroyed if it lost just about anything.

    Compression is very much like sorting: which algorithm is better really depends on what you are compressing, and for what purpose.

    -Prelude
    My best code is written with the delete key.

  5. #5
    Just because ygfperson's Avatar
    Join Date
    Jan 2002
    Posts
    2,490
    let's say... lossless.
    you're compressing a non-text file, probably a computer binary, but it could be a movie file or a jpeg (both of which have already been compressed with lossy methods).

  6. #6
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,897
    I'd probably go with the Deflate algorithm. It's something like a mix between LZ77 and Huffman coding, but not exactly either. A proper write-up will make more sense of it than I could by butchering an explanation.
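    For what it's worth, here's a minimal sketch of compressing a buffer with zlib, which implements Deflate (this assumes you have zlib installed and link with -lz):
    Code:
    #include <stdio.h>
    #include <zlib.h>

    int main(void)
    {
        const char src[] = "AABCBBCAA AABCBBCAA AABCBBCAA";  /* sample data */
        uLong srcLen = (uLong)sizeof src;

        Bytef dst[256];               /* plenty of room for this input */
        uLongf dstLen = sizeof dst;

        /* compress() runs the whole Deflate pipeline in one call */
        if (compress(dst, &dstLen, (const Bytef *)src, srcLen) != Z_OK) {
            fprintf(stderr, "compress failed\n");
            return 1;
        }
        printf("%lu bytes -> %lu bytes\n",
               (unsigned long)srcLen, (unsigned long)dstLen);
        return 0;
    }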

    -Prelude
    My best code is written with the delete key.

  7. #7
    Registered User toaster's Avatar
    Join Date
    Apr 2002
    Posts
    161
    I could write a routine that could compress anything you gave me to almost nothing, ...
    -Prelude
    I've been trying to come up with a solution that does this, but was unsuccessful. Can you show a sample algorithm that does this?
    think only with code.
    write only with source.

  8. #8
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,897
    >Can you show a sample algorithm that does this?
    Take the first bit of each byte and add the integer value (0 or 1) to a running total. Instant super-compression that is totally worthless.
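    In C, that joke "compressor" might look something like this (a sketch; I'm taking "first bit" to mean the most significant bit, and obviously there's no way to decompress it):
    Code:
    #include <stdio.h>
    #include <stddef.h>

    /* "Compress" a buffer to a single number by summing the top
       bit of each byte. Instant super-compression, totally worthless. */
    unsigned long worthless_compress(const unsigned char *buf, size_t len)
    {
        unsigned long total = 0;
        size_t i;
        for (i = 0; i < len; i++)
            total += (buf[i] >> 7) & 1;
        return total;
    }

    int main(void)
    {
        const unsigned char data[] = "any data at all";
        printf("compressed to: %lu\n",
               worthless_compress(data, sizeof data - 1));
        return 0;
    }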

    -Prelude
    My best code is written with the delete key.

  9. #9
    Unregistered
    Guest
    Originally posted by toaster
    I've been trying to come up with a solution that does this, but was unsuccessful. Can you show a sample algorithm that does this?
    think of it this way: you have a text file filled with a googol of the letter "g". i write my program so it counts how many g's there are, then deletes everything and streams 100000000g... to the file. then when you want to go the other way, you just erase everything and stream g's until they reach a googol

    :P there you go, i now have a super compression algo that isn't worth 2 cents even if it did compress 10 gigs to 1 kb <- approximately :P
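    The counting half of that joke in C might look like this (a toy sketch; it assumes the input really is nothing but g's):
    Code:
    #include <stdio.h>

    int main(void)
    {
        /* Count a run of g's from stdin, then emit "<count>g".
           Decompression would just print that many g's back out. */
        unsigned long count = 0;
        int ch;

        while ((ch = getchar()) != EOF && ch == 'g')
            count++;

        printf("%lug\n", count);
        return 0;
    }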

  10. #10
    Registered User toaster's Avatar
    Join Date
    Apr 2002
    Posts
    161
    Originally posted by Prelude
    >Can you show a sample algorithm that does this?
    Take the first bit of each byte and add the integer value (0 or 1) to a running total. Instant super-compression that is totally worthless.

    -Prelude
    but when you do that, I don't think I can decompress it.
    think only with code.
    write only with source.

  11. #11
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,897
    >but when you do that, I don't think I can decompress that.
    Of course not; if you'll read back up a bit, you'll see this. I was trying to make a point:
    What about data integrity and lost information? I could write a routine that could compress anything you gave me to almost nothing, but it would be so lossy that it would be worthless.
    -Prelude
    My best code is written with the delete key.

  12. #12
    Registered User toaster's Avatar
    Join Date
    Apr 2002
    Posts
    161
    I'm sure there's got to be a good lossless compression algorithm that doesn't use sorting or binary trees. All the compression algorithms I've seen require them.
    think only with code.
    write only with source.

  13. #13
    Registered User
    Join Date
    Sep 2001
    Posts
    752
    There are two parts to lossless compression. First there is the sampling: deciding how often each character occurs. This is the tricky bit; different sampling methods work well for different kinds of files. I believe that compression programs generally check what type of file they are compressing before choosing their sampling method. I do not, however, know of any sampling method where a tree is involved. Basically, you're just counting the number of times each character occurs within the file (or within a window of recently accessed characters).
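    A sketch of that counting step in C, with one counter per possible byte value:
    Code:
    #include <stdio.h>

    int main(void)
    {
        unsigned long freq[256] = {0};  /* one slot per byte value */
        int i, ch;

        /* Tally how often each byte occurs in the input */
        while ((ch = getchar()) != EOF)
            freq[ch]++;

        for (i = 0; i < 256; i++)
            if (freq[i])
                printf("byte 0x%02x: %lu times\n", i, freq[i]);
        return 0;
    }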

    Next is the encoding scheme, and to the best of my knowledge, this is a choice between Huffman encoding and arithmetic encoding. Arithmetic encoding is flat-out superior in terms of compression, though it requires more processing.

    Huffman encoding uses a binary tree, but Arithmetic encoding does not. Basically, the 'theory' behind arithmetic encoding is this...

    Let's say I want to compress with a frequency of 50% As, 25% Bs, and 25% Cs. Let's also assume that these frequencies do not change as we read the file.

    Using these frequencies, let's say I want to compress the string...
    AABCBBCAA

    We let A correspond to the range 0 - .5 (bottom half)
    B = .5 - .75 (middle fourth)
    C = .75 - 1 (top fourth)

    Compressing this string would go like so...

    Read the first A, our number will be from 0 - .5 (split the interval 0 - 1 into its bottom half)
    Read the second A, our number will be from 0 - .25 (split the interval 0 - .5 into its bottom half)
    Read the B, our number will be from .125 - .1875 (split the interval 0 - .25 into its middle fourth)
    Read the C, our number will be from .171875 - .1875 (split the interval .125 - .1875 into its top fourth)
    And so on...
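    Here's that interval narrowing in C, using doubles purely for illustration (as noted below, a real implementation can't rely on floats; it has to renormalize and emit digits as they settle):
    Code:
    #include <stdio.h>

    int main(void)
    {
        /* Fixed model: A = [0, .5), B = [.5, .75), C = [.75, 1) */
        const double lo_cum[] = { 0.0, 0.5,  0.75 };
        const double hi_cum[] = { 0.5, 0.75, 1.0  };
        const char *msg = "AABC";
        const char *p;

        double lo = 0.0, hi = 1.0;
        for (p = msg; *p; p++) {
            int s = *p - 'A';
            double range = hi - lo;
            hi = lo + range * hi_cum[s];  /* shrink the interval to */
            lo = lo + range * lo_cum[s];  /* the symbol's sub-range */
            printf("%c -> [%.6f, %.6f)\n", *p, lo, hi);
        }
        return 0;
    }
    Running it reproduces the intervals above: [0, .5), [0, .25), [.125, .1875), and [.171875, .1875).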

    Two things are clearly occurring here. First, the farther we go, the closer we get to an actual number. After just four characters, with a very broad range of frequencies, we already need 6 decimal places, so clearly a float is not the answer. Instead, you kinda have to brew your own way of representing exact decimals (floats are just approximations) using number streams that can take up to a kilobyte to store.

    Also, the further we go, the less we have to worry about previous decimal places. Notice how in the example above, the first decimal digit is clearly going to be 1. In binary, the effect is more noticeable. So we don't really have to keep track of the entire number; we just have to keep an eye on a window of recent decimals.

    Suppose you have a file that consists of, say, 1000 As, 1 B, and nothing else (it's an extreme example, I know). The best you could do with Huffman encoding is one bit per character, so 1001 bits. Arithmetic encoding actually approaches the theoretical limit, so it would only take something like 11 bits (this is only taking into account the compressed data, not the storage required to keep track of sampling information, which shouldn't vary with the encoding algorithm).
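    You can sanity-check that figure by computing the Shannon entropy of the file (a back-of-the-envelope sketch; it comes out to about 11.4 bits; needs C99's log2 and -lm):
    Code:
    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        /* 1000 As and 1 B: total bits = sum over chars of -log2(p) */
        double counts[] = { 1000.0, 1.0 };
        double total = 1001.0, bits = 0.0;
        int i;

        for (i = 0; i < 2; i++)
            bits += counts[i] * -log2(counts[i] / total);

        printf("theoretical minimum: %.1f bits\n", bits);  /* ~11.4 */
        return 0;
    }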

    If you want to learn this stuff, check your local (university) library for "The Data Compression Book". Even if you don't get through the code (I couldn't, personally), it does a good job of explaining the ideas behind compression.
    Callou collei we'll code the way
    Of prime numbers and pings!

  14. #14
    Registered User
    Join Date
    Sep 2001
    Posts
    4,912
    Well Prelude, I can tell you there is a WORST in every situation. If you look at the file format specs (wotsit.org or .com or something), in most photographic situations, RLE-compressed bitmaps will actually expand the picture's data.

