Thread: Data Compression

  1. #1
    Registered User
    Join Date
    Jan 2006
    Posts
    5

    Data Compression

    I'm planning on making a data compression software for my thesis which is 2 years from now, can you tell me what problems I could face or the difficulties of making a data compression software which is almost if not better than the best today? And any good sites on how to learn data compression? I want to make a kick-ass algorithm

  2. #2
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,656
    http://www.faqs.org/faqs/compression-faq/

    First you need to decide wether to go for lossless (like zip) or lossy (like mpeg) compression.

    > I want to make a kick-ass algorithm
    Then you need to be aware of theoretical limits of compressability.
    http://en.wikipedia.org/wiki/Information_theory

  3. #3
    Registered User
    Join Date
    Aug 2003
    Posts
    1,218
    Problems you will face:
    How will you do the actual compression? Compression is not an easy task, and by the sound of it you dont have enough experience to do it (yet).

    What algorithms will you use and how will you implement them?
    Some algorightms you may want to look into are Huffmans compressionalgorithm and RLE.

    How will you store the files?
    Many softwares of today use archives, how will you do it?

    Remember, the softwares that are today have almost pushed the limits, there are only so many ways you can store the data in, and in the end it all comes down to reducing the number of bits it takes to represent 1 byte.
    STL Util a small headers-only library with various utility functions. Mainly for fun but feedback is welcome.

  4. #4
    Registered User
    Join Date
    Jan 2006
    Posts
    5
    I was curious too as to what file types are the hardest or easiest to compress, didn't find it in the FAQ link above (or maybe I didn't look hard enough )

  5. #5
    Registered User
    Join Date
    Aug 2003
    Posts
    1,218
    Easiest to compress:
    Either an empty file or a file which consists of only 1 type of byte, for example this little text:
    'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaa'
    would with RLE be compressed to '5a' or possbly something like '5*a'. This assumes that the ascii number for '5' is 53. So with that you just squezed 53 bytes into 2.

    Hardest to compress are files which uses almost all characters in the ascii set (yes i know, i make an assumption), and where all characters are in a pseudorandom order, and each character can find roughly the same amount of times in the file.
    STL Util a small headers-only library with various utility functions. Mainly for fun but feedback is welcome.

  6. #6
    Slave MadCow257's Avatar
    Join Date
    Jan 2005
    Posts
    735
    I love compression too, this scheme might be good. Since BWT and MTF don't actually compress it might need another step that actually compresses

    -Burroughs Wheeler Transform
    -Move to Front
    -RLE
    -Range Encoder

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. singly linked circular list
    By DarkDot in forum C++ Programming
    Replies: 0
    Last Post: 04-24-2007, 08:55 PM
  2. simultaneously waiting for data on FIFO and UDP using select call
    By yogesh3073 in forum Networking/Device Communication
    Replies: 2
    Last Post: 01-05-2007, 09:53 AM
  3. Program Crashing
    By Pressure in forum C Programming
    Replies: 3
    Last Post: 04-18-2005, 10:28 PM
  4. Errors
    By Rhidian in forum C Programming
    Replies: 10
    Last Post: 04-04-2005, 12:22 PM
  5. Replies: 1
    Last Post: 07-31-2002, 11:35 AM