How do file compressors work?

This is a discussion on How do file compressors work? within the C++ Programming forums, part of the General Programming Boards category.

  1. #1
    Registered User
    Join Date
    Oct 2007
    Posts
    242

    How do file compressors work?

    I've always wondered how file compressors work.
    I've also thought about writing one. Thanks.

  2. #2
    Registered User
    Join Date
    Oct 2006
    Location
    Canada
    Posts
    1,243
    file compression is a very complex area, involving heavy use of mathematics (especially statistics).

    a primitive example is this: say we have the (binary) string "0100 1001 0010 0100 1001 0010 0100 0000 0000 0000". this string is 40 characters long (40 bytes if each digit is stored as a text character; the spaces are only there for readability). using a simple algorithm, we could compress it by representing it as "0n3r9100n1r120", only 14 bytes. note this is a fictitious compression algorithm that i just made up. the 'n' stands for "next" and the 'r' stands for "repeat", so the compressed string can be read as: "0"; then take the next 3 characters and repeat them 9 times ("100"); then take the next 1 character and repeat it 12 times ("0").

    you can make up your own compression algorithm, as long as there is a way (for example, a second program you write) to decompress a file compressed with it back to exactly the original.

    and of course i used binary as a simple example, but any language (ie numbers or english) can be compressed. of course, there are limits: a text file containing just the character "A" can't be compressed, and neither can one containing just "ABC". more generally, no lossless algorithm can shrink every possible input, because there are fewer short files than long ones.

    edit: check out the wikipedia article on file compression for lots more information.
    Last edited by nadroj; 10-23-2008 at 07:18 PM.

  3. #3
    Crazy Fool
    Perspective
    Join Date
    Jan 2003
    Location
    Canada
    Posts
    2,640
    Have a look at "run-length encoding" and "Huffman coding" to get started with some simple examples.

  4. #4
    Registered User
    Join Date
    Oct 2001
    Posts
    2,129
    > any language (ie numbers or english) can be compressed.

    For example, "h4ck3r-spe4k" could be considered lossy compression.

  5. #5
    and the hat of int overfl
    Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    32,854
    How do they work?

    Or why you didn't start with a web search.
