I have always wondered how file compressors work.
I have always thought of writing one. Thanks.
file compression is a very complex area, involving heavy use of science (statistics and mathematics).
a primitive example is this: say we have the (binary) string "0100 1001 0010 0100 1001 0010 0100 0000 0000 0000". ignoring the spaces, this string is 40 characters long (40 bytes if each digit is stored as one character). using a simple algorithm, we could compress it by representing it as "0n3r9100n1r120", only 14 bytes. note this is a fictitious compression algorithm that i just made up. the 'n' stands for "next", the 'r' stands for "repeat". so the compressed string could be read as: "0", then the next 3 characters are "100" repeated 9 times, then the next 1 character is "0" repeated 12 times.
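to show that the made-up notation above really can be reversed, here is a rough decoder sketch in Python. the function name `decode` and the parsing rules are my own assumptions, since the post never pins them down: everything before the first 'n' is copied as-is, each later chunk looks like `<length>r<count><pattern>`, and the pattern is taken to be the last `<length>` characters of the chunk.

```python
def decode(encoded: str) -> str:
    """Expand the made-up 'n<length>r<count><pattern>' notation (hypothetical format)."""
    # Everything before the first 'n' is a literal prefix, copied unchanged.
    literal, *chunks = encoded.split("n")
    out = [literal]
    for chunk in chunks:
        # "3r9100" splits into length "3" and the remainder "9100".
        length_text, rest = chunk.split("r", 1)
        length = int(length_text)
        # Assumption: the pattern is the last `length` characters of the chunk
        # (length >= 1), and whatever comes before it is the repeat count.
        count = int(rest[:-length])
        pattern = rest[-length:]
        out.append(pattern * count)
    return "".join(out)


if __name__ == "__main__":
    restored = decode("0n3r9100n1r120")
    print(restored, len(restored))
```

this only works because the data is made of '0' and '1', so 'n' and 'r' can never appear in it. a real format would need some way to escape or length-prefix its markers.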
you can make up your own compression algorithm, as long as there is a way, or you write another program, to decompress a file compressed with your algorithm.
and of course i used binary as a simple example--any kind of data (e.g. numbers or english text) can be compressed. of course, there are limits. if you have a text file that contains only the character "A", it can't be compressed any further. similarly with a file whose contents are "ABC": it is too short and has no repetition to exploit.
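you can see those limits with a real compressor. this is a small sketch using Python's standard `zlib` module (my choice for illustration, not something the thread mentions). the helper name `sizes` is made up. tiny inputs come out larger than they went in because of the format's fixed overhead, while long repetitive input shrinks a lot.

```python
import zlib


def sizes(data: bytes) -> tuple[int, int]:
    """Return (original size, zlib-compressed size) in bytes."""
    return len(data), len(zlib.compress(data))


if __name__ == "__main__":
    for sample in (b"A", b"ABC", b"0" * 1000):
        before, after = sizes(sample)
        print(f"{before} bytes -> {after} bytes")
```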
edit: check out the wikipedia article on file compression for lots more information.
Last edited by nadroj; 10-23-2008 at 06:18 PM.
Have a look at "run length encoding" and "huffman compression" to get started with some simple examples.
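As a starting point, run length encoding can be sketched in a few lines of Python. This is only a minimal illustration: the names `rle_encode` and `rle_decode` and the list-of-pairs output are assumptions of mine, and a real file format would have to pack the pairs into bytes.

```python
from itertools import groupby


def rle_encode(text: str) -> list[tuple[str, int]]:
    """Collapse each run of identical characters into a (character, count) pair."""
    return [(char, len(list(run))) for char, run in groupby(text)]


def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Rebuild the original text by repeating each character `count` times."""
    return "".join(char * count for char, count in pairs)


if __name__ == "__main__":
    packed = rle_encode("aaabccdddd")
    print(packed)
    print(rle_decode(packed))
```

Note that text with few repeated runs gets bigger under this scheme, since every single character turns into a pair.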
> any language (ie numbers or english) can be compressed.
For example, "h4ck3r-spe4k" could be considered lossy compression.
How do they work?
Or why you didn't start with a web search
If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
If at first you don't succeed, try writing your phone number on the exam paper.