I have always wondered how file compressors work.
I have often thought of writing one. Thanks.
file compression is a very complex area, involving heavy use of science (statistics and mathematics).
a primitive example is this: say we have the (binary) string "0100 1001 0010 0100 1001 0010 0100 0000 0000 0000". ignoring the spaces, this string is 40 characters long (40 bytes if each digit is stored as one text character). using a simple algorithm, we could compress it by representing it as "0n3r9100n1r120", only 14 characters. note this is a fictitious compression algorithm that i just made up. the 'n' stands for "next", the 'r' stands for "repeat". so the compressed string could be read as: "0", then next 3 characters, repeat 9 times: "100", then next 1 character, repeat 12 times: "0".
you can make up your own compression algorithm, as long as there is a way (for example, a second program that you write) to decompress a file compressed with your algorithm.
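to make the "you also need a decompressor" point concrete, here is a rough python sketch of a decoder for the made-up format above. the grammar is a guess based on the single example: anything before the first 'n' is copied as-is, and each later piece looks like n<length>r<count><pattern>. the function name decompress is just a placeholder, and a real format would need to handle cases this one cannot (for instance, literal digits after a repeat).

```python
def decompress(packed: str) -> str:
    """Expand the made-up n/r format (assumed grammar, see note above)."""
    # Everything before the first 'n' is copied through literally.
    literal, _, rest = packed.partition("n")
    if not rest:
        return literal

    out = [literal]
    # Each remaining chunk is assumed to look like "<length>r<count><pattern>".
    for chunk in rest.split("n"):
        length_text, _, tail = chunk.partition("r")
        length = int(length_text)
        # The last `length` characters are the pattern;
        # the decimal digits before them are the repeat count.
        count = int(tail[:-length])
        pattern = tail[-length:]
        out.append(pattern * count)
    return "".join(out)


if __name__ == "__main__":
    restored = decompress("0n3r9100n1r120")
    print(restored, len(restored))
```

running it on "0n3r9100n1r120" gives back a "0", nine copies of "100" and twelve "0"s, which is 40 characters in total.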
and of course i used binary as a simple example. any language (ie numbers or english) can be compressed. of course, there are limits: a text file that contains only the character "A" can't be compressed, because there is no repetition to take advantage of. the same goes for a file whose contents are just "ABC".
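a quick way to see those limits for yourself is to run a real compressor over a tiny input and over a very repetitive one. this sketch uses python's standard zlib module; the variable names and the sample inputs are arbitrary choices for illustration, and the exact sizes printed depend on the zlib version and settings.

```python
import zlib

# A one-byte file: nothing repeats, so there is nothing to squeeze out.
tiny = b"A"
# A highly repetitive input: the same three bytes, 1000 times over.
repetitive = b"100" * 1000

tiny_packed = zlib.compress(tiny)
repetitive_packed = zlib.compress(repetitive)

# Print original size and compressed size for each input.
print("tiny:      ", len(tiny), "->", len(tiny_packed))
print("repetitive:", len(repetitive), "->", len(repetitive_packed))

# Decompression must give back exactly what went in.
assert zlib.decompress(tiny_packed) == tiny
assert zlib.decompress(repetitive_packed) == repetitive
```

the tiny input actually comes out larger than it went in, because the compressed format carries a few bytes of header and checksum, while the repetitive input shrinks a lot.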
edit: check out the wikipedia article on file compression for lots more information.
Have a look at "run-length encoding" and "Huffman coding" to get started with some simple examples.
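As a starting point, here is a minimal sketch of run-length encoding in Python. It stores each run as a (character, count) pair rather than packing it into bytes, which is a simplification for readability; the names rle_encode and rle_decode are made up for this example and do not come from any library.

```python
from itertools import groupby


def rle_encode(text: str) -> list:
    """Collapse each run of identical characters into a (char, count) pair."""
    return [(char, len(list(run))) for char, run in groupby(text)]


def rle_decode(pairs) -> str:
    """Rebuild the original text by repeating each character `count` times."""
    return "".join(char * count for char, count in pairs)


if __name__ == "__main__":
    sample = "aaabccdddd"
    encoded = rle_encode(sample)
    print(encoded)
    # Round trip: decoding the encoded form must return the input unchanged.
    assert rle_decode(encoded) == sample
```

Run-length encoding only pays off when the input has long runs of the same symbol. On text such as "ABC", every run has length 1 and the encoded form ends up bigger than the original.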
> any language (ie numbers or english) can be compressed.
For example, "h4ck3r-spe4k" could be considered lossy compression.
How do they work?
Or why you didn't start with a web search