A Method for the Construction of Minimum-Redundancy Codes
This, of course, was the title of David A. Huffman's now-famous term paper describing one of the most important compression models ever. And even as some algorithms available today far exceed it's capabilities, it nonetheless remains a crucial achievement in the field of information theory that cannot be ignored. Many modern encoders even use Huffman encoding as a preprocessing step due to it's simplicity and general performance characteristics.
The challenge is to implement the algorithm in such a way that produces the smallest output file, and this will be the primary factor in deciding the winner. Bragging rights will also be rewarded to the fastest implementation, as well as the one requiring the least amount of overhead.
There's no prize for being the winner, unfortunately, EXCEPT that if the implementation submitted produces smaller output files* than the solution that I will be posting as a benchmark (which won't be an official entry), the winner WILL receive a $50 Gift Certificate.
Rules are as follows:
At this point, there isn't a formal deadline for submissions. I'm going to give it about a week or so to see who is going to participate, and at that point we can take an informal vote to determine what the deadline should be, based on everyone's schedules.
(1) The entry can be written in any language as long as:
- The source is provided, obviously
- An entry point to a C-style function with the following name and signature is provided:
void encode( const char* in_file, const char* out_file, int mode_is_compress );
The function should read data from in_file and write the output to out_file, compressing the data if mode_is_compress is non-zero, otherwise decompress.
(2) No non-constant global variable are allowed. All of the various data structures needed to process the data must be created and destroyed within the entry point. This restriction applies to external files as well.
(3) No assumptions should be made about the format of an uncompressed file. It may contain any byte value ranging from 0 to 255.
(4) No assumptions should be made about the size of an input file, except that it will some value less than 2^32.
(5) An implementation that crashes during the test will be disqualified.
(6) No third party libraries will be allowed. Standard libraries are fine unless it is determined that they result in some 'unfair' advantage. If in doubt, just ask.
(7) Submissions must be original. No plagerized works allowed.
(8) No entries employing non-Huffman compression schemes will be allowed.
(9) Entries shall be submitted via PM.
I've attached below the benchmark. It is, of course, compressed. :D Full source code will be provided at the end of the contest.
- The implementation must not favor any particular data format (such as text, for example)
- The output file must consistently be more than four bytes smaller than the benchmark. That may sound like an odd requirement, but the reasoning is that the benchmark could have been designed to output four bytes less, but I opted to keep it for the fact that it improved error-detection significantly.