I wrote a [C++] program that uses the Huffman compression to compress a text file, and I found that it only compresses a 40K file to about 24K, whereas Winzip can compress it to 9K. I haven't even stored the tree in the file.
Right now, my program inputs the characters from the file and creates a dictionary then a frequency table. I then create a tree out of every character in the dictionary, place them in a priority queue, and then I continually remove and combine the two smallest trees until I only have one tree left. Then I traverse the tree in preorder to create the codes (left=0, right=1). After that, I go through all the characters in the file and get their codes, combine the codes into a string and convert each set of eight characters from binary to decimal. Finally I output each decimal value to a binary file.
Also, my program can only handle ANSI files, and it cannot deal with files that contain Unicode characters (the program can only handle characters in the ASCII table). How could I handle Unicode characters?
I did not include the source because it is about six or seven pages long. I can e-mail it to whoever helps me.