>>Are your symbols 8-bit long, or do you use a different alphabet?
Yes, I'm encoding the files byte by byte (it's the easiest way to read a file).
>>How much does your implementation compress an average text file?
I tried compressing a Bible file (4508 KB) and ended up with a 2677 KB compressed file, around 41% compression. If I hadn't handled the dummy node as a special case, it would have come to around 3500 KB (or more; I can't remember the exact result). A bitmap file compresses considerably less, because there's much more variation in the symbols used.
>>I mean, only the 8 most common characters of 256 would be encoded with fewer than 8 bits.
I don't think that's completely accurate. From the bits and pieces I've read online, that only happens when the frequencies follow a Fibonacci sequence, which produces a completely unbalanced tree (the worst-case scenario, by the way)... although in your implementation, it seems you're forcing that worst case every time.
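To illustrate the Fibonacci worst case: feed Fibonacci-like weights into the standard bottom-up construction and every merge involves the node you just created, so the tree degenerates into a chain with one symbol at every depth. A quick sketch (hypothetical code, just to show the shape of the tree):

```python
import heapq
import itertools

def huffman_code_lengths(freqs):
    # Standard bottom-up Huffman: repeatedly merge the two lightest
    # nodes. We only track each symbol's depth, i.e. its eventual
    # code length. The counter breaks ties so the heap never has to
    # compare dicts.
    tie = itertools.count()
    heap = [(w, next(tie), {sym: 0}) for sym, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next(tie), merged))
    return heap[0][2]

# Fibonacci weights for 8 symbols: the rarest symbols end up
# 7 levels deep (7-bit codes), the commonest only 1 level deep.
fib = dict(zip("ABCDEFGH", [1, 1, 2, 3, 5, 8, 13, 21]))
lengths = huffman_code_lengths(fib)
```

With uniform or near-uniform frequencies the same routine produces a balanced tree instead, which is why "only the 8 most common symbols get short codes" is the exception, not the rule.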
>>Why not just count the number of binary ones until the next binary zero and then use that number to look up the character in an array?
It's because Huffman codes may mix 1's and 0's; the "1's until a 0" approach would only work (as far as I know) in the worst-case scenario where all the leaves hang off the right side of the tree.
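To see why the unary scheme breaks down: with four equally likely symbols, Huffman assigns the 2-bit codes 00/01/10/11, and a decoder that just counts 1's until the first 0 can't distinguish them. A throwaway sketch (hypothetical, using string codes for readability):

```python
import heapq
import itertools

def huffman_codes(freqs):
    # Merge the two lightest nodes, prefixing 0 onto one side's
    # codes and 1 onto the other's. The counter is a tie-breaker
    # so the heap never compares dicts.
    tie = itertools.count()
    heap = [(w, next(tie), {s: ""}) for s, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tie), merged))
    return heap[0][2]

# Equal frequencies -> a balanced tree with codes 00, 01, 10, 11.
# "Count the 1's until a 0" would need the code set 0, 10, 110,
# 1110, ..., i.e. the fully right-leaning worst-case tree.
codes = huffman_codes({"a": 1, "b": 1, "c": 1, "d": 1})
```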
Do you have a link I can follow, or a search string? I've looked through Google's first two pages of results for "algorithm compression huffman one symbol" as well as "huffman algorithm implementation", and every algorithm I've come across involves merging the two lowest-frequency nodes until only one remains, then labelling the left and right branches 0 and 1... which still doesn't cover the one-symbol case, where the sole symbol sits at the root and so has no code associated with it.
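For the record, the only sensible handling I can think of is to special-case one-symbol input before the merge loop: the root is the leaf, so the usual branch labelling never runs and the symbol would get a zero-length code; forcing an arbitrary 1-bit code fixes that. A hedged sketch of the idea (my own, not from any reference I've found):

```python
def single_symbol_codes(freqs):
    # Degenerate Huffman input: one distinct symbol. The root IS
    # the leaf, so numbering left/right branches never happens and
    # the symbol would otherwise get an empty code. Forcing a
    # one-bit code keeps the encoder and decoder well defined;
    # each occurrence then costs exactly one bit in the output.
    if len(freqs) != 1:
        raise ValueError("only handles the one-symbol special case")
    (sym,) = freqs
    return {sym: "0"}

codes = single_symbol_codes({"A": 12345})
```

The decoder still needs the symbol count from the header to know when to stop, since every bit decodes to the same symbol.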