Huffman code for single character file?

**Hunter2** · 09-06-2004

>>Are your symbols 8-bit long, or do you use a different alphabet?
Yes, I'm encoding the files byte by byte (easiest way of reading a file).

>>How much does your implementation compress an average text file?
I tried compressing a Bible file (4508 kb), and ended up with a 2677 kb compressed file, or around 41% compression. Although if I didn't put the dummy node in a special case, it would have come to around 3500 bytes (or more, can't remember the exact result). A bitmap file gives considerably less compression, because there's a lot more variation in the symbols used.

>>I mean, only the 8 most common characters of 256 would be encoded with fewer than 8 bits.
I don't think that's completely accurate. From the bits and pieces that I've read on the internet, that should only occur when the frequencies follow a fibonnacci sequence, thus causing a completely unbalanced tree (which by the way is the worst-case scenario)... although, in your implementation, it would seem that you're forcing the worst-case scenario in all cases

Why not just count the number of binary ones until the next binary zero and then use that number to look up the character in an array?

It's because the huffman codes may mix 1's and 0's; the 1's until 0 approach would only work (as far as I know) in the worst-case scenario where all nodes are on the right side of the tree.

Prelude:
Do you have a link I can follow, or a search string? Because I've looked through Google's first two pages of results on "algorithm compression huffman one symbol" as well as "huffman algorithm implementation", and every algorithm I've come across involves merging the bottom nodes until only one remains, then numbering the left and right as 0 and 1... which still doesn't cover the one-symbol possibility, in which the sole symbol at the root node has no code associated with it.

**Prelude** · 09-06-2004

>which still doesn't cover the one-symbol possibility, in which the sole symbol at the root node has no code associated with it.
So make sure that the root node has a code. For extra compression you can add a special case where the root code is ignored if either subtrees are non-empty.

**Hunter2** · 09-06-2004

>>So make sure that the root node has a code.
But the only way I've seen this done is with a special case of sorts (whether it's a recursion depth check or an extra node like I did), and this still does not work for decoding since a code of '0' will mean a traversal to the left, which will be NULL even if the root node has a code. Well, unless of course you're advocating the reverse-dictionary decoding method where you search through the dictionary for each bit read?

**EDIT**
Just read that again... You mean make sure it has a code in all instances? In that case Huffman would most certainly NOT be optimal, as I found when I did this and ended up with a compressed bitmap file much larger than the original.

>>you can add a special case where the root code is ignored if either subtrees are non-empty.
This is what I did (in reverse... root only has code if subtrees are empty), and I was trying to ask if there was any way to have the same effect without a special case.

**Prelude** · 09-06-2004

>and I was trying to ask if there was any way to have the same effect without a special case.
Oh. In that case I can't think of any offhand.

**Sang-drax** · 09-06-2004

Originally Posted by Hunter2

I don't think that's completely accurate.

I looked at that site and saw the tree there:

ABCDEF
/ \
(0)F ABCDE
/ \
(10)E ABCD
/ \
(110)D ABC
/ \
(1110)C AB
/ \
(11110)A B(11111)

I misunderstood how Huffman worked.

**Hunter2** · 09-06-2004

Prelude:
*wipes forehead* Whew, vindicated

Ok thanks, I'm just glad that I'm not doing it 'improperly'

Sang-Drax:
Yeah, that was my first thought too when I saw the diagram... then I saw some other ones.

Thread: Huffman code for single character file?

Thread Tools

Search Thread

Display

Similar Threads

File transfer- the file sometimes not full transferred

Formatting a text file...

Need Help Fixing My C Program. Deals with File I/O

Can we have vector of vector?