Huffman code for single character file?

This is a discussion on Huffman code for single character file? within the C++ Programming forums, part of the General Programming Boards category; >>Are your symbols 8-bit long, or do you use a different alphabet? Yes, I'm encoding the files byte by byte ...

  1. #16
    Carnivore ('-'v) Hunter2's Avatar
    Join Date
    May 2002
    Posts
    2,879
    >>Are your symbols 8-bit long, or do you use a different alphabet?
    Yes, I'm encoding the files byte by byte (easiest way of reading a file).

    >>How much does your implementation compress an average text file?
    I tried compressing a Bible file (4508 kb), and ended up with a 2677 kb compressed file, or around 41% compression. Although if I didn't put the dummy node in a special case, it would have come to around 3500 bytes (or more, can't remember the exact result). A bitmap file gives considerably less compression, because there's a lot more variation in the symbols used.

    >>I mean, only the 8 most common characters of 256 would be encoded with fewer than 8 bits.
    I don't think that's completely accurate. From the bits and pieces that I've read on the internet, that should only occur when the frequencies follow a fibonnacci sequence, thus causing a completely unbalanced tree (which by the way is the worst-case scenario)... although, in your implementation, it would seem that you're forcing the worst-case scenario in all cases

    Why not just count the number of binary ones until the next binary zero and then use that number to look up the character in an array?
    It's because the huffman codes may mix 1's and 0's; the 1's until 0 approach would only work (as far as I know) in the worst-case scenario where all nodes are on the right side of the tree.

    Prelude:
    Do you have a link I can follow, or a search string? Because I've looked through Google's first two pages of results on "algorithm compression huffman one symbol" as well as "huffman algorithm implementation", and every algorithm I've come across involves merging the bottom nodes until only one remains, then numbering the left and right as 0 and 1... which still doesn't cover the one-symbol possibility, in which the sole symbol at the root node has no code associated with it.
    Just Google It. √

    (\ /)
    ( . .)
    c(")(") This is bunny. Copy and paste bunny into your signature to help him gain world domination.

  2. #17
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,796
    >which still doesn't cover the one-symbol possibility, in which the sole symbol at the root node has no code associated with it.
    So make sure that the root node has a code. For extra compression you can add a special case where the root code is ignored if either subtrees are non-empty.
    My best code is written with the delete key.

  3. #18
    Carnivore ('-'v) Hunter2's Avatar
    Join Date
    May 2002
    Posts
    2,879
    >>So make sure that the root node has a code.
    But the only way I've seen this done is with a special case of sorts (whether it's a recursion depth check or an extra node like I did), and this still does not work for decoding since a code of '0' will mean a traversal to the left, which will be NULL even if the root node has a code. Well, unless of course you're advocating the reverse-dictionary decoding method where you search through the dictionary for each bit read?

    **EDIT**
    Just read that again... You mean make sure it has a code in all instances? In that case Huffman would most certainly NOT be optimal, as I found when I did this and ended up with a compressed bitmap file much larger than the original.

    >>you can add a special case where the root code is ignored if either subtrees are non-empty.
    This is what I did (in reverse... root only has code if subtrees are empty), and I was trying to ask if there was any way to have the same effect without a special case.
    Last edited by Hunter2; 09-06-2004 at 03:34 PM.
    Just Google It. √

    (\ /)
    ( . .)
    c(")(") This is bunny. Copy and paste bunny into your signature to help him gain world domination.

  4. #19
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,796
    >and I was trying to ask if there was any way to have the same effect without a special case.
    Oh. In that case I can't think of any offhand.
    My best code is written with the delete key.

  5. #20
    S Sang-drax's Avatar
    Join Date
    May 2002
    Location
    Göteborg, Sweden
    Posts
    2,072
    Quote Originally Posted by Hunter2
    I don't think that's completely accurate.
    I looked at that site and saw the tree there:
    ABCDEF
    / \
    (0)F ABCDE
    / \
    (10)E ABCD
    / \
    (110)D ABC
    / \
    (1110)C AB
    / \
    (11110)A B(11111)
    I misunderstood how Huffman worked.
    Last edited by Sang-drax : Tomorrow at 02:21 AM. Reason: Time travelling

  6. #21
    Carnivore ('-'v) Hunter2's Avatar
    Join Date
    May 2002
    Posts
    2,879
    Prelude:
    *wipes forehead* Whew, vindicated

    Ok thanks, I'm just glad that I'm not doing it 'improperly'

    Sang-Drax:
    Yeah, that was my first thought too when I saw the diagram... then I saw some other ones.
    Just Google It. √

    (\ /)
    ( . .)
    c(")(") This is bunny. Copy and paste bunny into your signature to help him gain world domination.

Page 2 of 2 FirstFirst 12
Popular pages Recent additions subscribe to a feed

Similar Threads

  1. File transfer- the file sometimes not full transferred
    By shu_fei86 in forum C# Programming
    Replies: 13
    Last Post: 03-13-2009, 12:44 PM
  2. Formatting a text file...
    By dagorsul in forum C Programming
    Replies: 12
    Last Post: 05-02-2008, 03:53 AM
  3. Need Help Fixing My C Program. Deals with File I/O
    By Matus in forum C Programming
    Replies: 7
    Last Post: 04-29-2008, 07:51 PM
  4. Can we have vector of vector?
    By ketu1 in forum C++ Programming
    Replies: 24
    Last Post: 01-03-2008, 04:02 AM

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21