Thread: help for novice re: Huffman tree

  1. #1
    Registered User
    Join Date
    Mar 2013
    Posts
    29

    Question help for novice re: Huffman tree

    I'm trying to code a Huffman tree header, with the ADT defined from an old on-line course in C programming. I'm stumped by the following instructions:
    MAKING A CODE TABLE: Once the Huffman tree is built---
    We need to make a code table from the tree that contains the bitcode for each character. We create this
    table by traversing the tree and keeping track of the left branches we take (we designate this a 0) and the
    right branches we take (the opposite bit sense - a 1) until we reach a leaf node. Once we reach a leaf, we
    have a full code for the character at the leaf node which is simply the string of left and right moves which
    we have strung together along with the code’s length. The output of this procedure is an array, indexed
    by character (a number between 0 and 127 for a 7 bit ascii character), that contains an entry for the coded
    bits and the code’s length.
    I've already made a character-frequency array, and made the tree itself, but do not understand how to construct something such as:
    Code:
    for (int i = 0;i < 127; i++) {
    path_table[i] = path;
    }
    where path would be the appropriate sequence of 1's and 0's.
    For example, what type would the path_table array have? Do I need a special function to write successive bits into path, as opposed to chars?
    I hope that my question is clear, and thus answerable.
    Last edited by atlantis43; 03-03-2015 at 04:52 PM. Reason: typo

  2. #2
    Registered User
    Join Date
    Nov 2010
    Location
    Long Beach, CA
    Posts
    5,909
    You need to traverse the tree and track the byte code generated. Start with an empty variable (all 0 bits), shift in a 0 or 1 when you go left/right, and count how many levels down the tree you go. When you get to a leaf (all your characters will be leaves), figure out the character, and store the bit value and bit length in there. I recommend you define a struct that contains two elements: bit_sequence and sequence_length.

    Then, when you do your encoding, you may need to combine the bit sequences of several characters into a single byte. Read up on bitwise operations: Tutorials - Bitwise Operators and Bit Manipulations in C and C++ - Cprogramming.com. You will be using shift and OR at the least. Make sure you always use unsigned types when shifting and doing other bitwise operations.

    A quick example, if 'e' is the sequence 01 and f the sequence 110, then to encode "eff", you would have one single byte, with the bits 01110110.

    Also, you probably already did this, but read the Wikipedia page: Huffman coding - Wikipedia, the free encyclopedia. Lots of good examples, diagrams and such on there.

  3. #3
    Registered User
    Join Date
    Mar 2013
    Posts
    29
    Quote Originally Posted by anduril462 View Post
    You need to traverse the tree and track the byte code generated. Start with an empty variable (all 0 bits)
    Guess I have to scrutinize those links (which I've previously seen), but what type of empty variable is needed to hold the bits?
    Wouldn't unsigned int be 4 bytes for each of 256 elements: or is that inconsequential for the process?

  4. #4
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    Quote Originally Posted by atlantis43 View Post
    but what type of empty variable is needed to hold the bits?
    Your problem says: "array, indexed by character (a number between 0 and 127 for a 7 bit ascii character), that contains an entry for the coded bits and the code’s length."

    So, I'd use
    Code:
    typedef struct {
        unsigned int path;
        unsigned char length;
    } huffman_code;
    An unsigned int has (CHAR_BIT*sizeof (unsigned int)) bits, and an unsigned char can hold values between 0 and UCHAR_MAX, inclusive.

    So, an array of huffman_code entries can describe any Huffman tree whose maximum depth does not exceed (CHAR_BIT*sizeof (unsigned int)) or UCHAR_MAX (whichever is smaller applies). These constants are defined in limits.h header file.

    Let's assume you have a table of the above codes, and you initialize it to "no path/no code":
    Code:
        #include <limits.h>
    
        huffman_code table[UCHAR_MAX + 1];
        int i;
    
        for (i = 0; i <= UCHAR_MAX; i++) {
            table[i].path = 0U; /* Same as (unsigned int)0 */
            table[i].length = 0U; /* The U just means unsigned */
        }
    and you're working on tree
    help for novice re: Huffman tree-625px-huffman_tree_2-svg-png
    Consider the leaf specifying x: The path from root to it is right-left-left-right-left, or 10010, five steps long. Since 10010 in binary is 16+2=18 in decimal, you should have
    Code:
        table['x'].path = 18;
        table['x'].length = 5;
    Here's the trick: instead of looping over the 128 characters to fill the table, traverse your tree, keeping track of the path (left or right, and how many steps) taken to reach the current node from the root. Whenever you reach a leaf node, you fill in the corresponding table entry. As you traverse the entire tree, you fill the table.

    Assuming your tree is correctly built, the entries that do not get defined while traversing the tree, will not be used at all. (Which is why it is a good idea to initialize them as "no path".)

    Questions?
    Last edited by Nominal Animal; 03-03-2015 at 08:56 PM.

  5. #5
    Registered User
    Join Date
    Mar 2013
    Posts
    29
    This looks like a great link for my purposes (and novice level).
    Thanks again.

  6. #6
    Registered User
    Join Date
    Mar 2013
    Posts
    29
    Nominal Animal;
    thanks for your clear & detailed explanation. This, together with the above-mentioned C Programming link should help me undo my ignorance.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Huffman
    By amap100 in forum C Programming
    Replies: 3
    Last Post: 12-29-2012, 08:25 PM
  2. huffman tree
    By blazer26 in forum C Programming
    Replies: 0
    Last Post: 04-25-2006, 03:39 PM
  3. Replies: 10
    Last Post: 06-17-2005, 10:00 PM
  4. How can I traverse a huffman tree
    By carrja99 in forum C++ Programming
    Replies: 3
    Last Post: 04-28-2003, 05:46 PM
  5. yet another huffman tree problem
    By ygfperson in forum C++ Programming
    Replies: 2
    Last Post: 04-18-2002, 01:20 AM