-
Huffman compression
Hi,
After reading through the "Self-referential structures" chapter in The C Programming Language, I modified the binary tree code in it to count the number of occurrences of each byte inside a file. This works great, and now I want to extend it to use this information to perform Huffman compression on the input file.
When looking at examples of Huffman algorithms, I notice that the byte frequencies are usually stored in an array of 256 items (equal to the possible values of a byte). Should I now "flatten" my frequency tree into an array and use that as a base for building a Huffman tree, or is there a way I have not considered that would allow me to utilise the tree in its current form?
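In case it helps to make the question concrete, here is a sketch of what "flattening" could look like, assuming a node layout like the one in the K&R counting example (the names here are illustrative, not my actual code):

```c
#include <stddef.h>

/* Hypothetical layout of a counting-tree node, modelled on the
   K&R self-referential structures example. */
typedef struct _node {
    char ch;               /* the byte this node counts */
    int frequency;         /* occurrences seen so far   */
    struct _node *left;
    struct _node *right;
} node;

/* Walk the counting tree in order and copy each count into a
   256-slot array indexed by byte value; bytes that never occurred
   are left at zero. */
void flatten(const node *t, int freq[256])
{
    if (t == NULL)
        return;
    flatten(t->left, freq);
    freq[(unsigned char)t->ch] = t->frequency;
    flatten(t->right, freq);
}
```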
-
Say you have something like this
Code:
typedef struct _node {
    char ch;
    int frequency;
    struct _node *left;
    struct _node *right;
} node;
I'm guessing that 'ch' is the key at present, so you compare the current char with 'ch' at the current node to decide whether to go left or right.
But this tree will be shaped by the order of characters in the file (so if 'z' was the first character, then 'z' would be at the root node, and your tree would be unbalanced).
It seems to me that you need to do two things:
1. You need to replace each frequency with the appropriate Huffman code (the most frequent letter having the shortest code).
2. Ideally, the most frequent letter also needs to be at the root node, to minimise the search time.
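For point 1, the usual construction is to repeatedly merge the two least frequent nodes under a new parent until one tree remains; the most frequent byte then ends up closest to the root as a side effect. A rough sketch of that build step, assuming a 256-entry frequency table (the node layout and names here are just for illustration):

```c
#include <stdlib.h>

/* Hypothetical Huffman node: leaves carry a byte value, internal
   nodes carry only the summed frequency of their children. */
typedef struct hnode {
    int ch;                 /* byte value at a leaf, -1 otherwise */
    long frequency;
    struct hnode *left, *right;
} hnode;

static hnode *new_hnode(int ch, long f, hnode *l, hnode *r)
{
    hnode *n = malloc(sizeof *n);
    n->ch = ch;
    n->frequency = f;
    n->left = l;
    n->right = r;
    return n;
}

/* Build a Huffman tree from a 256-entry frequency table by
   repeatedly selecting and merging the two smallest nodes.
   O(n^2) selection, which is fine for at most 256 symbols.
   Returns NULL if no byte ever occurred. */
hnode *build_huffman(const long freq[256])
{
    hnode *pool[256];
    int n = 0, i;

    for (i = 0; i < 256; i++)
        if (freq[i] > 0)
            pool[n++] = new_hnode(i, freq[i], NULL, NULL);

    while (n > 1) {
        int a = 0, b = 1, j, hi, lo;
        hnode *parent;

        /* find indices of the two lowest-frequency nodes */
        if (pool[b]->frequency < pool[a]->frequency) { a = 1; b = 0; }
        for (j = 2; j < n; j++) {
            if (pool[j]->frequency < pool[a]->frequency)      { b = a; a = j; }
            else if (pool[j]->frequency < pool[b]->frequency) { b = j; }
        }

        /* merge the two rarest nodes under a new parent */
        parent = new_hnode(-1, pool[a]->frequency + pool[b]->frequency,
                           pool[a], pool[b]);

        /* remove both from the pool (higher index first), add parent */
        hi = a > b ? a : b;
        lo = a > b ? b : a;
        pool[hi] = pool[--n];
        pool[lo] = pool[--n];
        pool[n++] = parent;
    }
    return n ? pool[0] : NULL;
}
```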
-
But seeing as Huffman trees are usually built recursively (parents are generated from children and take the sum of their frequencies), would I need to modify the node struct to include a pointer to a parent node?
Code:
typedef struct _node {
    char ch;
    int frequency;
    struct _node *parent;
    struct _node *left;
    struct _node *right;
} node;
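From what I've read so far, a parent pointer may not be strictly necessary: the codes can be read off top-down during a recursive walk, appending a bit at each branch. Something along these lines, I think (simplified node layout, no parent field, and assuming the tree is shallower than 64 levels):

```c
#include <stddef.h>
#include <string.h>

/* Simplified Huffman node for illustration: leaves hold a byte
   value in ch, internal nodes hold -1. */
typedef struct _node {
    int ch;
    int frequency;
    struct _node *left;
    struct _node *right;
} node;

/* Emit each leaf's code top-down: append '0' when going left and
   '1' when going right, so no parent pointer is needed.
   codes[b] receives the bit string for byte b; prefix/depth track
   the path from the root (prefix must hold at least 64 chars). */
void assign_codes(const node *t, char codes[256][64],
                  char *prefix, int depth)
{
    if (t == NULL)
        return;
    if (t->left == NULL && t->right == NULL) {   /* leaf */
        prefix[depth] = '\0';
        strcpy(codes[t->ch], prefix);
        return;
    }
    prefix[depth] = '0';
    assign_codes(t->left, codes, prefix, depth + 1);
    prefix[depth] = '1';
    assign_codes(t->right, codes, prefix, depth + 1);
}
```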