If I first might make a suggestion,
The Compression Book by Mark Nelson and Jean-Loup Gailly is a book I've been looking at lately, and it does have source for pretty much everything.
I'm going to umm, paraphrase their code now, basically an ineffecient version. First, a note on how character frequency is stored, Huffman Encoding does not need the ranking of the characters, just the frequencies.
unsigned long frequency[257]
Also, the tree I'm showing is just gonna do one tree for all the characters (and EOF, which will be element 256. freq[256] = 1 obviously). It shouldn't be too hard to adjust the tree making function to create three trees instead of one, just remember to decode it accordingly. The tree will be represented as an array of 513 nodes. 257 elements can only generate 257 + 256 = 513 nodes. For the first 257 nodes, those nodes will just be leaves of the tree, representing characters. The other 256 nodes will be actuall nodes. I'm gonna use an extra node just as a dummy maximum value.
The function to create the first 257 nodes is trivial (something like, tree[i].freq = frequency[i] 257 times... This is for the other nodes:
Code:
typedef struct
{
int child0;
int child1;
unsigned long freq;
} NODE;
Code:
unsigned int build_tree(NODE * tree)
{
int min1, min2;
int next_node;
int i;
tree[513].freq = ULONG_MAX; //dummy value
for(next_node = 257; ; next_node++) // break is in loop...
{
min1 = 513;
min2 = 513;
// 1. Figure out the node with the least occourance.
// Ignore nodes with 0 occourance.
for (i = 0; i < next_node; i++)
{
if (tree[i] == 0) continue;
if (tree[i].freq < tree[min1].freq) min1 = i;
else if (tree[i].freq < tree[min2].freq) min2 = i;
}
// min1 and min2 index the nodes with the least freq.
// if min2 is still pointing at the dummy val, then there is only 1
// active node left, the tree root.
if (min2 == 513) break;
// The weight of a node is the weight of the nodes below it.
tree[next_node].freq = tree[min1].count + tree[min2].count;
tree[next_node].child0 = min1;
tree[next_node].child1 = min2;
// deactivate nodes min1 and min2...
tree[min1].freq = 0;
tree[min2].freq = 0;
}
// tree is now set.
return min1; // min1 should index root node at this point.
}
Okay, so now we have a tree... I'm not going to pretend that was simple. Encoding the tree is simpler, but recursive, and of course, uses bitwise operators.
Code:
typedef
{
unsigned long code;
int bits;
}
A code is a compressed letter. Bits tells how many bits long the code is. Each character will have its own code, so we'll be using an array of 257 codes.
Code:
void tree2code (NODE * tree, CODE * codes, int node, unsigned long code, int bits)
{
if (node <= 257) // Halting condition.
{
code[node].code = code;
code[node].bits = bits;
}
code <<= 1;
bits++;
tree2code (tree, codes, tree[node].child0, code, bits);
tree2code (tree, codes, tree[node].child1, code|1, bits);
return;
}
The first call to this function, BTW, needs to pass it code as 0 and bits as 0. The function to encode is trivial, just reading the file, and every time you encounter some char c in the input, you write code[char] to the output file. Of course, you'll have to use some binary operators to write the codes, since they will not be byte-sized values (the whole point of compression really).
Again, I like to think I am helpful, but this is pretty hefty stuff, which took me a while to understand. I'm not SURE that this code is compilable, although I've looked it over pretty closely. I really do suggest hounding the best library in your area for books on "data compression", as they'll have more complete, and more efficient, code.