-
huffman code
hello,
I wish to implement huffman code in C..I know the logic but stuck in implementing..i m finding it pretty difficult , I mean first sorting and counting frequency is ok but then building the tree is where I am totally stuck, and then again sorting?? Pls help if anyone can...
Thanks,
Edesign
-
Well from my memory there are at least a dozen implementations of huffman out there.
The tree is nothing more than a simple binary tree with leafs containing the symbols that
are encoded, and the path to the symbol is the bitwise encoding.
You can refer to wikipedia for the algorithm, or study a ready made implementation
to see how actual details are handled.
-
Hi,
You are right, googling it, I got a few codes, even in that code I am facing problem in understanding sorting part, especially after base nodes are defined...
Code:
// huffman.cpp : Defines the entry point for the console application.
#include "stdafx.h"
#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>
#include <assert.h>
#include <math.h>
#include <string.h>
#define NUMBYTES 256
#define NUMBITS 16
#define INPUTFILENAME "input.huf"
#define OUTPUTFILENAME "output.huf"
#define TEMPFILENAME "huffman.tmp"
typedef struct _HUFFNODE {
char byte;
long freq;
struct _HUFFNODE *parent;
struct _HUFFNODE *left;
struct _HUFFNODE *right;
} HUFFNODE;
typedef struct _HUFFTAB {
char byte;
char numbits;
long freq;
long bits[NUMBITS];
} HUFFTAB;
FILE *f;
long fsize, numnodes, numbasenodes;
long freq_tab[NUMBYTES]; // Frequency table
HUFFNODE *nodelist[NUMBYTES];
HUFFNODE *basenode[NUMBYTES];
HUFFNODE *templist[NUMBYTES];
HUFFTAB code_tab[NUMBYTES];
/////////////////////////////////
// Return file size in bytes //
/////////////////////////////////
long filesize(FILE *f)
{
long pos, fsize;
pos = ftell(f);
fseek(f, 0L, SEEK_END);
fsize = ftell(f);
fseek(f, pos, SEEK_SET);
return fsize;
}
/////////////////////////////////////////////////////////
// Read input file to memory to speed up compression //
/////////////////////////////////////////////////////////
char *huffman_readinputstream(char *inputstream)
{
char *stream;
f = fopen(inputstream, "r+b");
assert(f != NULL);
fsize = filesize(f);
stream = (char *)calloc(fsize, sizeof(char));
assert(stream != NULL);
fread(stream, fsize, 1, f);
fclose(f);
return stream;
}
////////////////////////////
// Make frequency table //
////////////////////////////
void huffman_buildfreq(char *stream)
{
long k;
for (k = 0; k < NUMBYTES; k++)
freq_tab[k] = 0;
for (k = 0; k < fsize; k++)
freq_tab[stream[k]]++;
}
/////////////////////////////////////////////////////////
// Sort node list table using straight insertion sort //
/////////////////////////////////////////////////////////
void huffman_sortnodelist(HUFFNODE *nodes[])
{
HUFFNODE *node;
long i, j, k;
for (i = 0; i < numnodes-1; i++) {
k = i;
node = nodes[i];
for (j = i+1; j < numnodes; j++)
if (nodes[j]->freq < node->freq) { //Surprisingly breaks at this point in 2nd sort...
k = j;
node = nodes[j];
}
if (k != i) {
node = nodes[i];
nodes[i] = nodes[k];
nodes[k] = node;
}
}
}
//////////////////////////
// Build huffman tree //
//////////////////////////
void huffman_buildtree(char *stream)
{
HUFFNODE *node;
HUFFNODE *node1, *node2; // Two nodes with smallest frequency
long k, n;
// Build frequency table
huffman_buildfreq(stream);
// Build base nodes
numbasenodes = 0;
for (k = 0; k < NUMBYTES; k++) {
if (freq_tab[k] > 0) {
node = (HUFFNODE *)calloc(1, sizeof(HUFFNODE));
assert(node != NULL);
node->byte = k;
node->freq = freq_tab[k];
node->parent = NULL;
node->left = NULL;
node->right = NULL;
nodelist[numbasenodes] = node;
basenode[numbasenodes] = node;
templist[numbasenodes] = node;
numbasenodes++;
}
}
// Build huffman tree
numnodes = numbasenodes;
n = numnodes;
while (--n > 0) {
// Sort node list go get two nodes with minor frequency
huffman_sortnodelist(nodelist);
node1 = nodelist[0];
node2 = nodelist[1];
// Make new node and link nodes
node = (HUFFNODE *)calloc(1, sizeof(HUFFNODE));
node->byte = 'x';
node->freq = node1->freq + node2->freq;
node->left = node1;
node->right = node2;
node->parent = NULL;
node1->parent = node;
node2->parent = node;
// Add new node and remove old two
nodelist[0] = node;
for (k = 1; k < n; k++)
nodelist[k] = nodelist[k+1];
nodelist[n] = NULL;
templist[numnodes++] = node;
}
for (k = 0; k < numnodes; k++)
nodelist[k] = templist[k];
}
////////////////////
// Main program //
////////////////////
int main()
{
char *stream;
long k;
// Read input stream
stream = huffman_readinputstream(INPUTFILENAME);
// Build huffman tree
huffman_buildtree(stream);
return 0;
}
have just mentioned sort and tree build part
Code:
for (j = i+1; j < numnodes; j++)
if (nodes[j]->freq < node->freq)
after 1st sort, it always breaks at this point...I can't understand why..
-
Who wrote this code?? :/
First off, their insertion sort sucks. It basically guarantees O(n^2) comparisons. I would suggest replacing it with this:
Code:
void huffman_sortnodelist(HUFFNODE *nodes[])
{
HUFFNODE *node;
long i, j;
for (i = 1; i < numnodes; i++) {
node = nodes[i];
for (j = i-1; j >= 0 && nodes[j]->freq > node->freq; j--)
nodes[j+1] = nodes[j];
nodes[j+1] = node;
}
}
I have no idea what templist is, what its used for, or why its even needed. You can probably get rid of it.
Now here's the main problem:
Code:
// Add new node and remove old two
nodelist[0] = node;
for (k = 1; k < n; k++)
nodelist[k] = nodelist[k+1];
nodelist[n] = NULL;
templist[numnodes++] = node;
Notice, the comment even says that its adding a node and removing two. Explain to me how that should result in incrementing the numnodes count....? The sort routine needs to know how many nodes its dealing with, otherwise it will attempt to access NULL pointers!
[edit] I just checked to see if you replied, and noticed that I had an off-by-one error in the sort. I fixed it. [/edit]
-
Hello Sir,
Thanks for help..
I left it in between..thought I should start with simpler one...so started working on Run length coding..perhaps I can get back to huffman after I sharpen my poor programming skills...
Thanks,
Edesign