1. ## huffman code

hello,

I wish to implement huffman code in C..I know the logic but stuck in implementing..i m finding it pretty difficult , I mean first sorting and counting frequency is ok but then building the tree is where I am totally stuck, and then again sorting?? Pls help if anyone can...

Thanks,
Edesign

2. Well from my memory there are at least a dozen implementations of huffman out there.
The tree is nothing more than a simple binary tree with leafs containing the symbols that
are encoded, and the path to the symbol is the bitwise encoding.
You can refer to wikipedia for the algorithm, or study a ready made implementation
to see how actual details are handled.

3. Hi,

You are right, googling it, I got a few codes, even in that code I am facing problem in understanding sorting part, especially after base nodes are defined...

Code:
```// huffman.cpp : Defines the entry point for the console application.
#include "stdafx.h"
#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>
#include <assert.h>
#include <math.h>
#include <string.h>

#define         NUMBYTES                          256
#define         NUMBITS                            16
#define         INPUTFILENAME             "input.huf"
#define         OUTPUTFILENAME           "output.huf"
#define         TEMPFILENAME            "huffman.tmp"

typedef struct _HUFFNODE {
char              byte;
long              freq;
struct _HUFFNODE *parent;
struct _HUFFNODE *left;
struct _HUFFNODE *right;
} HUFFNODE;

typedef struct _HUFFTAB {
char              byte;
char              numbits;
long              freq;
long              bits[NUMBITS];
} HUFFTAB;

FILE       *f;
long       fsize, numnodes, numbasenodes;
long       freq_tab[NUMBYTES];    // Frequency table
HUFFNODE  *nodelist[NUMBYTES];
HUFFNODE  *basenode[NUMBYTES];
HUFFNODE  *templist[NUMBYTES];
HUFFTAB    code_tab[NUMBYTES];

/////////////////////////////////
//  Return file size in bytes  //
/////////////////////////////////
long filesize(FILE *f)
{
long pos, fsize;

pos = ftell(f);
fseek(f, 0L, SEEK_END);
fsize = ftell(f);
fseek(f, pos, SEEK_SET);

return fsize;
}

/////////////////////////////////////////////////////////
//  Read input file to memory to speed up compression  //
/////////////////////////////////////////////////////////
{
char *stream;

f = fopen(inputstream, "r+b");
assert(f != NULL);
fsize = filesize(f);

stream = (char *)calloc(fsize, sizeof(char));
assert(stream != NULL);

fclose(f);

return stream;
}

////////////////////////////
//  Make frequency table  //
////////////////////////////
void huffman_buildfreq(char *stream)
{
long  k;

for (k = 0; k < NUMBYTES; k++)
freq_tab[k] = 0;

for (k = 0; k < fsize; k++)
freq_tab[stream[k]]++;
}

/////////////////////////////////////////////////////////
//  Sort node list table using straight insertion sort //
/////////////////////////////////////////////////////////
void huffman_sortnodelist(HUFFNODE *nodes[])
{
HUFFNODE  *node;
long       i, j, k;

for (i = 0; i < numnodes-1; i++) {
k = i;
node = nodes[i];

for (j = i+1; j < numnodes; j++)
if (nodes[j]->freq < node->freq) {     //Surprisingly breaks at this point in 2nd sort...
k = j;
node = nodes[j];
}

if (k != i) {
node = nodes[i];
nodes[i] = nodes[k];
nodes[k] = node;
}
}
}

//////////////////////////
//  Build huffman tree  //
//////////////////////////
void huffman_buildtree(char *stream)
{
HUFFNODE  *node;
HUFFNODE  *node1, *node2;  // Two nodes with smallest frequency
long       k, n;

// Build frequency table
huffman_buildfreq(stream);

// Build base nodes
numbasenodes = 0;
for (k = 0; k < NUMBYTES; k++) {
if (freq_tab[k] > 0) {

node = (HUFFNODE *)calloc(1, sizeof(HUFFNODE));
assert(node != NULL);

node->byte = k;
node->freq = freq_tab[k];
node->parent = NULL;
node->left = NULL;
node->right = NULL;

nodelist[numbasenodes] = node;
basenode[numbasenodes] = node;
templist[numbasenodes] = node;

numbasenodes++;
}
}

// Build huffman tree
numnodes = numbasenodes;
n = numnodes;
while (--n > 0) {

// Sort node list go get two nodes with minor frequency
huffman_sortnodelist(nodelist);

node1 = nodelist[0];
node2 = nodelist[1];

// Make new node and link nodes
node = (HUFFNODE *)calloc(1, sizeof(HUFFNODE));

node->byte = 'x';
node->freq = node1->freq + node2->freq;
node->left = node1;
node->right = node2;
node->parent = NULL;

node1->parent = node;
node2->parent = node;

// Add new node and remove old two
nodelist[0] = node;
for (k = 1; k < n; k++)
nodelist[k] = nodelist[k+1];
nodelist[n] = NULL;

templist[numnodes++] = node;

}

for (k = 0; k < numnodes; k++)
nodelist[k] = templist[k];
}
////////////////////
//  Main program  //
////////////////////
int main()
{
char   *stream;
long    k;

// Build huffman tree
huffman_buildtree(stream);

return 0;
}```
have just mentioned sort and tree build part

Code:
```for (j = i+1; j < numnodes; j++)
if (nodes[j]->freq < node->freq)```
after 1st sort, it always breaks at this point...I can't understand why..

4. Who wrote this code?? :/

First off, their insertion sort sucks. It basically guarantees O(n^2) comparisons. I would suggest replacing it with this:
Code:
```void huffman_sortnodelist(HUFFNODE *nodes[])
{
HUFFNODE  *node;
long       i, j;

for (i = 1; i < numnodes; i++) {
node = nodes[i];

for (j = i-1; j >= 0 && nodes[j]->freq > node->freq; j--)
nodes[j+1] = nodes[j];

nodes[j+1] = node;
}
}```
I have no idea what templist is, what its used for, or why its even needed. You can probably get rid of it.

Now here's the main problem:
Code:
```    // Add new node and remove old two
nodelist[0] = node;
for (k = 1; k < n; k++)
nodelist[k] = nodelist[k+1];
nodelist[n] = NULL;

templist[numnodes++] = node;```
Notice, the comment even says that its adding a node and removing two. Explain to me how that should result in incrementing the numnodes count....? The sort routine needs to know how many nodes its dealing with, otherwise it will attempt to access NULL pointers!

 I just checked to see if you replied, and noticed that I had an off-by-one error in the sort. I fixed it. [/edit]

5. Hello Sir,

Thanks for help..
I left it in between..thought I should start with simpler one...so started working on Run length coding..perhaps I can get back to huffman after I sharpen my poor programming skills...

Thanks,
Edesign