searching tries

**Ach1lles** · 06-23-2011

Hey all,

I'm trying to implement LZW compression in C, and my data structure to store the dictionary is a trie.

I know that you store a given code for every prefix+char combo, and if the prefix+char combo appears, you output the code when encoding. However, my question is in searching.

Given a string, I can traverse my trie and figure out what the code is for the string. So if I pass in "hello", I can get to the code for it. Even if I pass in "hell" and 'c' (string and char, respectively), I can get to the code. But, if I pass in the code, I don't know how to get to the bottom.

For example, let's say my dictionary stores words. I'll initialize the root node with 26 child pointers - 1 for each letter. Now lets say my input is:

hello

Then I'll store "he", "el", "ll", and "lo" in my dictionary with codes 27, 28, 29, and 30 respectively. If I want to write a function which takes an int code and returns a string, how would I write that? How do I design my trie?

Code:

typedef struct node
{
int code;
unsigned int prefix; //this will also be a code # which represents the prefix string
char c; //this is the char
char* alphaChilds[26];
} node;

Given a struct node like that, I'm not sure how to write a search function based on codes. Any pointers?

**~~CommonTater~~** · 06-23-2011

Traverse your list, find the code, retrieve the string... It's just like searching for the string.

**Ach1lles** · 06-23-2011

I don't see how I will find the code though. Suppose there is a trie with only one string "he" stored, with code 27, and it's initialized with the alphabet. My function:

Code:

char* search(node* root, int code)...

will attempt to find code 27, but how will it ever get there? As the design stands, the trie is:

root->h (code 8)->e (code 27)

If I just pass in code 27, it will never find it. So, how do I circumvent that?

**tabstop** · 06-23-2011

So walk the trie. You should be able to come up with an algorithm that will visit every node, and checks as you go.

**Ach1lles** · 06-23-2011

Isn't that inefficient? I can think of a depth first search to do that, but it would take O(26*k) comparisons where k = levels of the trie in the worst case right? I suppose that's still constant time, but is there a faster way to design it?

**tabstop** · 06-23-2011

Unless you're doing something weird, I don't think there's any sorting of nodes in your trie (so you don't know ahead of time which way to go to find node 43). Constant time is as good as it gets for unsorted data.

**anduril462** · 06-23-2011

Or keep a second data structure for the reverse mapping (code -> string). It's been a while since I learned about LZW, so I'm not sure, but I think the codes you pick are usually sequential, starting at 0 or 1. Also, I think there's usually a fixed dictionary size. So sequential numbers + fixed size sounds like a great place to use an array. You could have the array be an array of char * or char array if you just want to look up the string, or you can have it point directly to the node in the trie that corresponds to the string.

**Ach1lles** · 06-27-2011

Hey fellas,

I have a subsequent question. I have re-implemented my struct node as such:

Code:

typedef struct node {
int code;
int prefix;
char ch;

node** child
} node;

I only want to store the child pointers that aren't null, so when I create a node, I have the code:

Code:

node->child = malloc(sizeof(node**));

and when I'm inserting nodes, I just do child[0] = node1, child[1] = node2, ..

However, I don't think that's the correct way to do it. I get a corrupt heap error, which means I'm overwriting memory that I've already allocated. So, I wanted some conceptual help here. How should I go about only storing the children that aren't null? I basically want memory efficiency.

I'm updating the OP with this.

**tabstop** · 06-27-2011

Why is child a char**? Don't you want it to be a node** instead?

Doing malloc(sizeof(node**)) gives you, at most, one element in your list (and that only if node* and node** are the same size, which they are going to be unless someone really hates you). If you want more than one, you've got to ask for memory for more than one node* (not node**).

**Ach1lles** · 06-27-2011

My bad, I fixed the char**.

Thanks for the suggestion. I fixed my insert function to realloc whenever a new node is added, so the space is efficiently allocated. Appreciate the help!

Thread: searching tries

Thread Tools

Search Thread

Display

searching tries

Similar Threads

Searching for a job

searching

Searching In C

Searching

Searching

Tags for this Thread