Thread: searching tries

  1. #1
    Registered User
    Join Date
    May 2011
    Posts
    13

    searching tries

    Hey all,

    I'm trying to implement LZW compression in C, and my data structure to store the dictionary is a trie.

    I know that you store a given code for every prefix+char combo, and if the prefix+char combo appears, you output the code when encoding. However, my question is in searching.

    Given a string, I can traverse my trie and figure out what the code is for the string. So if I pass in "hello", I can get to the code for it. Even if I pass in "hell" and 'c' (string and char, respectively), I can get to the code. But, if I pass in the code, I don't know how to get to the bottom.

    For example, let's say my dictionary stores words. I'll initialize the root node with 26 child pointers - 1 for each letter. Now lets say my input is:

    hello

    Then I'll store "he", "el", "ll", and "lo" in my dictionary with codes 27, 28, 29, and 30 respectively. If I want to write a function which takes an int code and returns a string, how would I write that? How do I design my trie?

    Code:
    typedef struct node
    {
    int code;
    unsigned int prefix; //this will also be a code # which represents the prefix string
    char c; //this is the char
    char* alphaChilds[26];
    } node;
    Given a struct node like that, I'm not sure how to write a search function based on codes. Any pointers?

  2. #2
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Traverse your list, find the code, retrieve the string... It's just like searching for the string.

  3. #3
    Registered User
    Join Date
    May 2011
    Posts
    13
    I don't see how I will find the code though. Suppose there is a trie with only one string "he" stored, with code 27, and it's initialized with the alphabet. My function:

    Code:
    char* search(node* root, int code)...
    will attempt to find code 27, but how will it ever get there? As the design stands, the trie is:

    root->h (code 8)->e (code 27)

    If I just pass in code 27, it will never find it. So, how do I circumvent that?

  4. #4
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    So walk the trie. You should be able to come up with an algorithm that will visit every node, and checks as you go.

  5. #5
    Registered User
    Join Date
    May 2011
    Posts
    13
    Isn't that inefficient? I can think of a depth first search to do that, but it would take O(26*k) comparisons where k = levels of the trie in the worst case right? I suppose that's still constant time, but is there a faster way to design it?

  6. #6
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    Unless you're doing something weird, I don't think there's any sorting of nodes in your trie (so you don't know ahead of time which way to go to find node 43). Constant time is as good as it gets for unsorted data.

  7. #7
    Registered User
    Join Date
    Nov 2010
    Location
    Long Beach, CA
    Posts
    5,909
    Or keep a second data structure for the reverse mapping (code -> string). It's been a while since I learned about LZW, so I'm not sure, but I think the codes you pick are usually sequential, starting at 0 or 1. Also, I think there's usually a fixed dictionary size. So sequential numbers + fixed size sounds like a great place to use an array. You could have the array be an array of char * or char array if you just want to look up the string, or you can have it point directly to the node in the trie that corresponds to the string.

  8. #8
    Registered User
    Join Date
    May 2011
    Posts
    13
    Hey fellas,

    I have a subsequent question. I have re-implemented my struct node as such:
    Code:
    typedef struct node {
    int code;
    int prefix;
    char ch;
    
    node** child
    } node;
    I only want to store the child pointers that aren't null, so when I create a node, I have the code:

    Code:
    node->child = malloc(sizeof(node**));
    and when I'm inserting nodes, I just do child[0] = node1, child[1] = node2, ..

    However, I don't think that's the correct way to do it. I get a corrupt heap error, which means I'm overwriting memory that I've already allocated. So, I wanted some conceptual help here. How should I go about only storing the children that aren't null? I basically want memory efficiency.

    I'm updating the OP with this.
    Last edited by Ach1lles; 06-27-2011 at 09:54 AM.

  9. #9
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    Why is child a char**? Don't you want it to be a node** instead?

    Doing malloc(sizeof(node**)) gives you, at most, one element in your list (and that only if node* and node** are the same size, which they are going to be unless someone really hates you). If you want more than one, you've got to ask for memory for more than one node* (not node**).

  10. #10
    Registered User
    Join Date
    May 2011
    Posts
    13
    My bad, I fixed the char**.

    Thanks for the suggestion. I fixed my insert function to realloc whenever a new node is added, so the space is efficiently allocated. Appreciate the help!

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Searching for a job
    By C_ntua in forum Projects and Job Recruitment
    Replies: 2
    Last Post: 04-25-2010, 09:04 PM
  2. searching
    By neeraj sharma in forum C++ Programming
    Replies: 1
    Last Post: 07-30-2009, 05:37 AM
  3. Searching In C
    By shahsoldier in forum C Programming
    Replies: 12
    Last Post: 12-06-2005, 01:57 AM
  4. Searching
    By TeamGreen in forum C++ Programming
    Replies: 8
    Last Post: 02-20-2004, 08:57 PM
  5. Searching
    By Wanted420 in forum C++ Programming
    Replies: 1
    Last Post: 04-10-2003, 09:49 PM

Tags for this Thread