Thread: Trie add_node

  1. #1
    young grasshopper jwroblewski44's Avatar
    Join Date
    May 2012
    Location
    Where the sidewalk ends
    Posts
    294

    Trie add_node

    I'm about to write a function that adds a node into a trie structure (Trie - Wikipedia, the free encyclopedia). They way I'm using it is exactly like the picture; to store words. The main purpose is an English-to-French dictionary translator, so each node will store a French translation of the English word it represents in the trie. Each English word may have multiple translations, such as the word like. As well, more than one English word can translate to a particular French word.

    Link to problem ( pdf format ): http://ocw.mit.edu/courses/electrica...10_assn06a.pdf

    Code:
    typedef struct s_trie_node
    {
            char * translation; /* NULL if node not a word */
    
    
            /* pointer array to child nodes */
            struct s_trie_node * children[UCHAR_MAX+1];
    } * p_trie_node;
    My issue is that the problem leaves comments as hints as to how to create the function. One comment in particular has me puzzled "Be sure to store...". Here is how it's laid out in the source file:

    Code:
    /* add word to trie, with translation
       input: word and translation
       output: non-zero if new node added, zero otherwise
       postcondition: word exists in trie */
    int add_word(const char * word, char * translation) {
            /* TODO: add word to trie structure
               If word exists, append translation to existing
               string
               Be sure to store a copy of translation, since    ( <- this is the comment I don't get )
               the string is reused by load_dictionary()
             */
    }
    Can someone enlighten me as to why I need to store a copy of the translation, and what do they mean by that? As in, store a copy of every translation that happens?

    Thanks in advance!!

  2. #2
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    I believe this note means that if a word and translation is added to the Trie, then data needs to be also written to the file so it can be found next time the dictionary is loaded into the program.

    [Edit] No.
    Last edited by Adak; 04-07-2013 at 02:45 PM.

  3. #3
    young grasshopper jwroblewski44's Avatar
    Join Date
    May 2012
    Location
    Where the sidewalk ends
    Posts
    294
    This is the load_dictionary function that uses add_word():

    Code:
    /* read dictionary file into trie structure */
    unsigned int load_dictionary(const char * filename) {
        FILE * pfile;
        char line[LINE_MAX], * word, * translation;
        unsigned int icount = 0;
    
    
        /* ensure file can be opened */
        if ( !(pfile = fopen(filename,"r")) )
            return icount;
    
    
        /* read lines */
        while ( (fgets(line, LINE_MAX, pfile)) ) {
            /* strip trailing newline */
            int len = strlen(line);
            if (len > 0 && line[len-1] == '\n') {
                line[len-1] = '\0';
                --len;
            }
    
    
            if (len == 0 || line[0] == '#')
                continue; /* ignore comment/empty lines */
    
    
            /* separate word and translation */
            word = line + strspn(line, DELIMS);
            if ( !word[0] )
                continue; /* no word in line */
            translation = word + strcspn(word, DELIMS);
            *translation++ = '\0';
            translation += strspn(translation, DELIMS);
    
    
            /* add word to trie */
            if (add_word(word, translation))        // here is the only call to add_word
                icount++;
        }
    
    
        /* finish */
        fclose(pfile);
        return icount;
    }
    So this function ( which was provided ) is used to add the words already in the dictionary file into my trie struct in memory, as I understand it. I don't believe the program is supposed to have the functionality of adding words to the dictionary, just look-up english words to display their translations.

    EDIT: ps I worded my question wrongly in my post. I'm not working on adding a node, but adding a word.
    Last edited by jwroblewski44; 04-07-2013 at 02:36 PM.

  4. #4
    young grasshopper jwroblewski44's Avatar
    Join Date
    May 2012
    Location
    Where the sidewalk ends
    Posts
    294
    Here is a snippet from the dictionary file:

    yoke joug
    yolk jaune d'euf
    you vous
    you on
    you're welcome de rien
    young jeune
    yourself toi-meme
    youth jeunesse
    youth jeune
    yttrium yttrium
    zebra zebre
    zelkova orme de Siberie
    zero zero
    zero mettre
    zinc zinc
    zip fastener fermeture eclair
    zirconium zirconium
    zither cithare
    zone zone
    zoology zoologie

  5. #5
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Yes. I agree. Not the dictionary.

    Take #2:
    When you add a word it's a reminder to add the translation of the word, also - just being redundant here??

  6. #6
    young grasshopper jwroblewski44's Avatar
    Join Date
    May 2012
    Location
    Where the sidewalk ends
    Posts
    294
    I was kind of thinking they were saying, "Hey! Don't forget the translation!" too. It was just the phrase "reused by load_dict" that phased me, and still kind of does. How is it being reused by load_dict()? The only time I see it being used again is when I might need to concatenate another translation onto the translation string ( as in with the word like, which has both a translation of comme and aimer ).

    Which does bring me to another question. If I were to attempt to concatenate the second string onto the first, what kind of allocation function would I need to allocate just enough extra memory to store my second string?

    EDIT: It sounds like I should be using realloc here instead of malloc?
    Last edited by jwroblewski44; 04-07-2013 at 02:59 PM.

  7. #7
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Is there one single translation field (realloc), or is there an array of pointers, with one pointer to each translation field (malloc)?

  8. #8
    young grasshopper jwroblewski44's Avatar
    Join Date
    May 2012
    Location
    Where the sidewalk ends
    Posts
    294
    In the struct they gave me ( posted in my OP ), they declared one character pointer for the translation. And since most English words in my dictionary have only one French word translation, I think they want just one translation field. That would then be extended, if it comes across the same word already translated, to fit the next word, and probably a delimiter like a comma.
    The array is the children structs with the next letter in the word.

    Awesome. Thank you for the help. I'll post soon with any updates or further questions.

  9. #9
    young grasshopper jwroblewski44's Avatar
    Join Date
    May 2012
    Location
    Where the sidewalk ends
    Posts
    294
    And I'm back.

    Inside my add_word function, I am receiving a SIGSEGV. It occurs on this line:

    Code:
            oldlen = strlen( ( pc = pnode->translation ) );
    I have no idea what I'm doing wrong here. If someone could tell me what stupid thing I am doing, I'd be very grateful.

    Code:
    /* add word to trie, with translation
       input: word and translation
       output: non-zero if new node added, zero otherwise
       postcondition: word exists in trie 
    */
    int add_word(const char * word, char * translation) {
            unsigned int i , oldlen , newlen;
            p_trie_node pnode = proot;
            char * pc;
    
    
            for( i = 0; word[ i ] != '\0'; i++ ){
                    if( !( pnode->children[ ( unsigned int )( word[ i ] ) ] ) )
                            pnode->children[ ( unsigned int )( word[ i ] ) ] = new_node();
                    if( !( pnode = pnode->children[ ( unsigned int )( word[ i ] ) ] ) )
                            return 0;
            }
            oldlen = strlen( ( pc = pnode->translation ) );
            newlen = oldlen + strlen( translation ) + ( ( oldlen ) ? 2 : 1 );
            if( !( pnode->translation = realloc( pc , newlen ) ) )
                    return 0;
            if( oldlen ) {
                    pnode->translation[ oldlen + 1 ]  = ',';
                    pc = pnode->translation + oldlen + 2;
            }
            pc = pnode->translation;
            strcpy( pc , translation );
            return 1;
    }
    And here is new_node:

    Code:
    /* allocate new node on the heap
       output: pointer to new node (must be freed) */
    p_trie_node new_node(void) {
            int i;
            p_trie_node pnode = malloc( sizeof( p_trie_node ) );
            if( pnode ) {
                    pnode->translation = NULL;
                    for( i = 0; i < ( UCHAR_MAX + 1 ); i++ )
                            pnode->children[ i ] = NULL;
            }
            return pnode;
    }

  10. #10
    young grasshopper jwroblewski44's Avatar
    Join Date
    May 2012
    Location
    Where the sidewalk ends
    Posts
    294
    Fixed an error inside add_word, unfortunately it isn't related to my sigsegv.

    Code:
    int add_word(const char * word, char * translation) {
            unsigned int i , oldlen , newlen;
            p_trie_node pnode = proot;
            char * pc;
    
    
            for( i = 0; word[ i ] != '\0'; i++ ){
                    if( !( pnode->children[ ( unsigned int )( word[ i ] ) ] ) )
                            pnode->children[ ( unsigned int )( word[ i ] ) ] = new_node();
                    if( !( pnode = pnode->children[ ( unsigned int )( word[ i ] ) ] ) )
                            return 0;
            }
            oldlen = strlen( ( pc = pnode->translation ) );
            newlen = oldlen + strlen( translation ) + ( ( oldlen ) ? 2 : 1 );
            if( !( pnode->translation = realloc( pc , newlen ) ) )
                    return 0;
            if( oldlen )
                    pnode->translation[ oldlen + 1 ]  = ',';
            pc = pnode->translation + oldlen + ( ( oldlen ) ? 2 : 0 );       //here is the change
            strcpy( pc , translation );
            return 1;
    }
    EDIT: Well in my infinite wisdom, I failed to realize you can't call strlen on a null pointer.

    As for now: so far, so good.
    Last edited by jwroblewski44; 04-08-2013 at 12:16 AM.

  11. #11
    young grasshopper jwroblewski44's Avatar
    Join Date
    May 2012
    Location
    Where the sidewalk ends
    Posts
    294
    Sigh. Back.

    So now I have an odd problem. Here is my code:

    Code:
    int add_word(const char * word, char * translation) {
            unsigned int i , oldlen , newlen;
            p_trie_node pnode = proot;
            char * pc;
    
    
            for( i = 0; word[ i ] != '\0'; i++ ){
                    if( !( pnode->children[ ( unsigned int )( word[ i ] ) ] ) )
                            pnode->children[ ( unsigned int )( word[ i ] ) ] = new_node();
                    if( !( pnode = pnode->children[ ( unsigned int )( word[ i ] ) ] ) )
                            return 0;
            }
            oldlen = ( ( pc = pnode->translation ) ? strlen( pc ) : 0 );
            newlen = oldlen + strlen( translation ) + ( ( oldlen ) ? 2 : 1 );
            if( !( pnode->translation = realloc( pc , newlen ) ) )
                    return 0;
            if( oldlen )
                    pnode->translation[ oldlen + 1 ]  = ',';
            pc = pnode->translation + oldlen + ( ( oldlen ) ? 2 : 0 );
            strcpy( pc , translation );
            return 1;
    }
    So when I run it with the entire dictionary file, I receive this weird assertion fail from malloc.c:

    redwood@bluebird:~/Dropbox/OpenCoursework/assn06$ ./a.out dict.txt
    a.out: malloc.c:3097: sYSMALLOc: Assertion `(old_top == (((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) - __builtin_offsetof (struct malloc_chunk, fd)))) && old_size == 0) || ((unsigned long) (old_size) >= (unsigned long)((((__builtin_offsetof (struct malloc_chunk, fd_nextsize))+((2 * (sizeof(size_t))) - 1)) & ~((2 * (sizeof(size_t))) - 1))) && ((old_top)->size & 0x1) && ((unsigned long)old_end & pagemask) == 0)' failed.
    Aborted


    So what I did was create a test case dictionary file with two entries, like so:

    zero[tab]zero[newline]
    zero[tab]mettre[newline] obviously [tab] means a tab, and so forth.

    What doesn't make sense is that when I run it with the small dict file, it at least gets to the point of asking me for a word to translate. And when I give it the word "zero", I only receive the word "zero" back, and not the other entry. But when I trace it with gdb, I successfully place both strings into the node's translation pointer, comma and nul-terminator.... If that makes any sense?

    Any ideas?

  12. #12
    Stoned Witch Barney McGrew's Avatar
    Join Date
    Oct 2012
    Location
    astaylea
    Posts
    420
    I decided to try this myself, so maybe you'd benefit from reading the code: Ideone.com | Online C Compiler & Debugging Tool . I suppose the most notable changes are that I've limited the array size to include only the characters that are used which can save a lot of memory for large dictionaries, and that I've stored each translation in a linked list.

  13. #13
    Registered User
    Join Date
    May 2012
    Posts
    1,066
    Code:
            /* TODO: add word to trie structure
               If word exists, append translation to existing
               string
               Be sure to store a copy of translation, since    ( <- this is the comment I don't get )
               the string is reused by load_dictionary()
             */
    Code:
    /* read dictionary file into trie structure */
    unsigned int load_dictionary(const char * filename) {
        FILE * pfile;
        char line[LINE_MAX], * word, * translation;
    ...
            translation = word + strcspn(word, DELIMS);
            *translation++ = '\0';
            translation += strspn(translation, DELIMS);
    
    
            /* add word to trie */
            if (add_word(word, translation))        // here is the only call to add_word
    In load_dictionary(), the value of "translation" will always point to a character of "line", which itself never changes its base address. Thus when you pass "translation" to add_word(), the pointer will point to a memory block which changes its content. That's why you can't just assign "translation" to the struct members but have to copy the string (using strcpy). That's what is meant with "the string is reused by load_dictionary()" IMHO.

    Bye, Andreas

  14. #14
    young grasshopper jwroblewski44's Avatar
    Join Date
    May 2012
    Location
    Where the sidewalk ends
    Posts
    294
    @AndiPersti What I am taking away from that is to not alter the actual value of translation being passed, or save the address and reassign at the end of add_word. How's that sound?

    One problem I have with that is:

    Code:
    /* read dictionary file into trie structure */
    unsigned int load_dictionary(const char * filename) {
        FILE * pfile;
        char line[LINE_MAX], * word, * translation;
    ...
          word = line + strspn(line, DELIMS);
                if ( !word[0] )
                     continue; /* no word in line */
    
            translation = word + strcspn(word, DELIMS);
            *translation++ = '\0';
            translation += strspn(translation, DELIMS);
    
    
            /* add word to trie */
            if (add_word(word, translation))        // here is the only call to add_word
    
    In the loop. word takes on the value of the first word in line. Then translation's value is based off of word. So even if I did alter the value of translation inside add_word ( which I don't think I did ), translation inside load_dictionary isn't reused until a new value is assigned to it.

    If that sounds wrong to you, let me know how. I am not assuming I'm correct here.

    @Barney McGrew Thanks for the alternative point of view ( and making me feel bad that you solved it in one day where I am going on my third!! ). I am desperately trying to not use anyone's solution to shape my own, and am also kind of trying to force the use of realloc here. I'm doing the latter because I've never used it before and am trying to understand it's *mysterious ways!*.

    But once I am finished, I will most definitely peruse your code and yell at what I assume is way better code than mine. Also I do like the sounds of your alterations to the structures, but for the sake of simplicity and comparison the the solution code later on, I chose to keep it the way it is.
    Last edited by jwroblewski44; 04-08-2013 at 03:37 PM.

  15. #15
    Stoned Witch Barney McGrew's Avatar
    Join Date
    Oct 2012
    Location
    astaylea
    Posts
    420
    I am desperately trying to not use anyone's solution to shape my own, and am also kind of trying to force the use of realloc here.
    Sounds like a plan.

    I'm doing the latter because I've never used it before and am trying to understand it's *mysterious ways!*.
    One thing to note is that, when realloc returns a null pointer on failure, it doesn't free the old pointer which remains usable. That's one of the tricky things about realloc that's often overlooked.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Trie Problems
    By User Name: in forum C++ Programming
    Replies: 2
    Last Post: 12-12-2011, 08:53 PM
  2. Patricia trie - skipping redundant entries
    By grigorianvlad in forum C++ Programming
    Replies: 4
    Last Post: 09-24-2010, 07:24 PM
  3. trie algorithm question..
    By transgalactic2 in forum C Programming
    Replies: 10
    Last Post: 04-04-2009, 10:26 PM
  4. Hash vs Trie
    By faxo in forum C Programming
    Replies: 2
    Last Post: 08-13-2007, 01:34 AM