Thread: Data structure for french-english dictionary

  1. #1
    Registered User officedog
    Join Date
    Oct 2008
    Posts
    77

    Data structure for french-english dictionary

    I am trying to make a French-English dictionary of simple word-to-word correspondences (I don't need the meanings). However, having thought about it, I will probably need some additional tags like
    a) Linguistic category (noun, verb, adjective)
    b) Type category (household, numerical, time, work, shopping)
    Eventually I will try to use this as a data store for a graphical program where I can select a category and have it display random words from that category.

    With my current level of understanding I have thought of two options:

    1. Create a multidimensional array

    e.g.
    Code:
    char dictionary[100][2][20];
    2. Create a struct

    (I have never implemented a struct within a struct, so the following might not be great...)

    Code:
    typedef struct{
        char french[20];
        char english[20];
        char type[20];
        char category[20];
    }wordpair;
    
    typedef struct{
        wordpair word[100];
        int length;
    }dictionary;

    or more simply

    Code:
    typedef struct{
        char word[100][2][20];
        char dictionaryName[20];
        int length;
    }dictionary;
    At one level the data structure does not need to be too complicated, as it's simply for me and my partner to add words as we go along - I don't need to populate it with the entire vocabulary, just a few words at a time.

    I realise neither of these is especially elegant in that I am setting aside a whole blob of memory without necessarily using it. However, I'm still not completely confident with malloc/realloc/free yet.
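For comparison, a malloc/realloc version does not have to be much longer than the fixed-size one. The following is only a minimal sketch, assuming a two-field wordpair; the names dyn_dictionary, dict_init, dict_push and dict_free, and the choice to start at 8 entries and double, are made up for illustration.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    char french[20];
    char english[20];
} wordpair;

/* Hypothetical growable dictionary: memory grows with the word count. */
typedef struct {
    wordpair *word;    /* heap-allocated array, NULL while empty */
    size_t length;     /* entries in use */
    size_t capacity;   /* entries allocated */
} dyn_dictionary;

void dict_init(dyn_dictionary *d)
{
    d->word = NULL;
    d->length = 0;
    d->capacity = 0;
}

/* Appends a pair, doubling the array when it is full.
   Returns 0 on success, -1 if the allocation fails. */
int dict_push(dyn_dictionary *d, const char *english, const char *french)
{
    if (d->length == d->capacity) {
        size_t new_cap = d->capacity ? d->capacity * 2 : 8;
        wordpair *tmp = realloc(d->word, new_cap * sizeof *tmp);
        if (tmp == NULL)
            return -1;             /* the old array is still valid */
        d->word = tmp;
        d->capacity = new_cap;
    }
    wordpair *w = &d->word[d->length++];
    /* strncpy does not terminate on truncation, so terminate by hand */
    strncpy(w->english, english, sizeof w->english - 1);
    w->english[sizeof w->english - 1] = '\0';
    strncpy(w->french, french, sizeof w->french - 1);
    w->french[sizeof w->french - 1] = '\0';
    return 0;
}

/* Releases the array and returns the dictionary to its empty state. */
void dict_free(dyn_dictionary *d)
{
    free(d->word);
    dict_init(d);
}
```

The key habit is assigning the result of realloc to a temporary first, so a failed allocation does not lose the existing words.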

    I'm about to learn something about C++ (especially classes) and am also dimly aware of things called vectors and binary trees - would these be a better way to go?

    My questions are about
    a) Improvement of existing ideas - how might I better implement the data structure using either arrays or structs
    b) Advanced (for me, that is) - what other techniques might people use? Is there a generally agreed "best way" to implement a language dictionary in C?

    Thanks
    Last edited by officedog; 11-01-2008 at 09:12 AM. Reason: indenting the code

  2. #2
    and the hat of int overfl Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,656
    Well in C++ STL, this is a doddle.

    Code:
    std::map< std::string, std::string > dict;
    dict["dog"] = "chien";
    Though there's nothing stopping you from implementing the same map idea in C if you wanted to.
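One low-tech way to get the same map idea in C, sketched here under assumptions: keep an array of key/value pairs sorted by key and look entries up with the standard library's bsearch. The names entry, dict_sort and dict_lookup are hypothetical, and the strings are assumed to outlive the array.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    const char *key;     /* English word */
    const char *value;   /* French word */
} entry;

/* Orders entries by key; shared by qsort and bsearch. */
static int entry_cmp(const void *a, const void *b)
{
    const entry *ea = a;
    const entry *eb = b;
    return strcmp(ea->key, eb->key);
}

/* Sort once after filling (or changing) the array. */
void dict_sort(entry *entries, size_t n)
{
    qsort(entries, n, sizeof *entries, entry_cmp);
}

/* Returns the translation, or NULL if the key is absent.
   The array must already be sorted with dict_sort. */
const char *dict_lookup(const entry *entries, size_t n, const char *key)
{
    entry probe = { key, NULL };
    const entry *hit = bsearch(&probe, entries, n, sizeof *entries, entry_cmp);
    return hit ? hit->value : NULL;
}
```

Lookups are logarithmic, as with an ordered map, but every insertion needs a re-sort (or a shifted insert), so this suits a word list that is loaded once and queried often.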
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    Registered User officedog
    Join Date
    Oct 2008
    Posts
    77
    Thanks Salem. Just finished reading the tutorial on this. It looks like a good option. Your suggestion about implementing map in C - I'm interested in this too, so I started a brief exploration and came up with 'hash tables' - is this the right direction?
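For reference, a hash table for string-to-string lookup can be quite small. What follows is a rough sketch, not a tuned implementation: it assumes a fixed number of buckets with linked-list chaining and the well-known djb2 string hash, and all of the names (hashmap, map_init, map_put, map_get, map_free, N_BUCKETS) are invented for the example.

```c
#include <stdlib.h>
#include <string.h>

#define N_BUCKETS 64   /* arbitrary size for this sketch */

typedef struct node {
    char *key;
    char *value;
    struct node *next;   /* next entry in the same bucket */
} node;

typedef struct {
    node *bucket[N_BUCKETS];
} hashmap;

/* Heap copy of a string, hand-rolled so only standard C is needed. */
static char *copy_string(const char *s)
{
    char *p = malloc(strlen(s) + 1);
    if (p != NULL)
        strcpy(p, s);
    return p;
}

/* djb2 string hash, reduced to a bucket index. */
static size_t hash(const char *s)
{
    unsigned long h = 5381;
    while (*s != '\0')
        h = h * 33 + (unsigned char)*s++;
    return h % N_BUCKETS;
}

void map_init(hashmap *m)
{
    for (size_t i = 0; i < N_BUCKETS; i++)
        m->bucket[i] = NULL;
}

/* Returns the stored value, or NULL if the key is absent. */
const char *map_get(const hashmap *m, const char *key)
{
    for (const node *n = m->bucket[hash(key)]; n != NULL; n = n->next)
        if (strcmp(n->key, key) == 0)
            return n->value;
    return NULL;
}

/* Inserts a pair or overwrites an existing key.
   Returns 0 on success, -1 if out of memory. */
int map_put(hashmap *m, const char *key, const char *value)
{
    size_t i = hash(key);
    char *v = copy_string(value);
    if (v == NULL)
        return -1;
    for (node *n = m->bucket[i]; n != NULL; n = n->next) {
        if (strcmp(n->key, key) == 0) {
            free(n->value);        /* replace the old translation */
            n->value = v;
            return 0;
        }
    }
    node *n = malloc(sizeof *n);
    char *k = copy_string(key);
    if (n == NULL || k == NULL) {
        free(n);
        free(k);
        free(v);
        return -1;
    }
    n->key = k;
    n->value = v;
    n->next = m->bucket[i];        /* push onto the front of the chain */
    m->bucket[i] = n;
    return 0;
}

/* Frees every node and leaves the map empty. */
void map_free(hashmap *m)
{
    for (size_t i = 0; i < N_BUCKETS; i++) {
        node *n = m->bucket[i];
        while (n != NULL) {
            node *next = n->next;
            free(n->key);
            free(n->value);
            free(n);
            n = next;
        }
        m->bucket[i] = NULL;
    }
}
```

With this, map_put(&m, "dog", "chien") followed by map_get(&m, "dog") plays the role of dict["dog"] = "chien" and dict["dog"] in the C++ version.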

    I think I had reasonable success with the struct idea (i.e. it worked). I post the header and some of the function definitions here.

    If anyone has the time, I'd be interested in tips for making the code leaner and in criticism of the style. I'm especially interested in how I could trim down the 'dictAddWord' function. I thought it would be the simplest to write, but it just goes on and on and on...

    Code:
    #include <stdio.h>
    #include <stdlib.h>  // rand()
    #include <string.h>
    
    typedef struct  {
        char french[20];
        char english[20];
    }wordpair;
    
    typedef struct  {
        int nFilesLoaded;
        wordpair word[100];
        int length;
        int randomIndexList[100];
        wordpair randomWord;
        int randomCurrentIndex;// keeps track of current random index
    }dictionary;
    
    void dictNew(dictionary *d);  // initialise dictionary
    void dictPrint(dictionary *d);
    void dictSaveToFile(dictionary *d, char *filename);
    void dictLoadFromFile(dictionary *d, char *filename);
    
    void setRandomList(dictionary *d);
    void setRandomWord(dictionary *d);
    
    void dictAddWord(dictionary *d);
    void dictEdit(dictionary *d);
    wordpair getRandomWord(dictionary *d);
    and some of the function definitions from the source file

    Code:
    void dictSaveToFile(dictionary *d, char *filename)
    {
        FILE *fp;
        fp = fopen(filename, "w");
        if (fp == NULL)  // could not open the file for writing
        {
            perror(filename);
            return;
        }
        int length = d->length;
        for (int i=0; i <length; i++)
        {
            fputs (d->word[i].english, fp);
            fputs(",", fp);
            fputs (d->word[i].french, fp);
            fputs("\n", fp);
        }
        fclose(fp);
    }
    
    void setRandomList(dictionary *d)
    {
        printf("\n\n --- Generating random list ---");
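        int index, isDifferent, isInRange;
        if (d->length <= 0)  // empty dictionary: nothing to randomise
        {
            d->randomCurrentIndex = 0;
            return;
        }
        d->randomIndexList[0] = rand() % d->length;  // first index must be in range too
        int count = 1;
        
        while (count < d->length)
        {
            isDifferent = 1;
            isInRange = 0;
            index = rand() % d->length;  // always a valid index
            if (index < d->length)   {     isInRange = 1;      }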
    
            for (int i = 0; i < count; i++)
            {
                if(index == d->randomIndexList[i])
                {
                    isDifferent = 0;
                    break;
                }
            }
    
            if (isDifferent == 1 && isInRange == 1)
            {
                d->randomIndexList[count] = index;
                count++;
            }
        }
        for (int i = 0; i <count; i++) { printf("\nIndex %d = Random Number %d", i, d->randomIndexList[i]);   }
        
        d->randomCurrentIndex = 0;  // resets random index to 0
    }
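The retry-until-unique loop above can be replaced by a Fisher-Yates shuffle, which produces a random ordering of the indices in a single pass. This is a sketch only; shuffle_indices is a hypothetical helper name, and rand() % (i + 1) carries a slight modulo bias that does not matter for a vocabulary quiz.

```c
#include <stdlib.h>

/* Fills list[0..n-1] with a random permutation of 0..n-1
   using the Fisher-Yates shuffle. */
void shuffle_indices(int *list, int n)
{
    for (int i = 0; i < n; i++)
        list[i] = i;                 /* start from 0, 1, 2, ... */

    for (int i = n - 1; i > 0; i--) {
        int j = rand() % (i + 1);    /* pick a slot from 0..i */
        int tmp = list[i];
        list[i] = list[j];
        list[j] = tmp;
    }
}
```

Inside setRandomList this would be called as shuffle_indices(d->randomIndexList, d->length), with srand called once at program start so the order differs between runs.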
    This is the rambling dictAddWord function

    Code:
    void dictAddWord(dictionary *d)
    {
        if (d->length >49)
            printf("\nDictionary is half full");
        if (d->length <99)
        {
            char english[20] = {'\0'};
            char french[20] = {'\0'};
    
            printf("\nAdd word to dictionary");
            printf("\n\n  Type in English Word:  ");
            int letter;  // int rather than char, so that EOF can be detected
            int i;
            letter=fgetc(stdin);
            for (i = 0;i<19; i++)  // stop at 19 to leave room for the terminating '\0'
            {
                if (letter == EOF || letter == '\0' || letter == '\n')
                    break;
                else {
                    english[i] = letter;
                    letter = fgetc(stdin);
                }
            }
        
            printf("\n  Type in French Word for %s:  ", english);
            letter=fgetc(stdin);
    
            for (i = 0;i<19; i++)
            {
                if (letter == EOF || letter == '\0' || letter == '\n')
                    break;
                else 
                {
                    french[i] = letter;
                    letter = fgetc(stdin);
                }
            }
            
            strlcpy(d->word[d->length].english, english, 20);
            strlcpy(d->word[d->length].french, french, 20);
            d->length += 1;
        }
        
        else if (d->length >98)
        {
            printf("\n Not enough room in dictionary, make a new one");
        }
       
        printf("\nResetting random list");
        setRandomList(d); // Updates random list, sets currentRandomIndex to 0
    }
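Most of the length in dictAddWord comes from reading each word one character at a time, twice. One way to trim it, sketched here with a hypothetical helper called read_word, is to let fgets read the whole line and then strip the newline.

```c
#include <stdio.h>
#include <string.h>

/* Prints prompt, then reads one line from in into buf (at most size-1
   characters) and strips the trailing newline. Anything beyond the
   buffer size is discarded. Returns 1 on success, 0 on EOF or error. */
int read_word(FILE *in, const char *prompt, char *buf, size_t size)
{
    printf("%s", prompt);
    fflush(stdout);

    if (fgets(buf, (int)size, in) == NULL)
        return 0;

    size_t len = strcspn(buf, "\n");   /* index of '\n', or of the final '\0' */
    if (buf[len] != '\n') {
        /* the line was longer than the buffer: skip the rest of it */
        int c;
        while ((c = fgetc(in)) != '\n' && c != EOF)
            ;
    }
    buf[len] = '\0';
    return 1;
}
```

With this, the input part of dictAddWord shrinks to two calls such as read_word(stdin, "\n  Type in English Word:  ", english, sizeof english), followed by the two copies into d->word[d->length].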

  4. #4
    Banned master5001
    Join Date
    Aug 2001
    Location
    Visalia, CA, USA
    Posts
    3,685
    I don't believe that a direct map like that would be robust enough for converting between two spoken languages. I would do something more like this:

    Example:
    Code:
    enum dictionary_type_t
    {
      E_UNDEFINED = 0,
      E_NOUN = 1,
      E_VERB = 2,
      E_ADJECTIVE  = 4
    };
    
    struct dictionary_item_t
    {
      enum dictionary_type_t  type;
      size_t n_english_entry, n_french_entry;
      const char **p_english_entry, **p_french_entry;
      struct dictionary_item_t *associated;
    };
    My rationale here is that you can have more than one word that equally translates to another. For either language! Plus you may have words that associate well with another dictionary entry. I don't speak French, so I can't help you out with specific examples, but I know in Spanish there are times when this principle proves true.
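As a concrete illustration of the one-to-many idea, the English noun "time" can be rendered in French as "temps", "heure" or "fois" depending on context. The sketch below repeats the declarations so that it compiles on its own; time_item and french_entry are names made up for the example.

```c
#include <stddef.h>

enum dictionary_type_t
{
  E_UNDEFINED = 0,
  E_NOUN = 1,
  E_VERB = 2,
  E_ADJECTIVE = 4
};

struct dictionary_item_t
{
  enum dictionary_type_t type;
  size_t n_english_entry, n_french_entry;
  const char **p_english_entry, **p_french_entry;
  struct dictionary_item_t *associated;
};

/* Illustrative data: one English noun with three French renderings. */
const char *time_english[] = { "time" };
const char *time_french[]  = { "temps", "heure", "fois" };

struct dictionary_item_t time_item = {
  E_NOUN,
  1, 3,                        /* entry counts: English, French */
  time_english, time_french,
  NULL                         /* no associated entry in this sketch */
};

/* Returns the i-th French entry, or NULL if i is out of range. */
const char *french_entry(const struct dictionary_item_t *item, size_t i)
{
  return i < item->n_french_entry ? item->p_french_entry[i] : NULL;
}
```

A quiz program could then pick any index below n_french_entry and accept it as a correct answer for "time".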

  5. #5
    Registered User officedog
    Join Date
    Oct 2008
    Posts
    77
    I really appreciate your reply master5001. I've been wondering how to use enums as well with the dictionary, so this is doubly useful, I'll have a go with this in my current attempt.

    Just as a point of clarification - did you use size_t as a way of somehow making this version 'platform neutral'? My limited understanding is that it is related to the sizeof operator and holds values in bytes... and that it has something to do with memcpy and other more generic memory functions rather than, say, strcpy. I'll keep reading.

    Thanks again

  6. #6
    Banned master5001
    Join Date
    Aug 2001
    Location
    Visalia, CA, USA
    Posts
    3,685
    It's just less typing than unsigned int. I wish I could say my goals were quite so noble... It is generally a good idea to use unsigned types when you are never going to use a negative value. Conversely, sometimes ints are used internally (as is the case with file descriptors) so that one can simply consider a negative number some sort of error.

    The reason for using an enumeration is to have some sort of compile-time way of knowing when something could possibly "go wrong." So maybe ditch my semi-advanced double-column look-up tables and steal my enum thingy. I hope it helps. Ask any follow-ups you need.

  7. #7
    Registered Abuser
    Join Date
    Jun 2006
    Location
    Toronto
    Posts
    591
    The most efficient structure I've seen for dictionaries is the Trie and, especially if it's a natural language, the Radix/Patricia Trie.
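For anyone who has not met the structure, here is a bare-bones sketch of a plain trie (not the radix/Patricia variant) that maps a word to a translation. It assumes one child slot per possible byte value, which is simple but memory-hungry; a radix trie saves space by merging chains of single-child nodes. The names trie_node, trie_new, trie_insert, trie_find and trie_free are invented for the example.

```c
#include <stdlib.h>

#define FANOUT 256   /* one child slot per possible byte value */

typedef struct trie_node {
    struct trie_node *child[FANOUT];
    const char *translation;   /* non-NULL only where a stored word ends */
} trie_node;

/* Allocates an empty node, or returns NULL if out of memory. */
trie_node *trie_new(void)
{
    trie_node *n = malloc(sizeof *n);
    if (n != NULL) {
        for (size_t i = 0; i < FANOUT; i++)
            n->child[i] = NULL;
        n->translation = NULL;
    }
    return n;
}

/* Stores translation under word, creating nodes along the path.
   The translation pointer is kept, not copied.
   Returns 0 on success, -1 if out of memory. */
int trie_insert(trie_node *root, const char *word, const char *translation)
{
    trie_node *n = root;
    for (; *word != '\0'; word++) {
        unsigned char c = (unsigned char)*word;
        if (n->child[c] == NULL) {
            n->child[c] = trie_new();
            if (n->child[c] == NULL)
                return -1;
        }
        n = n->child[c];
    }
    n->translation = translation;
    return 0;
}

/* Follows word one byte at a time; returns its translation or NULL. */
const char *trie_find(const trie_node *root, const char *word)
{
    const trie_node *n = root;
    for (; n != NULL && *word != '\0'; word++)
        n = n->child[(unsigned char)*word];
    return n != NULL ? n->translation : NULL;
}

/* Frees a node and everything below it. */
void trie_free(trie_node *n)
{
    if (n == NULL)
        return;
    for (size_t i = 0; i < FANOUT; i++)
        trie_free(n->child[i]);
    free(n);
}
```

Lookup cost depends on the length of the word rather than on how many words are stored, and words with a common prefix ("dog", "door") share nodes.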

    I've implemented both now in various projects and can say they are highly robust (I've even designed a third hybrid ADT combining the concepts from both a radix trie and a hash map... may even turn it into a thesis paper if anyone's interested.)

  8. #8
    Banned master5001
    Join Date
    Aug 2001
    Location
    Visalia, CA, USA
    Posts
    3,685
    I have never written a true dictionary before. I have used tries for compression, and for intellisense-type dictionaries. Perhaps you could use tries in conjunction with some sort of linked list for synonyms and hash mapping for idioms. I don't believe the OP is trying to make professional dictionary software though.

  9. #9
    Registered User officedog
    Join Date
    Oct 2008
    Posts
    77
    I don't think I have a chance of understanding this just yet, but it's fascinating stuff and definitely something to explore along with the hash tables. If you write a thesis on this, @anthony, could you also do an "easy read" version?

  10. #10
    Registered User
    Join Date
    Mar 2009
    Posts
    1
    I have seen this database. I didn't find any problem. Because I was very stupid. But I can support a website. English to French translation. I think that it's OK.
