Thread: Help with a program to spell check a file

  1. #16
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Let's keep it real. If he had studied the data structures you mentioned thoroughly and practiced with them, he wouldn't be asking for help.

    I'm proposing a regular 2D char array for the words, with a binary search function. It's simpler, using just an array and no other data struct. It's also fast, as is your hash table.

    Yours is the better instructional method. Mine is the simpler and faster to finish. Both are quite good.

  2. #17
    Stoned Witch Barney McGrew's Avatar
    Join Date
    Oct 2012
    Location
    astaylea
    Posts
    420
    Quote Originally Posted by Adak View Post
    Let's keep it real. If he had studied the data structures you mentioned thoroughly and practiced with them, he wouldn't be asking for help.
    He wanted to know the best way to load the dictionary into memory and a hash table was one of the examples he listed. I'd say it is indeed the best option among the data structures that have been mentioned so far, and if he were incapable of implementing it I don't think he'd have listed it as an example.

    I'm proposing a regular 2D char array for the words, with a binary search function. It's simpler, using just an array and no other data struct.
    I think the difficulty of implementing a hash table has been overexaggerated. It really needn't be any more complicated than this:

    Code:
     struct word {
        char *word;
        struct word *next;
    };
    
    static struct word *table[32768];
    
    static unsigned int hash(const char *s)
    {
        unsigned int r;
    
        for (r = 0; *s; r = (unsigned char) *s++ + r * 31)
            ;
        return r % (sizeof table / sizeof table[0]);
    }
    
    
    int intable(const char *s)
    {
        for (const struct word *wp = table[hash(s)]; wp; wp = wp->next)
            if (!strcmp(s, wp->word))
                return 1;
        return 0;
    }
    
    
    void insert(struct word *wp)
    {
        unsigned int h = hash(wp->word);
    
        wp->next = table[h];
        table[h] = wp;
    }
    And if you want to count the code for handling the allocation:

    Code:
    struct word *new(const char *s)
    {
        struct word *wp;
    
        if (!(wp = malloc(sizeof *wp)) || !(wp->word = malloc(strlen(s) + 1)))
            handle_error;
        strcpy(wp->word, s);
        return wp;
    }
    Yours is the better instructional method. Mine is the simpler and faster to finish. Both are quite good.
    Is it really important for it to be faster to finish? It's a homework assignment after all and, judging by OP's initial post, appropriate selection of a data structure seems to be one of the key tasks. Though, with that said, I still doubt it's faster to finish.

  3. #18
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    ... and if he were incapable of implementing it I don't think he'd have listed it as an example.
    That's exactly WHY he listed it. He doesn't know how to implement it, he's just heard the term before.

    Asking about something, and knowing about it, are two different things. They don't usually go together, in the same person, at the same time.

  4. #19
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    For this, I have always been fond of going for an array of arrays, where the outer array is just a pointer to a second array (and its length) which has all works of same length, in order, in it. That way, you find the length, then do a binary search over the words of that length.

    This occupies the least memory. Heck you don't even need to store the null-char of each word. It also makes for ultra fast dictionary loading, if you pre-process it into the right order.

    But yeah for a newbie I'd recommend just a single array to binary search.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  5. #20
    Stoned Witch Barney McGrew's Avatar
    Join Date
    Oct 2012
    Location
    astaylea
    Posts
    420
    Quote Originally Posted by Adak View Post
    That's exactly WHY he listed it. He doesn't know how to implement it, he's just heard the term before.
    You don't know that for sure.

    Asking about something, and knowing about it, are two different things. They don't usually go together, in the same person, at the same time.
    Right, but it's one thing to be able to implement a hash table and another to know when its use is beneficial. The former is usually taught in introductory programming classes and references (K&R2 for instance), and the latter is really only learned by analysing the limitations and growth rates of various data structures.

  6. #21
    Noobin to the max
    Join Date
    Mar 2013
    Posts
    73

    here is my code so far

    Hi, here is what I have so far, am I on the right track??? any hits/tips???

    Code:
    {
        int number_of_words = 0;
        FILE* inptr = fopen(dictionary, "r");
        if (inptr == NULL)
        {
            printf("Sorry, me noes able to open the file!!!\n");
            return false;
        }
        else
        {
        while(!feof(inptr))
        {
            fscanf(inptr, s);
            number_of_words ++;
        }
        char words[28][number_of_words];
        int times_written = 0;
        while(!feof(inptr))
        {
            fscanf(inptr, s, words[times_written])
            times_written ++;
        }
        }
        return true;
    }

    One problem that I know about is "s" as a modifier to fscanf is not being recognized, am I implementing it incorrectly???

  7. #22
    Noobin to the max
    Join Date
    Mar 2013
    Posts
    73
    Quote Originally Posted by Barney McGrew View Post
    You don't know that for sure.

    I am actually a noob, and am just learning, so I am going with the simplest system for now, BUT the help you have posted it GREAT and I will definitely be looking back at it when I have some free time as I would like to get into some more complicated data structures eventually.

    Quote Originally Posted by Barney McGrew View Post
    He wanted to know the best way to load the dictionary into memory and a hash table was one of the examples he listed. I'd say it is indeed the best option among the data structures that have been mentioned so far, and if he were incapable of implementing it I don't think he'd have listed it as an example.
    Also, I have actually never implemented a hash table in code before, only I know theoretically how it works.
    Last edited by Dude22; 03-13-2013 at 07:21 PM.

  8. #23
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Quote Originally Posted by Dude22 View Post
    Hi, here is what I have so far, am I on the right track??? any hits/tips???


    Code:
    {
        int number_of_words = 0;
        FILE* inptr = fopen(dictionary, "r"); //use "dictionary", maybe "dictionary.txt" - you need the double quotes around the file name
        if (inptr == NULL)
        {
            printf("Sorry, me noes able to open the file!!!\n"); 
            return false;
        }
        else  //you don't need the else at all.
        {
       //    while(!feof(inptr))  don't use feof(). It doesn't work as you expect.
       while((fgets(s, sizeof(s), inptr)) != NULL) 
    
        {
            //fscanf(inptr, s);
            number_of_words ++;
        }
        char words[number of words][29];
        int times_written = 0;
        
        /*What would be very nice is to put the word counter, into a function, then call the function twice, with an int count parameter.
    When count==1, you count the words. When count==0, you read the words from the file. */
    // nope = use the fgets() above    while(!feof(inptr))
        {
    //        fscanf(inptr, s, words[times_written])
            times_written ++;
            //add code to remove the end of string char from s, here
        }
        }
        return true;
    }

    One problem that I know about is "s" as a modifier to fscanf is not being recognized, am I implementing it incorrectly???

    No, fscanf() needs "%s". note the double quotes around the format string. But don't use fscanf() anyway. Use
    Code:
    #define SIZE 29    //before main, just below your include file list.
    
    while((fgets(s, SIZE, inptr)) != NULL)

  9. #24
    Noobin to the max
    Join Date
    Mar 2013
    Posts
    73
    Quote Originally Posted by Adak View Post
    Code:
    {
        int number_of_words = 0;
        FILE* inptr = fopen(dictionary, "r"); //use "dictionary", maybe "dictionary.txt" - you need the double quotes around the file name
        if (inptr == NULL)
        {
            printf("Sorry, me noes able to open the file!!!\n"); 
            return false;
        }
        else  //you don't need the else at all.
        {
       //    while(!feof(inptr))  don't use feof(). It doesn't work as you expect.
       while((fgets(s, sizeof(s), inptr)) != NULL) 
    
        {
            //fscanf(inptr, s);
            number_of_words ++;
        }
        char words[number of words][29];
        int times_written = 0;
        
        /*What would be very nice is to put the word counter, into a function, then call the function twice, with an int count parameter.
    When count==1, you count the words. When count==0, you read the words from the file. */
    // nope = use the fgets() above    while(!feof(inptr))
        {
    //        fscanf(inptr, s, words[times_written])
            times_written ++;
            //add code to remove the end of string char from s, here
        }
        }
        return true;
    }
    Ok, I understand all your recommended changes, and have made them.

    One problem that I know about is "s" as a modifier to fscanf is not being recognized, am I implementing it incorrectly???

    No, fscanf() needs "%s". note the double quotes around the format string. But don't use fscanf() anyway. Use
    Code:
    #define SIZE 29    //before main, just below your include file list.
    
    while((fgets(s, SIZE, inptr)) != NULL)
    The part I don't quite get is what to use to count the number of words, or what to use to copy the words into an array.

    I understand how fgets works but I am not sure where each modifier should go.... is it: fgets(words[times_written], inptr)

    And if so, would that work for both counting and copying???

  10. #25
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    This is the kind of thing I had in mind - just a skeleton now, but study it.
    Code:
    #include <stdio.h>
    #include <string.h>
    
    #define MAXWORDS 26
    #define LENGTH 29
    
    int input(FILE *fp,char *words[LENGTH],int getData);
    
    int main(void) {
       int count;
       char buff[BUFSIZ];  //BUFSIZ or BUFSIZE is a macro for your system - usually 256 or 512 char's in size. A "natural" buffer length, for your system.
    
       char **words;
       FILE *fp=fopen("words1.txt","r");
       
       count=input(fp,words,0);   //just counting this time
       rewind(fp);              //going back to the start of the file
    
       //malloc the right number of words here
       input(fp,words,1);   //now getting the words
     
       //all the other stuff, here (mostly calling some functions)
       
       printf("%s\n",buff);
    
       return 0;
    }
    int input(FILE *fp, char *words[LENGTH], int getData) {
       int i=0;
       char buff[128];
       while((fgets(buff, BUFSIZ, fp)) != NULL) {
          
          if(getData) {
             //remove the newline here
             strcpy(words[i],buff);
          }
          ++i;
       }
       if(getData==1)
          return i;
       else
          return -1;
    }
    How many words do you expect to have in your dictionary?
    Last edited by Adak; 03-14-2013 at 01:33 AM.

  11. #26
    Stoned Witch Barney McGrew's Avatar
    Join Date
    Oct 2012
    Location
    astaylea
    Posts
    420
    Quote Originally Posted by Dude22 View Post
    One problem that I know about is "s" as a modifier to fscanf is not being recognized, am I implementing it incorrectly???
    You really can't learn C by trial and error. You need to read its specifications (in this case, those of fscanf), so I suggest downloading the latest draft prior to the publication of C11: http://www.open-std.org/JTC1/SC22/wg...docs/n1570.pdf

    I am actually a noob, and am just learning, so I am going with the simplest system for now, BUT the help you have posted it GREAT and I will definitely be looking back at it when I have some free time as I would like to get into some more complicated data structures eventually.
    All you need to do is read 2-3 pages in K&R2 (section 6.6) and you'll know how to write one in C. It's not as complicated as you may think, and if you go out of your way to learn about them you might gain a more enlightened perspective on programming.

  12. #27
    Noobin to the max
    Join Date
    Mar 2013
    Posts
    73
    Quote Originally Posted by Adak View Post
    This is the kind of thing I had in mind - just a skeleton now, but study it.
    Code:
    #include <stdio.h>
    #include <string.h>
    
    #define MAXWORDS 26
    #define LENGTH 29
    
    int input(FILE *fp,char *words[LENGTH],int getData);
    
    int main(void) {
       int count;
       char buff[BUFSIZ];  //BUFSIZ or BUFSIZE is a macro for your system - usually 256 or 512 char's in size. A "natural" buffer length, for your system.
    
       char **words;
       FILE *fp=fopen("words1.txt","r");
       
       count=input(fp,words,0);   //just counting this time
       rewind(fp);              //going back to the start of the file
    
       //malloc the right number of words here
       input(fp,words,1);   //now getting the words
     
       //all the other stuff, here (mostly calling some functions)
       
       printf("%s\n",buff);
    
       return 0;
    }
    int input(FILE *fp, char *words[LENGTH], int getData) {
       int i=0;
       char buff[128];
       while((fgets(buff, BUFSIZ, fp)) != NULL) {
          
          if(getData) {
             //remove the newline here
             strcpy(words[i],buff);
          }
          ++i;
       }
       if(getData==1)
          return i;
       else
          return -1;
    }
    How many words do you expect to have in your dictionary?
    Awesome thanks so much, I have looked through the code and I understand most of it, but I have a few questions...

    1: Why is "words" declared 2 times, once at **words and another time as *words???

    2: Related the the first question, line 16 gives me an error, when I try to compile as "the variable words is not initialized"...

    3: When using malloc "void* malloc ();" what should I give it to allocate? (I think this will be easier to understand for me, once question 1 and 2 are cleared up)

    NOTE: There will be 143,091 words in the dictionary

    Thanks,
    Josh

  13. #28
    Noobin to the max
    Join Date
    Mar 2013
    Posts
    73
    Quote Originally Posted by Barney McGrew View Post
    You really can't learn C by trial and error. You need to read its specifications (in this case, those of fscanf), so I suggest downloading the latest draft prior to the publication of C11: http://www.open-std.org/JTC1/SC22/wg...docs/n1570.pdf


    All you need to do is read 2-3 pages in K&R2 (section 6.6) and you'll know how to write one in C. It's not as complicated as you may think, and if you go out of your way to learn about them you might gain a more enlightened perspective on programming.
    I would LOVE to learn more, but for now I am going to try what "Adak" recommended. I understand that I might be easy(ish) and I AM going to come back to this and try out different data types latter, ONCE I get it working with a more basic system.

  14. #29
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Awesome thanks so much, I have looked through the code and I understand most of it, but I have a few questions...

    1: Why is "words" declared 2 times, once at **words and another time as *words???
    Just testing if you're awake! *words[] is conceptually VERY CLOSE to the same as **words. It conveys to me the idea that I'm not sure which exact format will be finally used.

    Or that I had one shot too many of that lovely Scotch. <ROFL>


    2: Related the the first question, line 16 gives me an error, when I try to compile as "the variable words is not initialized"...
    That's just a warning - to see if the compiler is awake!


    3: When using malloc "void* malloc ();" what should I give it to allocate? (I think this will be easier to understand for me, once question 1 and 2 are cleared up)

    NOTE: There will be 143,091 words in the dictionary
    After the count is made, the malloc I had THOUGHT to use was this one.

    Code:
       count=input(fp,words,0);       //0=just counting this time
       printf("count: %d\n",count);
       getchar();
       rewind(fp);                    //going back to the start of the file
    
       //malloc the right number of words here
       words=malloc(count * sizeof(char *));
       for(i=0;i<count;i++) {
          words[i]=malloc(LENGTH * sizeof(char));  //#define LENGTH  29
       }
       
       input(fp,words,1); //1=now getting the words
    Which is great - but does give 29 char's for every word. Which is FINE up to some Number - haven't tested it beyond 50,000 - but I suspect it runs out of FINE, at about 100,000, and SOMEBODY wants a good deal more than that.

    With the Democrats in power, I'm surprised there isn't a law against that kind of verbosity.

    So, I'm thinking it might be good to measure each word, and malloc ONLY the amount of memory that each word will need (they will all need that infamous +1 space for the end of string char: '\0').

    So review strlen() and see what you think.

  15. #30
    Noobin to the max
    Join Date
    Mar 2013
    Posts
    73
    Quote Originally Posted by Adak View Post
    Just testing if you're awake! *words[] is conceptually VERY CLOSE to the same as **words. It conveys to me the idea that I'm not sure which exact format will be finally used.

    Or that I had one shot too many of that lovely Scotch. <ROFL>
    Oh, ok. I have removed **words, but I still have a these errors:

    use of undeclared identifier 'words'

    This error comes up on any line that uses words....


    After the count is made, the malloc I had THOUGHT to use was this one.

    Code:
       count=input(fp,words,0);       //0=just counting this time
       printf("count: %d\n",count);
       getchar();
       rewind(fp);                    //going back to the start of the file
    
       //malloc the right number of words here
       words=malloc(count * sizeof(char *));
       for(i=0;i<count;i++) {
          words[i]=malloc(LENGTH * sizeof(char));  //#define LENGTH  29
       }
       
       input(fp,words,1); //1=now getting the words
    Which is great - but does give 29 char's for every word. Which is FINE up to some Number - haven't tested it beyond 50,000 - but I suspect it runs out of FINE, at about 100,000, and SOMEBODY wants a good deal more than that.

    With the Democrats in power, I'm surprised there isn't a law against that kind of verbosity.

    So, I'm thinking it might be good to measure each word, and malloc ONLY the amount of memory that each word will need (they will all need that infamous +1 space for the end of string char: '\0').

    So review strlen() and see what you think.
    Ok, so I have malloc implemented, for now I am going to leave it as you recommended, if it runs out of room, I will look at ways to optimize it.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Spell checking program
    By JYorke2097 in forum C Programming
    Replies: 3
    Last Post: 01-15-2009, 08:28 PM
  2. how to check a file for every 15 mins thr' program
    By nitinmhetre in forum Linux Programming
    Replies: 10
    Last Post: 01-05-2007, 01:53 AM
  3. Spell Checker
    By DeepFyre in forum Tech Board
    Replies: 2
    Last Post: 02-11-2005, 12:17 PM
  4. spell check in C using a dictionary file
    By goron350 in forum C Programming
    Replies: 10
    Last Post: 11-25-2004, 06:44 PM
  5. I can't spell...
    By Cheeze-It in forum A Brief History of Cprogramming.com
    Replies: 0
    Last Post: 05-08-2003, 08:07 AM