Thread: Spell Checker need help

  1. #31
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Y'know what... I started out here trying to help a beginner who was stumped on a programming assignment. My goal was to keep it at a level a beginner can work with comfortably. A simple, word search with a report of words not found in the dictionary... plain, easy to do within the OP's time constraints. (Don't forget, he's going to have other homework as well... might have a total of less than 20 hours for this...)

    Now, all of a sudden we've got people suggesting hyper-complex concepts, complicating the project and shouting down every minute detail of my suggestions, even moderators threatening to come back later and see if I've "improved things" whatever the heck that means.

    How much do you think our friend --remember him, the guy we're supposed to be helping?-- is going to get from this?
    If I was him, I'd just work the project and not come back here any time soon...

    In fact... I think I'll take my own suggestion. I'm outa this mess... just way too much stupidity for me to tolerate.
    Last edited by CommonTater; 11-05-2011 at 01:59 PM.

  2. #32
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    I think the OP has had some excellent discussion here, and can use his own common sense to figure out how far he can take this project, in the time he has left.

    A few grumpy posts shouldn't diminish the value of the whole thread, imo.

  3. #33
    Registered User
    Join Date
    Aug 2010
    Posts
    231
    Quote Originally Posted by panosstylianou View Post
    Is there a way i can compare the words from the dictionary file itself? i tried copying the whole dictionary in an array but something seems to be wrong with my code.
    Why do you copy the entire filecontent?
    File- and string handling are easy in C,
    - calculate the max. wordlen
    - allocate one memoryblock
    - read each word in
    - sort them
    - (b)search them
    No pointerarray or realloc stuff is needed like:
    Code:
    int main()
    {
      size_t maxlen=0,z=0;
      FILE *f=fopen("words.txt","r");
      char line[100],*words,*p;
    
      while( fgets(line,100,f) ) /* read each line with one word */
      {
        if( strlen(line)>maxlen ) maxlen=strlen(line);
        ++z;
      }
    
      p=words=malloc(z*++maxlen); /* allocate the needed memory as one block */
    
      rewind(f);  /* to filebegin */
      while( fgets(line,100,f) ) /* read each line with one word */
      {
        if( strchr(line,'\n') ) *strchr(line,'\n')=0;
        strcpy(p,line);
        p+=maxlen;
      }
      fclose(f);
    
      qsort(words,z,maxlen,strcmp); /* sort all words asc */
    
      printf("searchword:");
      scanf("%99s",line); while(getchar()!='\n');
    
      if( bsearch(line,words,z,maxlen,strcmp) ) /* search for your word */
        puts("found");
      else
        puts("not found");
    
      free(words);
      return 0;
    }
    You can replace strcmp with other functions like stricmp, soundex or what you need.

  4. #34
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    @BillyTKidd...

    That is an extremely inefficient use of memory... your array has to be nwords * longest ... shorter words will populate the array very sparsely leaving, I would bet, more gaps than data... plus your method requires reading the file twice which is another gross inefficiency... and then you're going to be sorting the file *every time you load it* which takes a lot of time...

    Now go back to my original suggestions... (message #8)

    1) Write a small utility to load, sort and rewrite your wordlist so it's in alphabetical order.
    Now he only has to sort the file once.

    2) Create a quick software to load the entire word file in one piece into an array...
    You suggested the same thing, but in the most incredibly inefficient way possible... I showed him a very effeicient way to load his word list using not one byte more than is needed.

    In your example... for a 100,000 word list given that the longest word (Antidisestablishmentarianism) is 28 characters, your array would end up being 29 * 100,000 = 2.9 million bytes. Given an average word length of 6 characters... I'd estimate you'll have more than 60% slack space in your array.

    In my example ... 100,000 words, AVERAGE word length of 6 characters (5 plus a null) = 600,000 bytes and 100,000 * 4 = 400,000 for the pointer array ... total just under a megabyte for a 100,000 word list. In fact, testing on a couple of different word lists says it'll probably be more like 850k.

    So... 2.9mb vs 850k.... Hmmmm... which to choose....


    And FWIW... bsearch is simply a binary search function provided in the CRT.


    Seriously... I'm thinking the problem is that you looked at my code, didn't understand it, and went on the warpath... If you cannot contribute positively to these threads, you should stay out of them.
    Last edited by CommonTater; 11-06-2011 at 08:16 AM.

  5. #35
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Good post Tater, but I have one queston: I have looked inside my CRT, and there is nothing there but a cathode ray tube monitor? [j/k]

  6. #36
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    LOL... Of course I mean "C Run Time" (library)

    But while we're on that topic... What happened to the circuit boards?

    (And... here come the humour police.... )
    Last edited by CommonTater; 11-06-2011 at 10:14 AM.

  7. #37
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    You mean those little hand sized built in skeet targets?

  8. #38
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by Adak View Post
    You mean those little hand sized built in skeet targets?
    No no... those are dust covers for the cup holder in your computer tower... I'm talking about the boards with all those little containers of smoke on them. But be careful... if you let the smoke out, they'll stop working...

    Spell Checker need help-cd-drive-cup-holder-jpg

  9. #39
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    I think a hash table will be simpler and faster than a binary search for this. Also, it will save having to do the initial sort. You just read the dictionary into the table and to check do a look-up.

    Hash table - Wikipedia, the free encyclopedia

    Make the table 1/2 the size of the dictionary, use a simple addition of the ASCII values and modulus by the size of the table to get a key/bucket. Then you have a simple, singly linked list for each bucket to manage collisions; alternately you could use a dynamically realloc'able array, or just find out what the biggest bucket will be (I think, < 10 items) and used a fixed size array.
    Last edited by MK27; 11-07-2011 at 10:16 AM.
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  10. #40
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    If you're interested in efficient memory usage for this problem, back when I did this sort of thing I allocated a contiguous block for all words of each length. I.e. I first counted up how many words there were of each length, then if there were say 594 six letter words then I allocated 594*(6+1) bytes for those etc. (Heck you can leave off room for the null if you really want and avoid routines like strcmp) So then the first step is to get the length of the string you want to check, then binary search through just the list of words of that length. It's very fast and very low memory usage. In fact you can't get any lower without using compression.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  11. #41
    Registered User
    Join Date
    Nov 2011
    Posts
    13
    ok this is the code for inputing a text file into an array.

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    
    #define MAXLINELENGTH 200
    #define MAXLINENUMBER 100
    #define MAXWORDLENGTH 20
    #define MAXWORDNUMBER 1000
    
    
    char *word[MAXWORDNUMBER][MAXWORDLENGTH];
    
    
    char* readString(char* prompt)
    {
        char buffer[MAXLINELENGTH];
        printf("%s", prompt);
    fgets(buffer,MAXLINELENGTH,stdin);
        size_t inputLength= strlen(buffer);
        char *input =(char*) calloc(sizeof(char), inputLength);
        strncpy(input, buffer, inputLength-1);
        input[inputLength] = '\0';
        return input;
    }
    
    
    void removeNewLine(char *buffer)
    {
        size_t length = strlen(buffer);
        if (length == 0)
        {
            return;
        }
        if (buffer[length-1] == '\n')
        {
            buffer[length-1] = '\0';
        }
    }
    
    
    void readFile()
    {
    char *fileName = readString("Please enter the file name:");
        FILE *inputFile = fopen(fileName,"r");
        int i=0,j=0,counter=0,index=0;
    char line[MAXLINENUMBER][MAXLINELENGTH];
        if (inputFile == NULL)
        {
    printf("\n unable to read from file\n");
            return;
        }
    while(i<MAXLINENUMBER-1)
        {
            char *input=fgets(line[i],MAXLINELENGTH-1,inputFile);
            if (input == NULL)
            {
                break;
            }
            removeNewLine(line[i]);
            for(j=0;j<strlen(line[i]);j++)
            {
                if(line[i][j]== ' ' || (line[i-1][strlen(line[i-1])-1]=="\n"  && i>0))
                {
                    index=0;
                }
                else
                {
                    if(index==0)
                    {
    //printf("%s\n",word[counter]);
                        counter++;
                    }
                    word[counter][index]=line[i][j];
                    index++;
                }
            }
            i++;
        }
    //line[i]= NULL;
        fclose(inputFile);
        free(fileName);
    }
    and here are my errors. does anyone have a solution for that?

    project4.c: In function ‘readFile’:
    project4.c:64: warning: comparison between pointer and integer
    project4.c:75: warning: assignment makes pointer from integer without a cast

  12. #42
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by panosstylianou View Post
    project4.c:64: warning: comparison between pointer and integer
    Probably (I've added some whitespace):
    Code:
               if (line[i][j] == ' ' || 
                          (line[i-1][strlen(line[i-1])-1] == "\n" && i>0))
    You want == '\n', not "\n".

    project4.c:75: warning: assignment makes pointer from integer without a cast
    Because these two do not match:

    Code:
    char *word[MAXWORDNUMBER][MAXWORDLENGTH];
    char line[MAXLINENUMBER][MAXLINELENGTH];
    Did you look into hash tables? As I mentioned a few posts ago, it is almost certainly the most efficient way to do this (unless you want the checker to suggest), and probably easier too.
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  13. #43
    Registered User
    Join Date
    Nov 2011
    Posts
    13
    oh thanks so much i was on this for a whole day

    Quote Originally Posted by MK27 View Post
    Did you look into hash tables? As I mentioned a few posts ago, it is almost certainly the most efficient way to do this (unless you want the checker to suggest), and probably easier too.
    yeah i looked into your suggestion. And it's quite interesting. The problem is that its a project in which i'll have to suggest as well. and i believe its quite inneficient to do that with a hash table. please correct me if im wrong

  14. #44
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    You want to declare your word array in either global (above main() ) space, or malloc it. If you create a local word array in readfile(), then the array will be destroyed as soon as you return from that array.

    Hash tables are fast and efficient, but not what you need/want, imo. What Tater has suggested is well suited for your needs. Use the binary search, and handle the suggestions for words as I posted earlier, and you'll be VERY pleased with the result.

    There are many interesting ideas on this thread, and many of them are excellent. Just not the right choice for you, at this time, on this assignment. imo.

  15. #45
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by panosstylianou View Post
    yeah i looked into your suggestion. And it's quite interesting. The problem is that its a project in which i'll have to suggest as well. and i believe its quite inneficient to do that with a hash table.
    It is basically impossible, AFAIK, without some parallel tree -- even if you suggested all the things with the same hash, or nearly the same hash, those are going to be a bunch of dissimilar words.

    So you should stick with a tree or a binary search on a sorted array. An issue with the latter is that the standard lib's bsearch is not a "horseshoes and hand grenades" affair: if the word is not found, you get nothing. Ie, you may have to write your own search to retain a fixed size slice near the end of the recursion (still less work than a tree, I think). Adak has that kind of suggestion in post #21.
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Spell checker help
    By MyNameIs.. in forum C Programming
    Replies: 16
    Last Post: 11-25-2009, 05:43 PM
  2. Simple spell checker
    By purplechirin in forum C Programming
    Replies: 31
    Last Post: 03-19-2008, 07:17 AM
  3. Spell Checker
    By DeepFyre in forum Tech Board
    Replies: 2
    Last Post: 02-11-2005, 12:17 PM
  4. spell checker in c needs help
    By madmax in forum C Programming
    Replies: 3
    Last Post: 03-13-2003, 09:36 AM
  5. spell checker
    By bob20 in forum Windows Programming
    Replies: 3
    Last Post: 12-03-2002, 02:35 AM