Thread: Simple spell checker

  1. #1
    Registered User
    Join Date
    Mar 2008
    Posts
    14

    Simple spell checker

    I have a function in my spell checker code that loads words from a text file and stores them in an array of strings. The problem is, I can't figure out how to correct it.

    For instance, the 'dictionary' text file has:

    line1
    line2
    line3

    and my code goes:
    Code:
    // assuming I only have 10 words, and done all other necessary declarations..
    char *words[10]; 
    
    for (i=0; fgets(s, 9, dict) != NULL; i++)
            words[i] = s;
    How do I correct it so it doesn't store "line3" in every single element of my array?

  2. #2
    Hurry Slowly vart's Avatar
    Join Date
    Oct 2006
    Location
    Rishon LeZion, Israel
    Posts
    6,788
    You made an array of pointers, and every element points at the same buffer s, so they all show the last line read. You should change it to an array of strings:

    char words[10][MAX_LEN];
    if you know the maximum length,

    and read into it like
    fgets(words[i], sizeof words[i], dict)

    If you do not know the maximum string size in advance, you will need dynamic allocation and strcpy to copy the strings.
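
    A minimal sketch of the fixed-size version described above. The helper name load_words and the values of MAX_WORDS and MAX_LEN are assumptions made for illustration; they are not from the original code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_WORDS 10   /* assumed dictionary capacity, as in the question */
#define MAX_LEN   32   /* assumed longest line, including '\n' and '\0' */

/* Reads up to max_words lines from dict, one word per line.
   Each line lands in its own row, so a later read cannot overwrite it.
   Returns the number of words actually stored. */
size_t load_words(FILE *dict, char words[][MAX_LEN], size_t max_words)
{
    size_t count = 0;

    assert(dict != NULL && words != NULL);
    while (count < max_words &&
           fgets(words[count], sizeof words[count], dict) != NULL)
        count++;
    return count;
}
```

    The return value is worth keeping: it says how many rows hold real words, which is the bound any later search loop needs. Note that fgets keeps the trailing newline in each row.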
    All problems in computer science can be solved by another level of indirection,
    except for the problem of too many layers of indirection.
    – David J. Wheeler

  3. #3
    Registered User
    Join Date
    Mar 2008
    Posts
    14

    It works! Thanks for your help!!

    Quote Originally Posted by vart View Post
    You made an array of pointers - you should change it to an array of strings:

    char words[10][MAX_LEN];
    if you know the maximum length,

    and read into it like
    fgets(words[i], sizeof words[i], dict)

    If you do not know the maximum string size in advance, you will need dynamic allocation and strcpy to copy the strings.

    Now I've run into another problem: how do I remove the newline character (\n) from the words that were read by fgets?
    Last edited by purplechirin; 03-18-2008 at 12:08 AM.

  4. #4
    Hurry Slowly vart's Avatar
    Join Date
    Oct 2006
    Location
    Rishon LeZion, Israel
    Posts
    6,788
    There are several approaches. I prefer:
    Code:
    char *p = strchr(buffer,'\n');
    if(p) *p = '\0';

  5. #5
    Registered User sndpchikane's Avatar
    Join Date
    Mar 2008
    Posts
    5
    Hey buddy,

    Can you please be more specific about what you want to do in your program, so that I can help you?

    Also, if you are reading text from a text file, use the fscanf() function to retrieve the values, so that you get them without the newline character "\n",

    and store these values directly in a structure so that comparison is easy.

  6. #6
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Quote Originally Posted by sndpchikane View Post
    Also, if you are reading text from a text file, use the fscanf() function to retrieve the values, so that you get them without the newline character "\n",
    Not necessary. fgets works just as well, and removing the newline is easy. fscanf isn't "better"; it's just another method of doing it.

    Quote Originally Posted by sndpchikane View Post
    and store these values directly in a structure so that comparison is easy.
    That depends on what you want to do. If there's no good reason to do it, you're just going to get unnecessary overhead.
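
    For comparison, a minimal sketch of the fscanf route mentioned above. The helper name read_word and the buffer size are assumptions for illustration. The width in the format string keeps fscanf from writing past the buffer.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_LEN 32  /* assumed buffer size; must match the %31s width below */

/* Reads the next whitespace-delimited word from fp into buf.
   Returns 1 on success, 0 at end of file or on a read error. */
int read_word(FILE *fp, char buf[MAX_LEN])
{
    assert(fp != NULL && buf != NULL);
    /* %31s skips leading whitespace, then stores at most 31 characters plus '\0' */
    return fscanf(fp, "%31s", buf) == 1;
}
```

    Because %s stops at any whitespace, no newline ends up in buf. The trade-off is that a line with several words comes back one word per call, which may or may not be what you want.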
    Quote Originally Posted by Adak View Post
    io.h certainly IS included in some modern compilers. It is no longer part of the standard for C, but it is nevertheless, included in the very latest Pelles C versions.
    Quote Originally Posted by Salem View Post
    You mean it's included as a crutch to help ancient programmers limp along without them having to relearn too much.

    Outside of your DOS world, your header file is meaningless.

  7. #7
    Registered User
    Join Date
    Mar 2008
    Posts
    14
    Well, it really is a simple spell checker: read a text file containing 'dictionary' words, read another text file containing paragraphs of words, and print out any words that are not in the 'dictionary'.

    I've done a spell checker in Java some time ago and it was slightly more complex than this (using data structures, and there was a spelling suggestion feature as well), but I'm trying to improve my C, of which I only know the very basics.

    Anyway, the way I did it was: I store the words of the 'dictionary' in an array, and then as I read each word in the file, I compare it to the elements in the array.

    So far (with vart's help), I've got the dictionary loading done. Now I'm doing the strcmp part, which is giving me another headache as strcmp(token, words[i]) doesn't seem to give the correct results.

    I tried printing out both token and words[i], and they seemed the same to me - but strcmp doesn't agree. I suspect it has something to do with the \0 character?

    I tried using the debugger, and it seems that token and words[i] have really odd values, like one being "line3\000\217\031\177" and the other "line3\r\000\324\340\310" when both of them print "line3".
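
    One detail in the debugger output above: the second value has a '\r' before its terminating '\0' and the first does not, so strcmp sees two different strings even though both print as "line3". A leftover '\r' is typical of a file with "\r\n" line endings where only the '\n' was removed; that cause is an assumption, since the thread does not confirm it. A minimal sketch that strips either ending (the helper name strip_line_ending is made up for illustration):

```c
#include <assert.h>
#include <string.h>

/* Truncates s at the first '\r' or '\n', so "line3\r\n" and "line3\n"
   both become "line3". A string with neither character is left unchanged. */
void strip_line_ending(char *s)
{
    assert(s != NULL);
    /* strcspn returns the length of the leading part containing no '\r' or '\n' */
    s[strcspn(s, "\r\n")] = '\0';
}
```

    Calling this on each line right after fgets, for both the dictionary and the text being checked, would make the two sides comparable.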

  8. #8
    uint64_t...think positive xuftugulus's Avatar
    Join Date
    Feb 2008
    Location
    Pacem
    Posts
    355
    Do you know that strcmp returns 0 when two strings are equal?
    Code:
        char *a = "gorilla";
        if( strcmp(a,"gorilla") == 0 )
            printf("KING KONG vs GODZILLA\nFIGHT!\n");
    The above actually prints a Mortal Monster Combat intro message...
    Code:
    ...
        goto johny_walker_red_label;
    johny_walker_blue_label: exit(-149$);
    johny_walker_red_label : exit( -22$);
    A typical example of ...cheap programming practices.

  9. #9
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Do you have documentation? The docs explicitly specify that strcmp returns 0 when the strings match, as xuftugulus points out.
    As for the strings... well, I'm guessing you never initialized your array, hence it contains "garbage." But all strings are expected to end when a '\0' (or 0) is encountered. Thus it never reads any of that junk beyond the 0, so you're fine. Nothing wrong with it.

  10. #10
    Registered User
    Join Date
    Mar 2008
    Posts
    14
    Oh dear, I'm really sorry, I tend to skip a lot of details when I post my questions.

    My actual part of the code that does (or is supposed to do) the string compare is:

    Code:
    /* this whole chunk of code is inside a while-loop that reads each token in the file */
    
    for (i=0; i<strlen(*words); i++) {
        if (strcmp(token, words[i]) == 0) {
            /* just an int variable to keep track on whether the word is found or not,
                i initialized it to 1 */
            notFound = 0;
            printf("Here!");
            break;
        }
    }
                
    if (notFound) {
        printf("%s\n", token);
    }
    I've put printf's around to print the values of token and words[i] in each loop, and they printed out as expected (e.g. token0, words[0].. token0, words[1].. token0, words[2]..... token1, words[0].. and so on), but strcmp(token, words[i]) never gave me a 0 even when, say, token is "string1" and words[i] is also "string1".

  11. #11
    uint64_t...think positive xuftugulus's Avatar
    Join Date
    Feb 2008
    Location
    Pacem
    Posts
    355
    Did you remove the newline character as vart suggested?

  12. #12
    Registered User
    Join Date
    Mar 2008
    Posts
    14
    Quote Originally Posted by xuftugulus View Post
    Did you remove the newline character as vart suggested?
    Yup, I did, and the 'dictionary' array (i.e. words[]) prints out fine when I do a simple loop to print out all the elements.

  13. #13
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Code:
    for (i=0; i<strlen(*words); i++) {
        if (strcmp(token, words[i]) == 0) {
            /* just an int variable to keep track on whether the word is found or not,
                i initialized it to 1 */
            notFound = 0;
            printf("Here!");
            break;
        }
    }
    I'm pretty sure strlen(*words) is incorrect here. What do you intend that to do?

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  14. #14
    Registered User
    Join Date
    Mar 2008
    Posts
    14
    Quote Originally Posted by matsp View Post
    I'm pretty sure strlen(*words) is incorrect here. What do you intend that to do?

    --
    Mats

    Umm, to specify the boundary for the for-loop, i.e. the size of the 'dictionary' (char words[][]). Initially I put for (i=0; i<strlen(words); i++) (same thing, only without the asterisk), but the compiler gave me an error without the asterisk. The for-loop works fine though (as far as I know); I'm only having a problem with the string compare.

  15. #15
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    It doesn't work that way. You want the size of the array, not the length of the string, so you can't rely on strlen here.
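
    To make that concrete: strlen(*words) is the length of the first word (5 for "line1"), so the loop only ever checks the first 5 entries, however many words were loaded. A minimal sketch that takes the word count as a parameter instead; the function name in_dictionary and the MAX_LEN value are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LEN 32  /* assumed row width of the dictionary array */

/* Returns 1 if token matches one of the first count rows of words, else 0.
   count is the number of words loaded, not strlen() of any one word. */
int in_dictionary(const char *token, char words[][MAX_LEN], size_t count)
{
    size_t i;

    assert(token != NULL && words != NULL);
    for (i = 0; i < count; i++) {
        if (strcmp(token, words[i]) == 0)
            return 1;
    }
    return 0;
}
```

    Where the array itself is in scope, sizeof words / sizeof words[0] gives its capacity (10 in the question), but only the rows that were actually filled should be searched, so counting the words while loading is the safer bound.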
