Thread: Simple spell checker

  1. #16
    Registered User
    Join Date
    Mar 2008
    Posts
    14
    Quote Originally Posted by Elysia View Post
    Doesn't work that way. You want the size of the array, not the length of the string, so you can't rely on strlen here.
    I declared the array as char words[i][j]; (i and j are arbitrary numbers for me to test files that contain only a few lines - here I put char words[10][10];)

    I tried using sizeof(words), but it gave me a value of 100..

  2. #17
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    To get the size of an array, you can usually do:
    sizeof(array) / sizeof(array[0]);
    Quote Originally Posted by Adak View Post
    io.h certainly IS included in some modern compilers. It is no longer part of the standard for C, but it is nevertheless, included in the very latest Pelles C versions.
    Quote Originally Posted by Salem View Post
    You mean it's included as a crutch to help ancient programmers limp along without them having to relearn too much.

    Outside of your DOS world, your header file is meaningless.

  3. #18
    Hurry Slowly vart's Avatar
    Join Date
    Oct 2006
    Location
    Rishon LeZion, Israel
    Posts
    6,788
    To get the size of an array, you can usually do:
    sizeof(array) / sizeof(array[0]);
    But you have to do it in the function where the array is declared.
    In any other function you will need to pass the array size as a parameter.
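    A minimal sketch of both cases, assuming the char words[10][10] declaration mentioned earlier in the thread. MAX_WORDS, MAX_LEN, count_used_rows and rows_in_local_array are made-up names for illustration only. sizeof(words) is the byte size of the whole array (100 here), so dividing by the size of one row gives the number of rows.

```c
#include <stddef.h>
#include <stdio.h>

#define MAX_WORDS 10  /* assumed dimensions, matching char words[10][10] */
#define MAX_LEN   10

/* A parameter declared as an array is really a pointer, so sizeof(words)
   in here would be the size of a pointer. The row count must be passed in. */
size_t count_used_rows(char words[][MAX_LEN], size_t nrows)
{
    size_t i, used = 0;

    for (i = 0; i < nrows; i++) {
        if (words[i][0] != '\0') {
            used++;
        }
    }
    return used;
}

/* In the function that declares the array, sizeof sees the whole object. */
size_t rows_in_local_array(void)
{
    char words[MAX_WORDS][MAX_LEN] = { "line1", "line2", "line3" };
    size_t total_bytes = sizeof(words);               /* 10 rows * 10 chars */
    size_t nrows = sizeof(words) / sizeof(words[0]);  /* number of rows */

    printf("%zu bytes, %zu rows, %zu rows in use\n",
           total_bytes, nrows, count_used_rows(words, nrows));
    return nrows;
}
```

    Calling rows_in_local_array() from main should report 100 bytes and 10 rows, with 3 rows in use.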
    All problems in computer science can be solved by another level of indirection,
    except for the problem of too many layers of indirection.
    – David J. Wheeler

  4. #19
    Registered User
    Join Date
    Mar 2008
    Posts
    14
    Thanks, I'll try to get the array size problem fixed.

    But back to my other question, why would printf("%s, %s\n", token, words[i]); output "string1, string1", but when I use strcmp(token, words[i]) == 0 it won't give a true?

  5. #20
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Print each of the words[i] characters as decimal or hex to see if there are any invisible characters that "get in the way".

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  6. #21
    Registered User
    Join Date
    Mar 2008
    Posts
    14
    Quote Originally Posted by matsp View Post
    Print each of the words[i] characters as decimal or hex to see if there are any invisible characters that "get in the way".

    --
    Mats
    Do you mind explaining how that's supposed to be done? I'm really lost here. It's not typecasting, is it?

  7. #22
    Woof, woof! zacs7's Avatar
    Join Date
    Mar 2007
    Location
    Australia
    Posts
    3,459
    Well since this won't help you get any marks (if this is an assignment)

    Code:
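    #include <stdio.h>
    #include <string.h>

    /* Print every byte of s as two hex digits, e.g. "0D" for '\r'. */
    void print_string_as_hex(const char * s)
    {
        size_t  len = strlen(s),
                i = 0;
        
        for(i = 0; i < len; i++)
        {
            /* Cast to unsigned char so bytes above 0x7F are not
               sign-extended; %02X keeps the leading zero. */
            printf("%02X ", (unsigned char)s[i]);
        }
        printf("\n");
        
        return;
    }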
    I didn't test it, hope it works

    Read it and understand it before using it.

    In my opinion you're jumping onto the 'code train' too early, design it first -- for example:
    • open the dictionary file
    • read each line (word), trim the newline character if necessary -- adding the word to a linked list or array (sizing the array as necessary with realloc() )
    • close the file
    • do whatever with the array or linked list of words
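    The outline above could be sketched roughly as follows. This is only an illustration and not code from the thread: load_words, free_words and LINE_MAX_LEN are made-up names, every line is assumed to fit in the buffer (a longer line would be split into several entries), and stripping '\r' as well as '\n' is an extra precaution. The caller opens and closes the file.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LINE_MAX_LEN 128  /* assumed upper bound on the length of one line */

/* Free a list returned by load_words(). Safe to call with NULL and 0. */
void free_words(char **words, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        free(words[i]);
    }
    free(words);
}

/* Read one word per line from fp into a heap array that grows on demand.
   Stores the number of words in *count. Returns NULL with *count == 0
   both for an empty file and when an allocation fails. */
char **load_words(FILE *fp, size_t *count)
{
    char line[LINE_MAX_LEN];
    char **words = NULL;
    size_t used = 0, cap = 0;

    *count = 0;
    while (fgets(line, sizeof line, fp) != NULL) {
        char *copy;

        line[strcspn(line, "\r\n")] = '\0';  /* trim LF or CR+LF */
        if (line[0] == '\0') {
            continue;                        /* skip blank lines */
        }
        if (used == cap) {                   /* array full: double it */
            size_t new_cap = cap ? cap * 2 : 16;
            char **tmp = realloc(words, new_cap * sizeof *tmp);

            if (tmp == NULL) {
                free_words(words, used);
                return NULL;
            }
            words = tmp;
            cap = new_cap;
        }
        copy = malloc(strlen(line) + 1);
        if (copy == NULL) {
            free_words(words, used);
            return NULL;
        }
        strcpy(copy, line);
        words[used++] = copy;
    }
    *count = used;
    return words;
}
```

    Doubling the capacity keeps the number of realloc() calls small. A linked list would avoid reallocation altogether, but the array is simpler to search afterwards.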
    Last edited by zacs7; 03-18-2008 at 09:23 PM.

  8. #23
    Hurry Slowly vart's Avatar
    Join Date
    Oct 2006
    Location
    Rishon LeZion, Israel
    Posts
    6,788
    would printf("%s, %s\n", token, words[i]); output "string1, string1",
    try something like
    Code:
    printf("\"%s\", \"%s\"\n", token, words[i]);

  9. #24
    Registered User
    Join Date
    Mar 2008
    Posts
    14
    Quote Originally Posted by zacs7 View Post
    Well since this won't help you get any marks (if this is an assignment)
    It is an assignment, but I'm not looking for the solution - I just need help with the particular parts that I'm having problems with.

    About the design, I did sketch a flowchart before I started coding, and I'm currently at the "do whatever with the array or linked list of words" step - which is where I'm now stuck.

    Just so no one gets the impression that I'm expecting the entire solution here, here's what I've completed so far in the coding:

    • check whether the command line arguments are valid; exit properly if not
    • check if the provided dictionary filename exists
    • check if the provided filename (of the file to be checked) exists
    • open the dictionary
    • read the dictionary words into an array
    • close the dictionary file
    • open the file to be checked
    • read each word in the file
    • compare the words to the 'dictionary' array <<-- problem here
    • print out the words that are not found in the dictionary
    • close the file


    I really appreciate all the help I get from you guys here, but yeah, I'm trying to do my own homework.

  10. #25
    Registered User
    Join Date
    Mar 2008
    Posts
    14
    Quote Originally Posted by vart View Post
    try something like
    Code:
    printf("\"%s\", \"%s\"\n", token, words[i]);
    ehh.. i tried that and it printed out "string1", "string1 (without the last double quote). where did it go?? is that the problem?

  11. #26
    Hurry Slowly vart's Avatar
    Join Date
    Oct 2006
    Location
    Rishon LeZion, Israel
    Posts
    6,788
    Quote Originally Posted by purplechirin View Post
    ehh.. i tried that and it printed out "string1", "string1 (without the last double quote). where did it go?? is that the problem?
    I suppose it goes on the next line indicating that the second string contains \n at the end

  12. #27
    Registered User
    Join Date
    Mar 2008
    Posts
    14
    Quote Originally Posted by vart View Post
    I suppose it goes on the next line indicating that the second string contains \n at the end
    The double quote somehow disappeared. There's no extra " on the second line.

    edit: i changed it a little, and instead of a double quote i put different symbols:

    Code:
    printf("!%s@, #%s$\n", token, words[i]);
    and instead of printing out !string1@, #string2$, it came out:

    Code:
    $string1@, #string1
    $string1@, #string2
    $string1@, #string3
    ...
    Last edited by purplechirin; 03-19-2008 at 03:46 AM.

  13. #28
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    That indicates that there is a '\r' [carriage return] at the end of the line - can you post:
    1. Code that opens the "valid word list".
    2. Code that reads the word.
    3. Code that removes newline.

    I suspect you are reading the file in binary mode, but it could be other things.

    Using the "print in hex" variation will show you that it's got a 0D character at the end of the string, I suspect.
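    For reference, this is why the '$' moved to the front: with words[i] ending in '\r', printf sends "!string1@, #string1", then the carriage return sends the cursor back to column 0, and the '$' is drawn on top of the '!'. As an alternative to a hex dump, a small helper can spell out such characters. The sketch below is only an illustration; show_invisible is a made-up name and not something from this thread.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Copy s into out with non-printable characters spelled out, so that a
   stray '\r' or '\n' becomes visible. The result is truncated if out is
   too small. out_size must be at least 1. Returns out. */
char *show_invisible(const char *s, char *out, size_t out_size)
{
    size_t used = 0;

    out[0] = '\0';
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        char piece[5];  /* longest piece is "\xHH" plus the terminator */
        size_t len;

        if (c == '\r') {
            strcpy(piece, "\\r");
        } else if (c == '\n') {
            strcpy(piece, "\\n");
        } else if (isprint(c)) {
            piece[0] = (char)c;
            piece[1] = '\0';
        } else {
            sprintf(piece, "\\x%02X", (unsigned)c);
        }

        len = strlen(piece);
        if (used + len >= out_size) {
            break;      /* no room left: stop, out stays terminated */
        }
        memcpy(out + used, piece, len + 1);
        used += len;
    }
    return out;
}
```

    With a buffer such as char buf[64], printing show_invisible(words[i], buf, sizeof buf) between visible delimiters would show string1\r instead of letting the terminal act on the carriage return.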

    --
    Mats

  14. #29
    Registered User
    Join Date
    Mar 2008
    Posts
    14
    Quote Originally Posted by matsp View Post
    That indicates that there is a '\r' [carriage return] at the end of the line - can you post:
    1. Code that opens the "valid word list".
    2. Code that reads the word.
    3. Code that removes newline.

    I suspect you are reading the file in binary mode, but it could be other things.

    Using the "print in hex" variation will show you that it's got a 0D character at the end of the string, I suspect.

    --
    Mats
    While I do the printing in hex, here is the code:

    Code that opens the 'valid word list':
    Code:
    FILE *dict;
    dict = fopen(filename, "r");
    Code that reads the word:
    Code:
    for (i=0; fgets(words[i], sizeof(words[i]), dict) != NULL; i++);
    Code that removes newline:
    Code:
    /* remove the newline characters from each word */
    for (i=0; i<sizeof(words)/sizeof(words[0]); i++) {
        char *p = strchr(words[i],'\n');
        if(p) {
            *p = '\0';
        }
    }
    EDIT: I did the printing in hex for words[], and 0D was at the end of every element in the array -

    6C696E6531D <-- word[0]
    6C696E6532D <-- word[1]
    6C696E6533D <-- word[2]
    ...
    6C696E653130D <-- word[9]

    (in my 'dictionary' of 10 words)
    Last edited by purplechirin; 03-19-2008 at 05:29 AM.

  15. #30
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Strange. Are you by any chance using a word-list generated by a Windows / DOS program on a Linux/Unix machine? That would explain the newline/carriage return problem.

    If so, you should use "dos2unix wordlist" to make sure the newlines are converted from CR+LF to LF only.
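    If converting the file is not an option, the line ending can also be stripped in code. The sketch below is one possible approach under assumptions, not the original poster's program: read_words, in_dictionary and MAX_LEN are illustrative names, and the row size of 10 follows the char words[10][10] declaration mentioned earlier. It also stops reading when the array is full and returns how many rows were filled, so later loops do not have to walk rows that fgets never touched.

```c
#include <stdio.h>
#include <string.h>

#define MAX_LEN 10  /* assumed row size, as in char words[10][10] */

/* Read up to max_words lines from fp and strip the line ending from each.
   Returns the number of rows actually filled. A line longer than
   MAX_LEN - 1 characters is split across rows by fgets. */
size_t read_words(FILE *fp, char words[][MAX_LEN], size_t max_words)
{
    size_t n = 0;

    while (n < max_words && fgets(words[n], MAX_LEN, fp) != NULL) {
        /* strcspn returns the index of the first '\r' or '\n', or the
           string length if neither is present. */
        words[n][strcspn(words[n], "\r\n")] = '\0';
        n++;
    }
    return n;
}

/* Return 1 if token equals one of the first n rows exactly, else 0. */
int in_dictionary(const char *token, char words[][MAX_LEN], size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (strcmp(token, words[i]) == 0) {
            return 1;
        }
    }
    return 0;
}
```

    Cutting at the first '\r' or '\n' handles both LF and CR+LF files, so the same code works whether or not dos2unix has been run on the word list.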

    --
    Mats
