Thread: Help with a program to spell check a file

  1. #1
    Noobin to the max
    Join Date
    Mar 2013
    Posts
    73

    Help with a program to spell check a file

    Hi again,

    I am working on a HOMEWORK assignment to spell check a file, and output any misspelled words. As this is something that I will be graded on, I am NOT asking you do "do it all for me" I am asking for hints and help so that I may continue to learn (and get a good grade at the same time)

    The way it is set up, a good chunk of the code is written already, There is a file called speller.c, that uses functions that I have to write. The functions actually get written in a different program called dictionary.c. Both dictionary.c and speller.c are compiled together.

    The functions I have to write are:

    check

    //Returns true if word is in dictionary else false.

    load

    //Loads dictionary into memory. Returns true if successful else false.

    size

    //Returns number of words in dictionary if loaded else 0 if not yet loaded.

    unload

    //Unloads dictionary from memory. Returns true if successful else false.

    I am currently working on writing my psudocode, and I have a few questions about the best way to do things...



    1. What is the best way to load the dictionary into memory, and what should it be loaded into... An array, hash table, trie.... Also, I have used the "f" functions before(fread, feof....), but there are so many to choose from, I am not sure what one(s) would work best.
    2. Is there an "f" function that can be used to return the size of the dictionary??? I don't want to know the total number of letters, but the total number of words. Is it best to find the size of the original dictionary or, the one that is loaded into memory.


    Any hints/help related to these questions or anything else to do with my program, would be great!!!

    Thanks,
    Josh

  2. #2
    Registered User
    Join Date
    Mar 2011
    Posts
    546
    1. i would start with just an array of words and do a linear search on it. that is the easiest to implement so you can get your program working. then when that works you can experiment with fancier methods such as tree,trie,hash etc. figuring out the tradeoffs between the methods is part of the fun.
    2. if your input dictionary has one word per line, you can use fgets to read each line for a word. hint : read docs on fgets about what it does with a newline. if your file isn't line oriented, you have to parse out each word yourself.
    3. you can find the binary size of the file with the function 'stat'. however that doesn't tell you the number of words. there is no 'f' function that will count words for you. you have to do that when you read the file. or you could run the utility 'wc' on the file if its available on your platform.

  3. #3
    Registered User
    Join Date
    May 2010
    Posts
    4,632
    What is the best way to load the dictionary into memory, and what should it be loaded into... An array, hash table, trie.... Also, I have used the "f" functions before(fread, feof....), but there are so many to choose from, I am not sure what one(s) would work best.
    You have not provided enough detail to really answer the question about how to read your file. How you read a file varies with how the information is laid out in the file.

    As to what type of data structure to use that is also not easy to answer. Again it depends on several factors. Like your level of experience, the actual type of class and many other issues. For example if this class is a Data Structures class then one of the more advanced data structures would probably be required but for a beginning C class an array may suffice.

    Jim

  4. #4
    Noobin to the max
    Join Date
    Mar 2013
    Posts
    73
    Quote Originally Posted by dmh2000 View Post
    1. i would start with just an array of words and do a linear search on it. that is the easiest to implement so you can get your program working. then when that works you can experiment with fancier methods such as tree,trie,hash etc. figuring out the tradeoffs between the methods is part of the fun.
    Ok, That makes sense.


    2. if your input dictionary has one word per line, you can use fgets to read each line for a word. hint : read docs on fgets about what it does with a newline. if your file isn't line oriented, you have to parse out each word yourself.
    It does have 1 word per line, so I will use fgets.

    3. you can find the binary size of the file with the function 'stat'. however that doesn't tell you the number of words. there is no 'f' function that will count words for you. you have to do that when you read the file. or you could run the utility 'wc' on the file if its available on your platform.
    Would It work if I just counted the number of times fgets loops???

    Thanks,
    Josh
    Last edited by Dude22; 03-08-2013 at 01:07 PM.

  5. #5
    Noobin to the max
    Join Date
    Mar 2013
    Posts
    73
    Quote Originally Posted by jimblumberg View Post
    You have not provided enough detail to really answer the question about how to read your file. How you read a file varies with how the information is laid out in the file.

    As to what type of data structure to use that is also not easy to answer. Again it depends on several factors. Like your level of experience, the actual type of class and many other issues. For example if this class is a Data Structures class then one of the more advanced data structures would probably be required but for a beginning C class an array may suffice.

    Jim
    It is a beginning C class. They have left it very open, I don't have to use a certain type of data structure.

  6. #6
    Stoned Witch Barney McGrew's Avatar
    Join Date
    Oct 2012
    Location
    astaylea
    Posts
    420
    What is the best way to load the dictionary into memory, and what should it be loaded into... An array, hash table, trie....
    Hash tables are ideal for this and they're very easy to implement in C. Just pick a good hash function as well as a good method for handling collisions.

    Also, I have used the "f" functions before(fread, feof....), but there are so many to choose from, I am not sure what one(s) would work best.
    It does have 1 word per line, so I will use fgets.
    Think about what a word is. It's more natural just to use all whitespace as a delimiter than it is to use fgets since whitespace is never part of a word (and you'd have to strip all that out after the call to fgets). The "s" format specifier for *scanf is exactly what you want. Just be sure to check its return value and give it a maximum field width so the destination storage isn't overflown.

    Is there an "f" function that can be used to return the size of the dictionary??? I don't want to know the total number of letters, but the total number of words. Is it best to find the size of the original dictionary or, the one that is loaded into memory.
    Why do you need to know that information?

  7. #7
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    You could use fgets() counting up all the words, and then rewind() the file pointer, if you wanted to.

    Then malloc an array of pointers with each word being malloc'd the length it needs, as it's re-read by fgets(). Sort the words when you're done, and search using a binary search with this change - instead of just finding that some spelling of a word doesn't exist, have it display the 5-7 closest words to it, as "hints". Quite workable, and look ma! no collisions! <LOL>

  8. #8
    Noobin to the max
    Join Date
    Mar 2013
    Posts
    73
    Ok, thanks for all the great hints, I will try it out and post with my results, most likely on monday. (unless I run into trouble sooner)

  9. #9
    Noobin to the max
    Join Date
    Mar 2013
    Posts
    73
    I was hoping to be able to get back to you sooner, I have a problem/question. If I am using scanf to read into an array, and I don't yet know the file length, how can I declare the array??? I could declare a super large array, but that seems like a waste of memory, and no matter how large I make it, the user could give it more input.

  10. #10
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    You can do the same thing with fscanf(). Read every word, and count how many there are. Then rewind() the file pointer (so it's back at the beginning again), and allocate the array size you want + a generous amount more - say 1,000 words more. That gives the user plenty of room to add more words. If you still believe the user will overflow the extra 1,000 word space, it's very easy to track how many words are in the array, and whenever the number of extra word spaces is reduced to say 50 words, then reallocate a new array, say 25% larger.

    The above will require malloc() and realloc(). It's important NOT to realloc() small amounts of extra space for an array.

    This is the help on realloc() from Pelles C:
    Purpose:
    Changes the size of an allocated object.

    Syntax:
    void * realloc(void *ptr, size_t size);

    Declared in:
    <stdlib.h>

    Description:
    The realloc function deallocates the old object pointed to by ptr and returns a pointer to a new object that has the size specified by size. The contents of the new object shall be the same as that of the old object prior to deallocation, up to the lesser of the new and old sizes. Any bytes in the new object beyond the size of the old object have indeterminate values.

    If ptr is a null pointer, the realloc function behaves like the malloc function for the specified size. Otherwise, if ptr does not match a pointer earlier returned by the calloc, malloc, or realloc function, or if the space has been deallocated by a call to the free or realloc function, the behavior is undefined. If memory for the new object cannot be allocated, the old object is not deallocated and its value is unchanged.

    Returns:
    A null pointer if the new object could not be allocated, otherwise a pointer to the new object (which may have the same value as a pointer to the old object). The allocated space is guaranteed to be suitable aligned for all basic data types.

  11. #11
    young grasshopper jwroblewski44's Avatar
    Join Date
    May 2012
    Location
    Where the sidewalk ends
    Posts
    294
    I rather liked the idea of reading in the words in into an array of pointers, allocating enough size for each word as needed. Sort then search.

  12. #12
    Noobin to the max
    Join Date
    Mar 2013
    Posts
    73
    The words actually start sorted, what would be the best system to use knowing that???

    Josh

  13. #13
    Stoned Witch Barney McGrew's Avatar
    Join Date
    Oct 2012
    Location
    astaylea
    Posts
    420
    I was hoping to be able to get back to you sooner, I have a problem/question. If I am using scanf to read into an array, and I don't yet know the file length, how can I declare the array??? I could declare a super large array, but that seems like a waste of memory, and no matter how large I make it, the user could give it more input.
    You can safely define a maximum limit for each word, since it's unlikely that an English dictionary will contain any words longer 45 characters (though there isn't any harm in allocating more providing you decide to strdup() each word). Concerning the list of words then, like I said in my previous post, the method that makes the most sense for the criteria you've listed is a hash table. Just handle collisions with a linked list and you have constant time for insertion and, depending on how large your table is and how good your hash function is, you should get constant time for most searches too.

    These data structures -- prefix trees, binary trees, hash tables, linked lists, and arrays are all very simple to implement and useful for many purposes. However, if you don't know when to use one over the other your programs are quite likely to suck.

    The words actually start sorted, what would be the best system to use knowing that???
    You don't need them to be sorted unless you're taking advantage of that with your searching algorithm.

  14. #14
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Quote Originally Posted by Dude22 View Post
    The words actually start sorted, what would be the best system to use knowing that???

    Josh
    A binary search, of course. Your words will be found in the blink of an eye, even with tens of thousands of words in the array.

    The data structs that Barney has mentioned using above, are things that you will learn a lot more by doing them. If you have the time and think you're ready to give them a go, by all means do so.

    What I'm proposing is simpler. They're both very fast and impressive, and entirely adequate. The best solution would require using a search tree, and if you were ready for that, you wouldn't be asking the topic of this thread.

    The longest non-coined and non technical word in English is only 28 letters:
    Antidisestablishmentarianism
    btw.

  15. #15
    Stoned Witch Barney McGrew's Avatar
    Join Date
    Oct 2012
    Location
    astaylea
    Posts
    420
    What I'm proposing is simpler.
    I don't know about that. In any case, it would depend on the experience of OP. I'd say they're all fairly simple to implement if you've studied them thoroughly and practised using them in different ways.

    The best solution would require using a search tree
    Best in what regard? Binary search trees have an average time complexity of O(logn) for both insertion and searching, whereas hash tables have an average time complexity of O(1) for those.

    The longest non-coined and non technical word in English is only 28 letters:
    Antidisestablishmentarianism
    Right. I was looking at this page: Longest word in English - Wikipedia, the free encyclopedia
    The one I was thinking of is apparently the longest word to appear in a major dictionary. Personally, I'd probably just allocate an array of 8KB or so.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Spell checking program
    By JYorke2097 in forum C Programming
    Replies: 3
    Last Post: 01-15-2009, 08:28 PM
  2. how to check a file for every 15 mins thr' program
    By nitinmhetre in forum Linux Programming
    Replies: 10
    Last Post: 01-05-2007, 01:53 AM
  3. Spell Checker
    By DeepFyre in forum Tech Board
    Replies: 2
    Last Post: 02-11-2005, 12:17 PM
  4. spell check in C using a dictionary file
    By goron350 in forum C Programming
    Replies: 10
    Last Post: 11-25-2004, 06:44 PM
  5. I can't spell...
    By Cheeze-It in forum A Brief History of Cprogramming.com
    Replies: 0
    Last Post: 05-08-2003, 08:07 AM