Thread: Spell Checker need help

  1. #16
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by Adak View Post
    Quite true, Tater. I was thinking that 120,000 words might not be all the words he needs, and if he does need a really large dictionary, splitting out the least commonly accessed words, would be a good idea. If you can fit all the words you need into your array and still have memory enough for the PC to be responsive, then forget it -- you're golden.
    Well, testing the binary search with 100,000 words I've yet to type in a word and get my finger off the enter key before it tells me I don't know how to spell .... Figure with the optimizing loader I put up, and an average word length of 6 letters, half a million words... 500,000 * 7 = 3.5mb ... still not that big a deal... the binary search can resolve it in 22 tries... which, in memory, is a matter of microseconds. (For what it's worth, my inventory packages routinely load 200+mb index files with no issues at all but they have almost exclusive use of the machine.)


    Writing out the nearest misses from a binary search is an old trick from way back when 80286's roamed the earth! I don't know whatever happened to them either - I think the TRex ate 'em!
    Ahhh... ya learn something new every day.

    Tries are a very elegant solution for a spell checker. Time consuming to work out any bugs though, imo.
    Interesting, but I fear a bit beyond our new friend's level at the moment...
    Last edited by CommonTater; 11-04-2011 at 02:23 PM.

  2. #17
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    If you're going to enhance your spelling checker into a spelling suggestor, then take a look at the Levenshtein Distance algorithm. It's really not all that hard actually.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  3. #18
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by iMalc View Post
    If you're going to enhance your spelling checker into a spelling suggestor, then take a look at the Levenshtein Distance algorithm. It's really not all that hard actually.
    I think we're losing track of the OP's task here... The guy is a beginner who has a week to get something resembling a spell checker to work at least far enough to hand in as a class project. Neither his time allocation nor his skill level is going to lead him to any of the more esoteric stuff being suggested and I don't think it's helping him at all to keep tossing up ever increasing complexity.

    Remember when we were beginners? How much did we do in a day? How hard was it for us to get even the most basic stuff working?
    That's where he is now.
    Last edited by CommonTater; 11-05-2011 at 04:30 AM.

  4. #19
    Registered User
    Join Date
    Nov 2011
    Posts
    13
    Quote Originally Posted by CommonTater View Post
    I think we're losing track of the OP's task here... The guy is a beginner who has a week to get something resembling a spell checker to work at least far enough to hand in as a class project. Neither his time allocation nor his skill level is going to lead him to any of the more esoteric stuff being suggested and I don't think it's helping him at all to keep tossing up ever increasing complexity.

    Remember when we were beginners? How much did we do in a day? How hard was it for us to get even the most basic stuff working?
    That's where he is now.
    well tater is quite right about my level. i'm quite a beginner. But thanks everyone about your suggestions. i guess i'll have to work my ass off and understand these things, that look really simple to you i might have to do just the spell checking without the suggestion part cause i guess a week is not enough. But still i'll check it out and see if its anywhere near possible within a week.

  5. #20
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by panosstylianou View Post
    well tater is quite right about my level. i'm quite a beginner. But thanks everyone about your suggestions. i guess i'll have to work my ass off and understand these things, that look really simple to you i might have to do just the spell checking without the suggestion part cause i guess a week is not enough. But still i'll check it out and see if its anywhere near possible within a week.
    Concentrate on the basics... get the minimum project working. If there's time, make a copy and see if you can't add some features before your deadline... worst case you hand in the working version.

    Good luck with this... If you run into problems or have questions, we'll do what we can...

  6. #21
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Quote Originally Posted by panosstylianou View Post
    well tater is quite right about my level. i'm quite a beginner. But thanks everyone about your suggestions. i guess i'll have to work my ass off and understand these things, that look really simple to you i might have to do just the spell checking without the suggestion part cause i guess a week is not enough. But still i'll check it out and see if its anywhere near possible within a week.
    A spell checker *without* suggestions? As Hollywood would say: "That's unacceptable soldier!"

    It's simple to do. Here's a basic binary search function, for words:

    Code:
    int binarySearch(const char goal[], int left, int right) {
       int lo, mid, hi,count=0;
       lo=left;
       hi=right;
       while (lo <= hi) {
          ++count;
          mid = (lo + hi) / 2;
          /* watch how the binary search zerooes in on the goal word  */
      
          //printf("goal: %s mid: %d  words[mid]: %s  ", goal, mid, words[mid]); 
          //printf("stringcmp returns: %d\n", strcmp(goal, words[mid]));
          //getchar();
          
          if (strcmp(goal, words[mid]) < 0)     //goal < words[mid]: -1
             hi = mid-1;
          else if(strcmp(goal, words[mid]) > 0) //goal > words[mid]:  1 
             lo = mid+1;
          else
             return mid;
       }
       return -1;
    }
    If the word is found, obviously no suggestions are wanted, so it's the very last part of the function that needs just a tweak to give you the suggestions:
    Code:
       }
    
       (in pseudo code)
         print Nearest words found are: 
         =======================
             print words[mid-2]
             print words[mid-1]
             print words[mid]
             print words[mid+1]
             print words[mid+2]
    
       return -1;
    }
    All based on mid being the closest value to the word being searched for. If +2 and -2 aren't large enough, try +5 -5.

    "And you WILL adapt and overcome, Marine!

  7. #22
    Registered User
    Join Date
    Aug 2010
    Posts
    231
    Quote Originally Posted by CommonTater View Post
    Code:
                  Words = realloc(Words, sList * sizeof(char*)); } }
    To use realloc is a very bad idea for this usecase with many words, and strdup is not C standard.

  8. #23
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by BillyTKid View Post
    To use realloc is a very bad idea for this usecase with many words, and strdup is not C standard.
    Seriously, have you nothing better to do than sit around quoting imaginary rule books and half-baked theoretical knowledge?

    How do you propose loading an array of unknown size without realloc()?

    I don't care a whit about C Standards, if it's there use it!
    Last edited by CommonTater; 11-05-2011 at 10:57 AM.

  9. #24
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Well, testing the binary search with 100,000 words I've yet to type in a word and get my finger off the enter key before it tells me I don't know how to spell
    Well, there you go - you've got dreadfully slow fingers. [j/k]

    Realloc() ?? Wouldn't recommend it. Won't his memory be Swiss Cheese, with repeating realloc calls, and run out of contiguous big blocks?

    I've never been brave enough to try realloc() with large arrays.

    Did you catch the ad for the spell checker at the top of the forum, just now?
    Last edited by Adak; 11-05-2011 at 10:58 AM.

  10. #25
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by Adak View Post
    Well, there you go - you've got dreadfully slow fingers. [j/k]

    Realloc() ??

    More old think, my friend... Windows memory management has come a long way since the Turbo C days... Heap Management is now some pretty slick stuff.

    I use it all the time and haven't had so much as a moment's concern about it yet. But if it bothers you that much you can always create your own private heap to put the array and words into...

    FWIW... the array itself, for 100,000 words is only going to be 400k. In a machine with gigabytes of memory I really don't understand this big ongoing fuss about saving memory... It's there to use... USE IT!
    Last edited by CommonTater; 11-05-2011 at 11:17 AM.

  11. #26
    Registered User
    Join Date
    Aug 2010
    Posts
    231
    You don't know what you say.
    Obviously you only know Windows and its CRT, there are many other OS and compilers and every have problems with many realloc calls, its a compiler independent "problem". Therefore real programmers use memory managers.

  12. #27
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by BillyTKid View Post
    You don't know what you say.
    Wanna know where you can stuff that BS attitude?

    Obviously you only know Windows and its CRT, there are many other OS and compilers and every have problems with many realloc calls, its a compiler independent "problem". Therefore real programmers use memory managers.
    Here's a quarter... go call someone who cares.
    Last edited by CommonTater; 11-05-2011 at 11:29 AM.

  13. #28
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    You're quite right there, Tater. I don't know anything about how Windows works with it's heap management.

  14. #29
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by Adak View Post
    You're quite right there, Tater. I don't know anything about how Windows works with it's heap management.
    Well, read the link... it's pretty informative. Most malloc(), free() and realloc() are just wrappers for the Heap functions. Fragmentation can be a problem when reallocating frequently, but don't forget we're also loading a mess of short strings that will stuf themselves into any gaps left by the realloc function... It's not like we're reallocating a single entity line by line... if you'll notice I had it do it in blocks of 100... Reallocation is no longer the old malloc-copy-free chain of events, most times realloc can resize memory in place.
    Last edited by CommonTater; 11-05-2011 at 01:04 PM.

  15. #30
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    Well if you're going to use realloc, you may as well try to use it properly.
    I'll be back in a few hours to see if CT's improved things.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Spell checker help
    By MyNameIs.. in forum C Programming
    Replies: 16
    Last Post: 11-25-2009, 05:43 PM
  2. Simple spell checker
    By purplechirin in forum C Programming
    Replies: 31
    Last Post: 03-19-2008, 07:17 AM
  3. Spell Checker
    By DeepFyre in forum Tech Board
    Replies: 2
    Last Post: 02-11-2005, 12:17 PM
  4. spell checker in c needs help
    By madmax in forum C Programming
    Replies: 3
    Last Post: 03-13-2003, 09:36 AM
  5. spell checker
    By bob20 in forum Windows Programming
    Replies: 3
    Last Post: 12-03-2002, 02:35 AM