Thread: correct word checking

  1. #1
    Registered User
    Join Date
    Feb 2015
    Posts
    45

    correct word checking

    hello ..
    i have this problem i wanna find out how to do it :

    for example : if a have a list of ( incorrect words ):

    nce //should be nice // one letter missing
    heree // should be here // one letter added
    hte // should be the // two letters reversed


    here is the methods i need to follow to solve the previous situations :


    1. One letter missing. You assume that one letter has been left out of the word. You can assemble new words to check by adding letters a..z in each of the positions in the word from the start to the end of the word.
    2. One letter added. You assume the word has an extra letter. You scan through the word deleting each of the letters in turn, and looking up the word formed by the remaining letters.
    3. Two letters reversed. You swap letters in positions 0..1, 1..2, 2..3, ... , n-2..n-1, to form new words which you look up.


    here is what i have so far :
    Code:
     1 char string[20];
     2 char temp[20];
     3 int i;
     4 int l=strlen(string);
     5 
     6 for(i=0;i<l/2;i++0){
     7    temp = string[i];
     8    string[i] = string[l-i];
     9    string[l-i] = temp;
    10    }
    which is method # 3
    i cannot think about #1 or #1
    please help me how to do it

    thank you

  2. #2
    Registered User
    Join Date
    Nov 2012
    Posts
    1,393
    For method 3, my understanding is the following. If the user enters "somehting" as input then your word generation function should generate the following permuted words with method 3:

    osmehting
    smoehting
    soemhting
    somheting
    something
    somehitng
    somehtnig
    somehtign
    somehting

    You could write a function that generates that list of words (just start by printing out the list of generated words to verify it is working). Similar for the other methods. In the end you will also need a way to look up each of your permuted words to see if it is in your dictionary, and you will also need a way to choose a "winner" in case more than one permutation is found in the dictionary.

  3. #3
    Registered User
    Join Date
    Feb 2015
    Posts
    45
    Quote Originally Posted by c99tutorial View Post
    For method 3, my understanding is the following. If the user enters "somehting" as input then your word generation function should generate the following permuted words with method 3:

    osmehting
    smoehting
    soemhting
    somheting
    something
    somehitng
    somehtnig
    somehtign
    somehting

    You could write a function that generates that list of words (just start by printing out the list of generated words to verify it is working). Similar for the other methods. In the end you will also need a way to look up each of your permuted words to see if it is in your dictionary, and you will also need a way to choose a "winner" in case more than one permutation is found in the dictionary.

    thank you C99tutorial
    but can you explain in c how to implement it ? or in pseudocode ?

    thanks

  4. #4
    Lurking whiteflags's Avatar
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,612
    You just need to swap every character with every other character.

    So with a smaller word perhaps... generate all the permutations of cat.
    1. cat
    Starting with cat, the best thing to do is swap the first and next character.
    2. act
    Then swap the next two.
    3. atc
    When you get to this point, swap the last and first, then keep doing what was working before.
    4. cta
    5. tca
    6. tac
    These words can be stored in an array/list or tried one by one against a list of good words. The trick is to know how many there are and just count them all in your loop that makes permutations.

    EDIT: It would be smarter to give up after n tries then to try to go through all permutations of a word, as many words have huge number of permutations. "something" (nine letters) has 362,880.
    Last edited by whiteflags; 11-04-2015 at 04:24 PM.

  5. #5
    misoturbutc Hodor's Avatar
    Join Date
    Nov 2013
    Posts
    1,787
    Quote Originally Posted by whiteflags View Post
    EDIT: It would be smarter to give up after n tries then to try to go through all permutations of a word, as many words have huge number of permutations. "something" (nine letters) has 362,880.
    If I understand the problem description correctly there are not 362880 spellings in a 9 letter word. This is because it's only looking for 2 out of order letters in the given word; and therefore there is a constraint
    (based on: "Two letters reversed. You swap letters in positions 0..1, 1..2, 2..3, ... , n-2..n-1, to form new words which you look up". Therefore, if I am interpreting the task correctly, there only n possible variations of the words need be examined not all permutations using characters in the word.

  6. #6
    Lurking whiteflags's Avatar
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,612
    Ah OK, then the counting of the permutations would go differently than what I said, but the approach to swapping itself is correct.

  7. #7
    Registered User
    Join Date
    Feb 2015
    Posts
    45
    thank you guys but none of these really helped me
    if any of you have idea about method #1 or #2 please help


    thanks

  8. #8
    Lurking whiteflags's Avatar
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,612
    Quote Originally Posted by Hana Nasser View Post
    thank you guys but none of these really helped me
    if any of you have idea about method #1 or #2 please help


    thanks
    Well, in general ...

    "Fuzzy matching" is a thing, and even the method described here is not really how you do it. Reading wikipedia may be the best thing for you to do right now. Try to make sense of existing algorithms.

    I am pretty sure that method 1 doesn't work at all. I will walk you through the reason why:
    You can assemble new words to check by adding letters a..z in each of the positions in the word from the start to the end of the word.
    This sounds good until you notice the word nice has four letters in it, but if you misspell it "nce", no three letter combination is going to result in nice.

    Here's what I would do:
    1. Assume that the first letter of any word is correct. This rule restricts you to words that start with n if you misspell nice - which is better than checking words that start with any letter.
    2. Assume that a character missing in a word is the result from missing the second letter in a digraph, as to not go against assumption one.
    3. Take a list of Latin digraphs. Compare the list against your word (look at words in two-character sized chunks) and look for existing digraphs.
    4. If you find a digraph while searching, assume it is part of the correctly spelled word.
    5. When you find a chunk that is not a digraph ("nc" in "nce") copy the word in such a way that there is space for one. ("n?ce").
    5. Build a list of words to check using correct digraphs:
    nace
    nece
    nice
    noce
    nuce
    ...

    Drawbacks of using this method:
    1. You are likely going to have to curate your own list of digraphs. I'm not a linguist so I can't do it for you and it would do you no good to use a giant list such as the one I linked, which contains non-English constructs.You may decide that even a small subset of all the digraphs makes a spell checker that is good enough.
    2. You should be reasonably sure that a missing letter is the problem before you try this.
    3. It still involves a lot of string manipulation.
    Last edited by whiteflags; 11-04-2015 at 07:42 PM.

  9. #9
    misoturbutc Hodor's Avatar
    Join Date
    Nov 2013
    Posts
    1,787
    I think it's silly

    Using /usr/share/words/dict and the incorrect spelling "nce" my test program using strategy #1 I get:

    Code:
    nace
    nice
    once
    pnce
    Even ignoring the words "nace" and "ponce" (what are they?), there is still "nice" and "once" as candidates. Did you get any instructions on what to output if there was more than one potential "correction".


    Edit: Now that I think about it, strategy #1 should not attempt adding a missing letter before "nce" and nor should it attempt adding a letter after "nce". I think. You'd better check that. But that still doesn't rule out duplicate candidates for a correct spelling.

    Edit 2: After reading the wikipedia page linked to by @whiteflags I have changed my mind and think adding a letter prior to and after the "incorrect spelling" should be included. So using the word coil (even though it isn't spelled incorrectly) the candidates from strategy 1 are:

    Code:
    choil
    coils
    Last edited by Hodor; 11-04-2015 at 10:17 PM.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Replies: 13
    Last Post: 09-20-2012, 11:48 AM
  2. Replies: 28
    Last Post: 10-23-2011, 07:17 PM
  3. checking how frequent a word appear in a string
    By kemar in forum C Programming
    Replies: 2
    Last Post: 09-14-2011, 01:32 PM
  4. reading text-and-numbers file word by word
    By bored_guy in forum C Programming
    Replies: 22
    Last Post: 10-26-2009, 10:59 PM
  5. open file, search of word, replace word with another
    By Unregistered in forum C++ Programming
    Replies: 0
    Last Post: 06-05-2002, 01:16 PM