I split a sentence into a word per line, now I want to compare word to a word in file


  1. #1
    Registered User
    Join Date
    Apr 2011
    Posts
    229

    I split a sentence into a word per line, now I want to compare word to a word in file

    Here's my code:

    Code:
    #include <stdio.h>
    #include <string.h>
    
    int main ()
    {
      char b[256];
      char * pch;
      printf("Input a short sentence: ");
      gets(b);
      pch = strtok (b," ,.-");
      while (pch != NULL)
      {
        printf ("%s\n",pch);
        pch = strtok (NULL, " ,.-");
      }
      getch();
      return 0;
    }

    Here's my results:

    Input a short sentence: dog ran
    dog
    ran

    ___________

    I want the result to read like this though:

    Input a short sentence: dog ran
    dog: Noun
    ran


    To add ': Noun' to 'dog', I want to open a file, read from that file that dog is a noun, and then print this on the console screen.
    I looked this up in a reference library but didn't see anything that would do it. There was something for C++, but I don't understand C++ at all.

  2. #2
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Is the word list something like this?

    apple n
    apply v
    bake v
    baker n
    Charles p
    dog n
    eccentric a

    So the words are in alphabetical order, each followed by a space and its classification (a=adjective, n=noun, p=proper noun, v=verb)?

    You can see, all the details are important, but that's how I would do what you want. You could use a binary search to massively help with the searching.

    An even better idea would be to group the words, after they're sorted, by their position in an index array. The index array would give the binary search a much smaller search space. For instance, it could tell the binary search to start looking for dog at the first 'd' word, at position #156, with a ceiling of 272 (because the 'e' words start there).

    I've done some programs like this, and I can assure you there are a LOT, LOT, LOT of words. Make every effort to limit the number of words you will have in your word list. Ideally, you want to fit the whole word list into an array of strings, with each row holding a word, a space, then a char. Searching through a file is much slower than searching through an array.
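    The index-array idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: the list is already in memory, sorted, and every word starts with a lowercase ASCII letter. The names `struct entry`, `build_index` and `lookup` are made up for the sketch and do not come from the thread.

```c
#include <string.h>

/* One dictionary row: the word and its one-letter class (n, v, a, p). */
struct entry {
    const char *word;
    char cls;
};

/* Fill start[0..26] so that the words beginning with letter 'a'+k occupy
   rows start[k] .. start[k+1]-1. Assumes the list is sorted and every
   word starts with a lowercase ASCII letter. */
void build_index(const struct entry *list, int count, int start[27])
{
    int row = 0;
    for (int k = 0; k < 26; k++) {
        while (row < count && list[row].word[0] < 'a' + k)
            row++;
        start[k] = row;
    }
    start[26] = count;
}

/* Binary search restricted to the rows for the word's first letter.
   Returns the class letter, or 0 when the word is not listed. */
char lookup(const struct entry *list, const int start[27], const char *word)
{
    if (word[0] < 'a' || word[0] > 'z')
        return 0;

    int lo = start[word[0] - 'a'];
    int hi = start[word[0] - 'a' + 1] - 1;

    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        int cmp = strcmp(word, list[mid].word);
        if (cmp == 0)
            return list[mid].cls;
        if (cmp < 0)
            hi = mid - 1;
        else
            lo = mid + 1;
    }
    return 0;
}
```

    Call build_index() once after the list is loaded, then lookup() for each word of the sentence. The index only trims the first few steps of the binary search, so it matters most when the list is very large.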

  3. #3
    Registered User
    Join Date
    Apr 2011
    Posts
    229
    The words keep their order, they just get the 'n' 'v' 'a' beside them. Examples: 'dog n' 'ran v' 'fast a'.

  4. #4
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Try entering ... The quick brown fox jumped over the lazy dog in the field
    You might be in for a bit of a surprise.

  5. #5
    Registered User
    Join Date
    Apr 2011
    Posts
    229
    The results:

    Input a short sentence: The quick brown fox jumped over the lazy dog in the field
    The
    quick
    brown
    fox
    jumped
    over
    the
    lazy
    dog
    in
    the
    field

  6. #6
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Hmmmm... ok... now make it do that *outside* the strtok loop... and make it print the original sentence after.

    The problem you're going to discover is that strtok overwrites the delimiters in your input, which means that by line 16 you actually have no usable data.
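    A minimal sketch of one way around this, assuming it is acceptable to tokenize a copy so the original sentence survives the loop. strtok() writes '\0' over each delimiter it finds, which is why the original buffer ends up holding only the first word. The function name `print_words` and the 256-byte buffer are choices made for the sketch.

```c
#include <stdio.h>
#include <string.h>

/* Print one token per line from a private copy of `sentence`, leaving
   the caller's string untouched. Returns the number of tokens, or -1
   if the sentence does not fit in the local buffer. */
int print_words(const char *sentence)
{
    char copy[256];

    if (strlen(sentence) >= sizeof copy)
        return -1;
    strcpy(copy, sentence);

    int count = 0;
    for (char *tok = strtok(copy, " ,.-\n"); tok != NULL;
         tok = strtok(NULL, " ,.-\n")) {
        printf("%s\n", tok);
        count++;
    }
    return count;
}
```

    After print_words(b) returns, b can still be printed as the full sentence, which is what the exercise above asks for.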

  7. #7
    Registered User
    Join Date
    Apr 2011
    Posts
    229
    It's only for short sentences for now. I just want to figure out how to add the 'n' to the word dog by looking up the word dog in a file.
    So the string is typed in, not read from a file: I type 'dog', the word is looked up in a file that shows it is an 'n', and the console output is 'dog n'.

  8. #8
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    So declare a FILE pointer variable, and use fopen("worklist.txt", "r"), and if the file pointer ("fp" is a good name), is valid after the fopen line of code, then you're free to start searching.

    strcmp(string1, string2) will let you know when you have found the right word, by returning a zero. Think of it as "zero difference between the two strings being compared".
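    As a rough sketch of that search: the function below scans an already opened file from the top and compares each entry with strcmp(). It assumes every line of the file has the form "word letter" (for example "dog n") and that no word is longer than 63 characters. The name `find_class` is invented for the example.

```c
#include <stdio.h>
#include <string.h>

/* Scan an open word list from the top for `word`.
   Each line is assumed to be "<word> <class-letter>".
   Returns the class letter, or 0 if the word is not in the file. */
char find_class(FILE *fp, const char *word)
{
    char entry[64];
    char cls;

    rewind(fp);   /* start every search at the beginning of the file */
    while (fscanf(fp, "%63s %c", entry, &cls) == 2) {
        if (strcmp(entry, word) == 0)   /* zero means the strings match */
            return cls;
    }
    return 0;
}
```

    Inside the strtok loop, something like `char c = find_class(fp, pch);` would then give the letter to print after the word, or 0 when the word is not in the list.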

  9. #9
    Registered User
    Join Date
    Apr 2011
    Posts
    229
    I tried what you said but I don't know what the txt file should look like or how to get the code to use the txt file.

    To commonTater, I fixed that error where it had a limit on the number of words the sentence could have. I have a char array with 'an unknown size' now.

    Here is the code I have:

    Code:
    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>
    
    int main ()
    {
      char *b = malloc(sizeof(char));
      char * pch;
      FILE * fp;
      fp = fopen("worklist.txt", "r");
      printf("Input a short sentence: ");
      gets(b);
      pch = strtok (b," ,.-");
      while (pch != NULL)
      {
        printf ("%s\n",pch);
        pch = strtok (NULL, " ,.-");
      }
      getch();
      return 0;
    }
    Adak, can I ask you to fix the code so it can read from the txt file and show me how the txt file should look?

  10. #10
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Before you do anything else, let's get the file opening sorted out:
    Code:
      fp = fopen("worklist.txt", "r");
      /* insert this */
      if(!fp) {  //if fp was given NULL by fopen() meaning the file was not opened
        perror("Error: file worklist.txt was not found or opened");
        return 0;
      }
      printf("Input a short sentence: ");
    You'd be surprised how often a filename is not quite right, or the file was moved, etc.

    As for the words, how many words are you expecting to search through?
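    For reference, here is a small sketch of the checked open wrapped in a helper, together with a bounded replacement for gets(). The second part is an addition, not something suggested in the post: gets() cannot limit how much it reads (it was removed from the language in C11), and `malloc(sizeof(char))` reserves a single byte, so typing anything into that buffer overflows it. The helper names `open_wordlist` and `read_sentence` are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Open the word list for reading and report the reason on failure. */
FILE *open_wordlist(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        perror(path);
    return fp;
}

/* Read one line into buf (at most size-1 characters) and drop the
   trailing newline. Returns 1 on success, 0 on end of input or error. */
int read_sentence(char *buf, size_t size, FILE *in)
{
    if (fgets(buf, (int)size, in) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';
    return 1;
}
```

    With a fixed buffer such as `char b[256];`, the call `read_sentence(b, sizeof b, stdin)` takes the place of `gets(b)` and never writes past the end of b.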

  11. #11
    Registered User
    Join Date
    Apr 2011
    Posts
    229
    The code works when I run this code:

    Code:
    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>
    
    int main ()
    {
      char *b = malloc(sizeof(char));
      char * pch;
      FILE * fp;
      fp = fopen("worklist.txt", "r");
      /* insert this */
      if(!fp) {  //if fp was given NULL by fopen() meaning the file was not opened
    	  perror("Error: file worklist.txt was not found or opened");
    	  return 0;
      }
      printf("Input a short sentence: ");
      gets(b);
      pch = strtok (b," ,.-");
      while (pch != NULL)
      {
        printf ("%s\n",pch);
        pch = strtok (NULL, " ,.-");
      }
      getch();
      return 0;
    }
    In the code I can open a txt file, but I'm not sure how to write the code so the txt file is used, I just declared it.

    I want the code to open and use the txt file to show that 'dog' is 'dog: n'. And I want to know how the txt file should look too, so two things I'm asking help with.

    And to answer your question: I have the dictionary template made as a folder called dictionary. In this folder are individual folders with alphabetical names, so 26 folders; in each of those I have more folders, and in each of these folders are ten txt files. Each txt file is one page of the dictionary. I have already made this and have the letter A finished, so the names are listed, but not which part of speech each word is.

    After I have the entire dictionary page by page, I can open each page and write the verb or noun beside the word. Ten pages will be done in a day; I calculate it will take me 200 days at one folder, or ten txt files, per day.

    After I'm done I guess I can look at other dictionaries to cross reference my list.

    Then I'll group them by nouns, verbs, etc., so I have a dictionary of nouns and a dictionary of verbs, each in its own txt file.

    So now you see why I want to know how to do what I'm asking your help with.

    This is my project. I'm doing this because when I'm done I can give away my work, maybe at SourceForge.
    Last edited by jeremy duncan; 10-19-2011 at 01:18 AM.

  12. #12
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Well, sad to say, but you're nutty as a fruitcake.

    As am I, and I had the feeling you were going to go "nutty" with this. I'll write up a longer post in a bit, with more details. You don't want to do this, the way you're thinking of doing it, right now. More in about 15 minutes.

  13. #13
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by jeremy duncan View Post
    The code works when I run this code:
    In the code I can open a txt file, but I'm not sure how to write the code so the txt file is used, I just declared it.
    Your best bet is to load your entire wordlist file into memory.
    This will be done with a big old array and a loop or you can use a linked list if you choose.
    Once the file is loaded you then need to search the list for every word in the sentence and mark it accordingly.

    Your wordlist file will probably look something like this...
    Code:
    dog N
    go V
    cat N
    car N
    fast A
    amorphous A 
    ... and so on.

    So when you pull a word from the sentence, you look for it in the list and print the entry from the list instead...
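    The approach above could look roughly like this. It is a sketch under assumptions: the file has one "word letter" pair per line as in the sample, words are shorter than 32 characters, and a plain array is used rather than a linked list. The search is linear because the sample list is not sorted. `struct word_entry`, `load_list`, `class_of` and `annotate` are names made up for the example.

```c
#include <stdio.h>
#include <string.h>

#define MAX_LEN 32   /* illustrative limit on word length */

struct word_entry {
    char word[MAX_LEN];
    char cls;
};

/* Read "<word> <class>" pairs until the file ends or the array is full.
   Returns the number of entries stored. */
int load_list(FILE *fp, struct word_entry *list, int max)
{
    int n = 0;
    while (n < max &&
           fscanf(fp, "%31s %c", list[n].word, &list[n].cls) == 2)
        n++;
    return n;
}

/* Linear search; returns the class letter or 0 if the word is unknown. */
char class_of(const struct word_entry *list, int n, const char *word)
{
    for (int i = 0; i < n; i++)
        if (strcmp(list[i].word, word) == 0)
            return list[i].cls;
    return 0;
}

/* Split `sentence` in place and print each word, followed by its class
   when the list knows it. Returns how many words were found in the list. */
int annotate(char *sentence, const struct word_entry *list, int n)
{
    int known = 0;
    for (char *tok = strtok(sentence, " ,.-\n"); tok != NULL;
         tok = strtok(NULL, " ,.-\n")) {
        char cls = class_of(list, n, tok);
        if (cls != 0) {
            printf("%s: %c\n", tok, cls);
            known++;
        } else {
            printf("%s\n", tok);
        }
    }
    return known;
}
```

    A caller would declare an array such as `static struct word_entry list[1000];`, fill it once with load_list(), and pass each input sentence to annotate(). Words missing from the list are printed without a label.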

  14. #14
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by jeremy duncan View Post
    And to answer your question: I have the dictionary template made as a folder called dictionary. In this folder are individual folders with alphabetical names, so 26 folders; in each of those I have more folders, and in each of these folders are ten txt files. Each txt file is one page of the dictionary. I have already made this and have the letter A finished, so the names are listed, but not which part of speech each word is.
    You might as well stop working on that right now... This is a searching nightmare! You are talking about moving through multiple folders, and searching multiple files *for each word* in a short sentence... just think how much worse it will get when you expand that to work with text files as inputs...

    You need your word list in *a single file*... yes, all 147,000 English words in a single file... in alphabetical order!

    ( How many words are there in the English language? : Oxford Dictionaries Online )

    Then you use advanced search algorithms such as Binary Search to very quickly go through the list and look up your word.

  15. #15
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    I'm referring to a text-based project here, which is compatible with the level you're at. A quicker way to make this happen would use binary data files, and so on.

    1) There are word lists (some are whole dictionary word lists), available on line, free for the downloading. Some are more British, some have more technical words, some more popular words. Some are lists of every word they have found on the internet by "web scraping", some from dictionaries, some from other word lists, etc. I combined two of these word lists, and then deleted a bunch of the really obscure words.

    2) I recommend two tiers of files: the top tier by the first letter of the word (all letters in all words are first set to lowercase with tolower() for searching), and the second tier by the length of the word, using strlen(). So to look up CAT, I changed it to cat and went to the C3.txt file, where all the three-letter words beginning with c were stored in sorted order. A quick binary search and you're done.

    3) Each word file held several pages of a dictionary. I used the simple:

    cat
    catastrophe
    catastrophic
    catechism
    etc.
    format, and recommend it. The "space" you see on the page, isn't wasted space in the file. In the file, the above looks like:

    cat\ncatastrophe\ncatastrophic\ncatechism\n etc.

    No wasted space. Also, it's super easy to scan down the file listing in an editor, add or delete a word or word classification.

    4) Your computer has a natural "page" size, which will be some multiple of the BUFSIZ in C, for your compiler. Use that, if you can find it in your OS. Otherwise, take BUFSIZ and multiply it by 4 to 10.

    5) For your format, I'd suggest using:

    cat n
    catastrophe n
    etc.
    Just a simple space, and no commas or quotes of any kind. Perfect for fscanf(), because your data will be rigidly formatted.

    It's a good idea to break the word lists up into several files, but do it only in two tiers: First letter, and length of the word. Sounds like a searching nightmare, but it isn't, if you keep all the files, in the same directory. Nothing wrong with loading all the words up into one file, and loading the whole list from that file, into memory (an array), but can that be done?

    Working from Turbo C, I certainly could not come close to doing that, when I was working on my word project!
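    The two-tier naming from point 2 can be sketched as below. The exact file name pattern (uppercase first letter, then the length, then ".txt") is inferred from the "C3.txt" example and is otherwise an assumption, as are the helper names `lowercase` and `tier_filename`.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Lowercase `word` in place so lookups ignore case. */
void lowercase(char *word)
{
    for (; *word != '\0'; word++)
        *word = (char)tolower((unsigned char)*word);
}

/* Build the tier file name for a word: its first letter (uppercased)
   followed by the word length, e.g. "cat" gives "C3.txt".
   Returns 0 on success, -1 if the word does not start with a letter
   or the name does not fit in `out`. */
int tier_filename(const char *word, char *out, size_t size)
{
    if (!isalpha((unsigned char)word[0]))
        return -1;

    int n = snprintf(out, size, "%c%zu.txt",
                     toupper((unsigned char)word[0]), strlen(word));
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

    The resulting name can be passed straight to fopen(), and the entries in that one file are then read with fscanf() and searched as in point 5.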
    Last edited by Adak; 10-19-2011 at 01:59 AM.
