Thread: Word Counting Help(newbie)

  1. #1
    Registered User
    Join Date
    Sep 2003
    Posts
    15

    Word Counting Help(newbie)

    Hi, I'm new to these boards as well as programming. I'm working my way through "Accelerated C++" by Koenig and Moo. I'm happy with this book except for the fact that it does not provide solutions to the execercises.

    I'm currently working a word counting program that should display the number of distinct words in its input. I am probably way off track, but have been able to get it to work unless there are 3 or more of the same entries. I could really use advice on where to go from here. Thanks in advance.

    Code:
    #include<iostream>
    #include<string>
    #include<vector>
    #include<stdlib.h>
    
    using namespace std;
    
    int main()
    {
     // get input
     cout << "Please enter a list a words." << endl;
     vector<string> words;
     string x;
     while(cin >> x)
     {
       words.push_back(x);
     }
    
    //count words
     vector<string>::size_type size;
     size = words.size();
    
    int pass = 0;
    int i = 0;
    int j = 0;
    int duplicate = 0;
    
    while (i < size)
     {
     j = 0;
    
     while (j < size)
      {
    
       if (i != j && j >= pass)
        {
         cout << "\nComparing " << words[i] << " with "   //troubleshooting
              << words[j] << endl << endl;                //troubleshooting
         if(words[i] == words[j])
              ++duplicate;
         }
       ++j;
      }
     ++pass;
     ++i;
     }
     cout << endl << "You have entered " << size << " words.";
     cout << endl << "You have entered " << size - duplicate << " distinct words.";
     cout << "\n";
     system("pause");
     return 0;
    }
    Last edited by MedicKth; 09-17-2003 at 09:44 PM.

  2. #2
    Registered User
    Join Date
    Sep 2003
    Posts
    23
    I would use for loops instead of while, because if you know the number of loops, it is better readable than some ++i and ++pass statements.

    And you can think about contain of your data structures. Do you need to store all words? Wouldn't it be better to keep only different words in your list, and maybe number of occurences in input for each word?

  3. #3
    Registered User
    Join Date
    Jan 2003
    Posts
    311
    The core of your problem is that you are counting the number of pairs of words, not duplicates. The easyest solution is to simply sort the vector. The other problem is the strangeness of your loops.
    Code:
    for(int i=0;i<size;++i) {
        for(int j=i+1;j<size;++j) {
            all pairs i,j with i<j here..
        }
    }
    This loop will cover all pairs in much the same way as your loop, it's just simpler and faster. You still need a way to make sure that you only count a duplicate between words[i] == words[j] if and only if there is no words[i-k] == words[i] One way would be to swap every second element of each matching pair with the last element in the vector and remove the last element. This will alter the order of the list, and you might as well sort, the faster, beter solution. You could also have another vetor of bool that you use as a look up table to see if any duplicate has been counted before. Still, sort the vector, or use a set and count duplicates as you read the words in.

  4. #4
    Registered User major_small's Avatar
    Join Date
    May 2003
    Posts
    2,787
    just picking:
    Code:
    #include <stdlib.h>
    should be
    Code:
    #include <cstdlib>
    Join is in our Unofficial Cprog IRC channel
    Server: irc.phoenixradio.org
    Channel: #Tech


    Team Cprog Folding@Home: Team #43476
    Download it Here
    Detailed Stats Here
    More Detailed Stats
    52 Members so far, are YOU a member?
    Current team score: 1223226 (ranked 374 of 45152)

    The CBoard team is doing better than 99.16% of the other teams
    Top 5 Members: Xterria(518175), pianorain(118517), Bennet(64957), JaWiB(55610), alphaoide(44374)

    Last Updated on: Wed, 30 Aug, 2006 @ 2:30 PM EDT

  5. #5
    Registered User
    Join Date
    Sep 2003
    Posts
    15

    Post Improvement(maybe)

    Thank you greatly for all of the advice. I have written the following code based simply on aerian's advice, prior to reading the other's. I will go ahead and post it while I ponder the rest.

    This is working although I'm sure it is by far the best means to solve the problem. Also it was aerian who got me to think of doing it this way but I seriously doubt it's exactly what he intended. Again thanks for the replies and please keep it coming.

    Code:
    #include<iostream>
    #include<string>
    #include<vector>
    #include<stdlib.h>
    
    using namespace std;
    
    int main()
    {
     // get input
     cout << "Please enter a list a words." << endl;
     vector<string> words;
     string x;
     int count = 0;
    
     while(cin >> x)
     {
      int duplicate = 0;
      if (count != 0)
         {
          for(int i = 0; i < words.size(); i ++)
            {
             if(x == words[i])
                  duplicate = 1;
            }
          if(duplicate == 0)
             words.push_back(x);
         }
    
      else
          words.push_back(x);
      count++;
     }
    
     cout << endl << "You have entered " << words.size() << " distinct words.";
     system("pause");
     return 0;
    }

  6. #6
    Registered User
    Join Date
    Sep 2003
    Posts
    23
    Hmm, it seems to be good now. More or less, it's what I intended
    At least it is easier to read for me. Though, my knowledge of STL is reduced to it's name and to some intuition, what else can be in such a library, if not some useful templates?

    Now, if all is working, you can think about changing the algorithm to something faster. That linear search is rather slow, try your program on huge input (probably 50000 words will be enough).
    It will consume about O(n^2) time. (For explanation of that notation, see some issue of Code Journal.)

    What about binary search? Maybe it is even implemented in some function/class in the library.
    (it will take O(n * log n))

    And after you have done that, you can try even more sophisticated algorithms. Try keeping your words in lexicographical tree It will then take only O(n * length), where length is depth of the tree, thus maximal length of the words.

  7. #7
    Registered User
    Join Date
    Sep 2003
    Posts
    15
    Thanks again for all the help. I'm moving on to the other excercises but I'm definately taking notes from all of the replies to look into for these other programs and improving this one later.

    I am sure I will be posting many more problems and look forward to learning more and more with your help. Thanks.
    Last edited by MedicKth; 09-19-2003 at 02:59 PM.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. String - word counting!
    By shardin in forum C Programming
    Replies: 2
    Last Post: 07-11-2007, 05:08 PM
  2. brace-enclosed error
    By jdc18 in forum C++ Programming
    Replies: 53
    Last Post: 05-03-2007, 05:49 PM
  3. Wrong Output
    By egomaster69 in forum C Programming
    Replies: 7
    Last Post: 01-28-2005, 06:44 PM
  4. Word Counting
    By Achillles in forum C++ Programming
    Replies: 9
    Last Post: 09-11-2002, 02:09 PM
  5. follow up with char I/O and line, word, and char counting
    By Led Zeppelin in forum C Programming
    Replies: 1
    Last Post: 03-23-2002, 09:26 PM