Thread: string arrays

  1. #1
    Registered User
    Join Date
    Sep 2007
    Posts
    10

    string arrays

    Hi, I'm new to the boards, so please don't flame me too much if this is too easy.

    I'm trying to read from a file that contains a couple of paragraphs of sentences. The point is to read each word, count how many times each word appears in the file, and then output each word with the number of times it appeared next to it.

    Here's my code so far.

    Code:
    #include <iostream>
    #include <fstream>
    #include <cctype>
    #include <string.h>
    using namespace std;
    
    
    
    void readWords();
    
    
    void main(){
    
    	readWords();
    
    	
    	}
    
    //************************************************************************************//
    void readWords(){
    
    	char ch = 'a';
    	char formWord[100];
    	string words[500];
    	int i = 0;
    	int j = 0;
    	int k = 0;
    	ifstream infile;
    
    	infile.open("p01.txt");
    	if (!infile){
    		cout << "Cannot open output file. \n";
    	}
    
    	while (!infile.eof()){
            infile.get(ch);
    		if ((ch == ' ')||(ch == ',')||(ch == '?')||(ch == '!')||(ch == ':')||(ch == '"')||(ch == '.')){
    		    ch = 'a';
    			cout << " ";
    			words[k] = formWord;
    		}
    		else{
                 formWord[i] = toupper(ch);
    	         cout << formWord[i];
    	         i++;
    	      
    		}
    	}
    	
    }
    Right now I read each letter from the file into the character array formWord until I hit a space or one of the punctuation marks. Then I try to store the word I've formed in a string array so I can output it later, but I can't figure out how to do that.

    Can anyone help?

    Much appreciated.

  2. #2
    Registered User
    Join Date
    Jan 2005
    Posts
    7,366
    When you find the end of a word you need to do a few things. First, make sure the word is not empty (for example, a '.' followed by a ' ' gives you two separators in a row with no letters between them). Second, add a null terminator ('\0') to your character array, or use a different method of assigning the character data to the string: the assignment words[k] = formWord expects a null-terminated character array. Next, reset i so that when you start collecting letters again you form a new word by itself, not one tacked on to the previous word. Finally, don't forget to increment your word count variable every time you add a new word.

    A couple other suggestions:
    1. You are using the C++ string class, so you should #include <string>. That is different from #include <string.h>, which is the C header for functions like strlen and strcpy (its C++ equivalent is <cstring>). You don't need either of those here: isalpha, which I suggest below, is declared in <cctype>, which you already include.
    2. void main() is not legal in C++, even if your compiler supports it. It's better to use the standard int main().
    3. Don't use eof() to control the while loop. It probably won't make a difference for you here, but in many situations it will cause your loop to run an extra time because eof() doesn't return true until after you try to read past the end of the file. A better choice would be to move the get into the while control:
      Code:
      while (infile.get(ch)){
    4. Consider isalpha to find valid letters. That way you won't accidentally treat characters you forgot to list (such as ';', '(' or a newline) as part of a word.
    5. Use better variable naming. For example, instead of k, name the variable wordCount or something to indicate you are using it to count the number of words and the current index in the word array.
    6. Don't mix spaces and tabs in your code indentation; it makes the code hard to read and follow when you post it here.

  3. #3
    and the hat of sweating
    Join Date
    Aug 2007
    Location
    Toronto, ON
    Posts
    3,545
    Another thing you could do is read the whole file into a string or vector<char>, then remove all punctuation from that string so you're only left with words & spaces. Then start counting words...

  4. #4
    Registered User
    Join Date
    Sep 2007
    Posts
    10
    OK, so I think I've got it reading in the words and storing them, except that it keeps storing spaces too for some reason. Also, I can't figure out how to check each word I form against the array to see whether I've already found it, and if so add to its count. It seems so simple, but I can't figure it out!

    Code:
    #include <iostream>
    #include <fstream>
    #include <cctype>
    #include <iomanip>
    #include <string.h>
    #include <string>
    #include <cstring>
    using namespace std;
    
    
    
    void readWords();
    
    
    int main(){
    	readWords();
    
    	return 0;
    }
    
    //************************************************************************************//
    void readWords(){
    
    	char ch = 'a';
    	char formWord[15];
    	string word;
    	string words[800];
    	int count[350];
    	int i = 0;
    	int k = 0;
    	int numCount = 0;
    	int numCount2 = 0;
    	int wordCount = 0;
    	ifstream infile;
    
    	infile.open("p01.txt");
    	if (!infile){
    		cout << "Cannot open input file. \n";
    	}
    
    	for (i = 0; i < 350; i++){
    		count[i] = 0;
    	}
    	i = 0;
    
    	while (infile.get(ch)){
    		if ((ch == ' ')||(ch == ',')||(ch == '?')||(ch == '!')||(ch == ':')||(ch == '"')||(ch == '.')){
    		    ch = 'a';
    			infile.ignore();
    			formWord[i] = '\0';
    			word = formWord;
    
    			if (wordCount == 0){
    				words[wordCount] = word;
    				count[numCount] = count[numCount] + 1;
    			}
    			else if (wordCount != 0){
    					if (k < wordCount){
    						for (k = 0;k < wordCount;k++){
    							if (word == words[k]){
    								count[k] = count[k] + 1;
    							}
    						}
    					}else{
    						words[wordCount] = word;
    						count[numCount] = count[numCount] + 1;
    					}
    
    			wordCount++;
    			numCount++;
    			i=0;
    		}
    		else{
                 formWord[i] = toupper(ch);
    	         i++;
    	      
    		}
    		}
    	}
    
    }

  5. #5
    C++ Witch laserlight
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote (from post #3): Another thing you could do is read the whole file into a string or vector<char>, then remove all punctuation from that string so you're only left with words & spaces.
    Instead of reading the whole file, you could read "word" by "word" using formatted input with the overloaded operator>> for istreams. For each word, strip the punctuation and then...

    Quote (from post #3): Then start counting words...
    ... use a std::map<std::string, int> to map the words to their frequencies.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  6. #6
    Registered User
    Join Date
    Sep 2007
    Posts
    10
    Alright, I think I've got it reading the words correctly, but now the counts are all off for some reason. I can't figure out why.

    Code:
    #include <iostream>
    #include <fstream>
    #include <cctype>
    #include <iomanip>
    #include <string.h>
    #include <string>
    #include <cstring>
    using namespace std;
    
    
    
    void readWords();
    bool searchArray(string words[], string word, int wordCount, int& searchCount);
    
    
    int main(){
    	readWords();
    
    	return 0;
    }
    
    //************************************************************************************//
    void readWords(){
    
    	char ch = 'a';
    	char formWord[15];
    	string word;
    	string words[800];
    	int count[350];
    	int i = 0;
    	int k = 0;
    	int numCount = 0;
    	int numCount2 = 0;
    	int wordCount = 0;
    	int searchCount = 0;
    	ifstream infile;
    
    	infile.open("p01.txt");
    	if (!infile){
    		cout << "Cannot open input file. \n";
    	}
    
    	for (i = 0; i < 350; i++){
    		count[i] = 0;
    	}
    	i = 0;
    
    	while (infile.get(ch)){
    		if((ch != ' ')&&(ch != ',')&&(ch != '?')&&(ch != '!')&&(ch != ':')&&(ch != '"')&&(ch != '.')){
    			formWord[i] = toupper(ch);
    	        i++;
    		}
    		else if(((ch == ' ')||(ch == ',')||(ch == '?')||(ch == '!')||(ch == ':')||(ch == '"')||(ch == '.'))&&(i > 0)){
    				formWord[i] = '\0';
    				word = formWord;
    				if (searchArray(words,word,wordCount, searchCount)){
    					count[searchCount] = (count[searchCount] + 1);
    				}
    				else{
    					words[wordCount] = formWord;
    					count[numCount] = count[numCount] + 1;
    					wordCount++;
    					numCount++;
    				}
    					i=0;
    				
    				}    
    	}
    	for (int j = 0; j < 200; j++){
    		cout << words[j] << "    " << count[j] << setw(10) << endl;
    	}
    }
    
    
    //*************************************************************//
    bool searchArray(string words[], string word, int wordCount, int& searchCount) {
    
    	bool condition = false;
    
    	for (searchCount = 0; searchCount < wordCount; searchCount++){
    		if (words[searchCount] == word)
    			condition = true;
    	}
    	return (condition);
    }
    //*************************************************************//

  7. #7
    Frequently Quite Prolix dwks
    Join Date
    Apr 2005
    Location
    Canada
    Posts
    8,057
    <cstring> provides the same functions as <string.h>; the difference is that <cstring> is guaranteed to declare them in the std namespace, whereas <string.h> declares them in the global namespace. Since you have using namespace std, the names in std are usable without the std:: prefix anyway.

    This means that you don't need both of these headers, just one or the other. I'd suggest cstring instead of string.h, because of the namespace thing (in case you ever decide to stop using namespace std) and because all of your other header files are C++-style.

    Code:
    	int count[350];
    	int i = 0;
    
    	// ...
    
    	for (i = 0; i < 350; i++){
    		count[i] = 0;
    	}
    This does exactly the same thing:
    Code:
    	int count[350] = {0};
    	int i = 0;
    	// ...
    But it's a lot more concise, don't you think?

    Code:
    		if((ch != ' ')&&(ch != ',')&&(ch != '?')&&(ch != '!')&&(ch != ':')&&(ch != '"')&&(ch != '.')){
    			formWord[i] = toupper(ch);
    	        i++;
    		}
    		else if(((ch == ' ')||(ch == ',')||(ch == '?')||(ch == '!')||(ch == ':')||(ch == '"')||(ch == '.'))&&(i > 0)){
    You might want to make an is_word() function or something. It would simplify that code greatly.

    Code:
    cout << words[j] << "    " << count[j] << setw(10) << endl;
    setw() only applies to the value printed right after it. If you wanted both words[j] and count[j] to be setw()'d to 10, for example, you'd have to use:
    Code:
    cout << setw(10) << words[j] << "    " << setw(10) << count[j] << endl;
    Code:
    count[searchCount] = (count[searchCount] + 1);
    Why not just use this?
    Code:
    count[searchCount] += 1;
    or even
    Code:
    count[searchCount] ++;
    Code:
    	ifstream infile;
    
    	infile.open("p01.txt");
    	if (!infile){
    		cout << "Cannot open input file. \n";
    	}
    You should probably quit if the input file could not be opened; otherwise the program carries on reading from a stream that was never opened, and every read will simply fail. I'd use
    Code:
    	ifstream infile("p01.txt");
    
    	if (!infile.is_open()){
    		cout << "Cannot open input file. \n";
    		return;   // readWords() is void; use return 1; when this check is in main()
    	}
    Code:
    char ch = 'a';
    There's no need to initialize ch to anything, and doing so makes it less readable because the reader might think that you're going to use the value in ch later on.
    dwk

    Seek and ye shall find. quaere et invenies.

    "Simplicity does not precede complexity, but follows it." -- Alan Perlis
    "Testing can only prove the presence of bugs, not their absence." -- Edsger Dijkstra
    "The only real mistake is the one from which we learn nothing." -- John Powell


    Other boards: DaniWeb, TPS
    Unofficial Wiki FAQ: cpwiki.sf.net

    My website: http://dwks.theprogrammingsite.com/
    Projects: codeform, xuni, atlantis, nort, etc.

  8. #8
    Registered User
    Join Date
    Sep 2006
    Posts
    835
    In C++ you're allowed to use an empty initializer list for an array, though in C (before C23) there has to be at least one element. So in C++ you can write
    Code:
    int count[350] = {};
    which has the same effect.

  9. #9
    Frequently Quite Prolix dwks
    Join Date
    Apr 2005
    Location
    Canada
    Posts
    8,057
    Hmm, you're right. I wasn't aware of that. It's nice to know.

    Also:
    Code:
    		else if(((ch == ' ')||(ch == ',')||(ch == '?')||(ch == '!')||(ch == ':')||(ch == '"')||(ch == '.'))&&(i > 0)){
    				formWord[i] = '\0';
    				word = formWord;
    				if (searchArray(words,word,wordCount, searchCount)){
    					count[searchCount] = (count[searchCount] + 1);
    				}
    There's no need to use the temporary variable word. You can pass the char array formWord directly to searchArray(); since that function takes a string parameter, the (null-terminated) array is converted to a std::string automatically.

    Code:
    		if((ch != ' ')&&(ch != ',')&&(ch != '?')&&(ch != '!')&&(ch != ':')&&(ch != '"')&&(ch != '.')){
    			formWord[i] = toupper(ch);
    	        i++;
    		}
    		else if(((ch == ' ')||(ch == ',')||(ch == '?')||(ch == '!')||(ch == ':')||(ch == '"')||(ch == '.'))&&(i > 0)){
    You don't need to test those conditions twice. This would work:
    Code:
    		if((ch != ' ')&&(ch != ',')&&(ch != '?')&&(ch != '!')&&(ch != ':')&&(ch != '"')&&(ch != '.')){
    			formWord[i] = toupper(ch);
    	        i++;
    		}
    		else if(i > 0){
    And an is_word() function like I suggested would make it even easier.
    Code:
    bool is_word(char ch) {
        return (ch != ' ')&&(ch != ',')&&(ch != '?')&&(ch != '!')&&(ch != ':')&&(ch != '"')&&(ch != '.');
    }
    
    		if(is_word(ch)){
    			formWord[i] = toupper(ch);
    	        i++;
    		}
    		else if(i > 0){
    You could even make use of strchr():
    Code:
    bool is_word(char ch) {
        return !strchr(" ,?!:\".", ch);
    }
    Code:
    	for (int j = 0; j < 200; j++){
    		cout << words[j] << "    " << count[j] << setw(10) << endl;
    	}
    What if there weren't 200 words? I'd use wordCount or numCount, both of which seem to do exactly the same thing. Perhaps you could combine them into one variable.

  10. #10
    and the hat of sweating
    Join Date
    Aug 2007
    Location
    Toronto, ON
    Posts
    3,545
    What's the difference between:
    Code:
    int count[350] = {};
    and
    Code:
    int count[350];

  11. #11
    Deathray Engineer MacGyver
    Join Date
    Mar 2007
    Posts
    3,210
    I believe the first one initializes each element to 0, while the last one contains indeterminate values (unless the array is global or static, in which case it is zero-initialized anyway).

    If I'm correct, the first one is equivalent to:

    Code:
    int count[350] = { 0 };
    This last form, as hinted at in previous posts, is valid C as well, and it's the usual way to set an array to 0 when you declare it.
    Last edited by MacGyver; 09-07-2007 at 10:52 PM.

  12. #12
    Frequently Quite Prolix dwks
    Join Date
    Apr 2005
    Location
    Canada
    Posts
    8,057
    That is correct.

    Every element of a global or static array is also automatically initialized to zero, but ordinary local arrays start with indeterminate values.
