Thread: Need Help on text analyzer program

  1. #1
    Registered User
    Join Date
    Mar 2005
    Posts
    5

    Need Help on text analyzer program

    The problem reads:
    "Write a text analyzer program that will read any txt file. The program is to print a menu that gives the user the options of counting lines, words, characters, sentences (one or more words ending in a period), or all of the above. Provide a separate function for each option. At the end of the analysis, write an appropriate report."
    The question I have is: how do I make functions that count lines, words, characters, and sentences?

    Any help would be great.

  2. #2
    Registered User
    Join Date
    Mar 2002
    Posts
    1,595
    As a first approximation you could read through the file one char at a time. Each char that isn't a whitespace char should increment a char count variable. Each char that is a space will increment a word count variable. Each char that is a newline char will increment a line count variable. Each char that is a period will increase a sentence count variable.

    There are more sophisticated approaches you can add once you get all of those working. To start out, give it your best shot by writing things down with pen and paper (or a text processor) in English (or your native language). Then try to turn that into pseudocode and finally into real code. Write the code a few lines at a time, certainly no more than one function, before compiling, linking, and fixing any errors. If you get stuck at any point, post your questions, including the pertinent code (using code tags!) and any error messages (indicate the line(s) the error message points to).

  3. #3
    Registered User
    Join Date
    Mar 2005
    Posts
    5
    Thanks, that helped a lot for now.
    I'll try and ask more questions later.

  4. #4
    former member Brain Cell's Avatar
    Join Date
    Feb 2004
    Posts
    472
    Quote Originally Posted by elad
    Each char that is a space will increment a word count variable. Each char that is a newline char will increment a line count variable.
    Smart, but it would also count single letters as words. For example, in "This is a text file." it would count 'a' as a word.
    Quote Originally Posted by elad
    Each char that is a period will increase a sentence count variable.
    Text files might contain contiguous periods (like '...'), which would give false results.
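
    One possible way around the contiguous-period problem, sketched under the assumption that a sentence ends only where a period directly follows a letter or digit. The function name countSentences is hypothetical.

```cpp
#include <cctype>
#include <string>

// Counts a period as a sentence end only when the char right before it
// is a letter or digit, so "..." adds at most one sentence.
int countSentences(const std::string& text) {
    int sentences = 0;
    bool prevWasWordChar = false;
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        unsigned char ch = static_cast<unsigned char>(text[i]);
        if (ch == '.' && prevWasWordChar)
            ++sentences;
        prevWasWordChar = std::isalnum(ch) != 0;
    }
    return sentences;
}
```

    This is still only an approximation: a decimal number such as 3.14 is counted as a sentence end, and a period that follows a closing quote is missed.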

    As elad said, work on those first, then try to figure out more accurate approaches, and post here if you run into any problems.

  5. #5
    Registered User
    Join Date
    Mar 2005
    Posts
    5
    Thanks for all the help. I have another question:
    How do I make the program print out the ten most frequently occurring alphabetic characters and the number of times each occurred, in descending order by frequency of occurrence?

  6. #6
    Registered User
    Join Date
    Mar 2005
    Posts
    5
    Also, where should I read the file from? In each function, or in main?

  7. #7
    Registered User
    Join Date
    Mar 2005
    Posts
    5
    Hmm, this is what I've got so far.
    Can anyone help me break this into three functions for words, lines, and chars?

    Code:
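    #include <iostream>
    #include <fstream>
    #include <cstdlib>
    #include <iomanip>
    using namespace std;
    // True when the variable named cur holds a space, newline, or tab.
    #define WHT_SPC (cur == ' ' || cur == '\n' || cur == '\t')
    
    int main() 
    {
    	char curCh;
    	char preCh = '\n';	// so an empty file reports 0 lines
    	char cur;
    	char word = 'O';	// 'O' = outside a word, 'I' = inside a word
    	int countWd = 0;
    	int countLn = 0;
    	int countCh = 0;
    
    	ifstream fileIn;
    	fileIn.open("dream.txt");
    	if (!fileIn)
    	{
    		cerr << "Error opening dream.txt" << endl;
    		exit (100);
    	}
    
    	// First pass: count characters (everything except newlines) and lines.
    	while (fileIn.get (curCh))
    	{
    		if (curCh != '\n')
    			countCh++;
    		else
    			countLn++;
    		preCh = curCh;
    	}
    
    	// A last line without a trailing newline still counts as a line.
    	if (preCh != '\n')
    		countLn++;
    
    	// The first pass left the stream at end-of-file with its error flags
    	// set, so clear them and rewind; otherwise the loop below never runs.
    	fileIn.clear();
    	fileIn.seekg(0);
    
    	// Second pass: count each transition from whitespace into a word.
    	while (fileIn.get (cur))
    	{
    		if (WHT_SPC)
    			word = 'O';
    		else
    			if (word == 'O')
    			{
    				countWd++;
    				word = 'I';
    			}
    	}
    
    	cout << endl;
    	cout << "Number of characters: " << setw(4) << countCh << endl;
    	cout << "Number of lines	: " << setw(4) << countLn << endl;
    	cout << "The number of words is :" << setw(4) << countWd << endl;
    
    	fileIn.close();
    	system("pause");	// Windows-only way to keep the console open
    	return 0;
    }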

  8. #8
    Registered User
    Join Date
    Mar 2002
    Posts
    1,595
    If you are allowed to use them, a number of standard functions (declared in <cctype>) may be helpful. For example, isspace(ch) tests whether ch is a whitespace char, isalpha(ch) whether it is a letter, isdigit(ch) whether it is a digit, ispunct(ch) whether it is a punctuation mark, etc.


    I'd read each char in from file in a single loop, analyzing it at the time and incrementing the appropriate counter.
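
    On the questions of where to read the file and how to split the code into functions, here is one possible layout, offered as a sketch rather than the required design. It reads the file once into a string and hands that string to each counting function, so it trades the single loop for one pass per function, since the assignment asks for a separate function per option. All four function names are hypothetical, and the counting rules mirror the code posted above (characters exclude newlines, words are runs of non-whitespace).

```cpp
#include <cctype>
#include <fstream>
#include <sstream>
#include <string>

// Reads the whole file into text; returns false if it cannot be opened.
bool readFile(const std::string& path, std::string& text) {
    std::ifstream in(path.c_str());
    if (!in)
        return false;
    std::ostringstream buffer;
    buffer << in.rdbuf();
    text = buffer.str();
    return true;
}

// Counts newlines, plus one for a final line that lacks a trailing newline.
int countLines(const std::string& text) {
    int lines = 0;
    for (std::string::size_type i = 0; i < text.size(); ++i)
        if (text[i] == '\n')
            ++lines;
    if (!text.empty() && text[text.size() - 1] != '\n')
        ++lines;
    return lines;
}

// Counts every char except newlines.
int countChars(const std::string& text) {
    int chars = 0;
    for (std::string::size_type i = 0; i < text.size(); ++i)
        if (text[i] != '\n')
            ++chars;
    return chars;
}

// Counts transitions from whitespace into non-whitespace.
int countWords(const std::string& text) {
    int words = 0;
    bool inWord = false;
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        if (std::isspace(static_cast<unsigned char>(text[i]))) {
            inWord = false;
        } else if (!inWord) {
            inWord = true;
            ++words;
        }
    }
    return words;
}
```

    With this layout, main would call readFile once, show the menu, and pass the same string to whichever counting function the user picks.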

    You have to decide whether capital A is the same as lower case a or not. Use toupper(ch) or tolower(ch) to convert everything to one case if you don't want to track upper and lower case letters separately. You also have to decide whether to count periods, commas, colons, semicolons, etc., as non-alphabetic chars or not.

    To track the frequency of each individual char you need a container that maps the char to an int. Since each char is an int to the computer, you can use that relationship to your advantage. A common container for this purpose is an array of ints, initialized to zero. The index for each char is either its character set value (usually ASCII or Unicode) if you track every char, alphabetic or not, or an offset from the first letter if you track letters only: after converting to all caps, ch - 'A' gives the index, and after converting to all lower case, ch - 'a' does. Once you have the index, increment the value stored there. When the whole file has been read, sort the counts (keeping each one paired with its char) and pick however many values you want: the top 10, the bottom 3, etc.
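
    The array approach might look roughly like this. It is a sketch that assumes the letters a to z have contiguous codes (true for ASCII) and that upper and lower case are counted together. The function name topLetters is hypothetical.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<char, int> LetterCount;

// Orders pairs by count, highest first.
bool higherCount(const LetterCount& a, const LetterCount& b) {
    return a.second > b.second;
}

// Returns up to n (letter, count) pairs in descending order of count.
std::vector<LetterCount> topLetters(const std::string& text, std::size_t n) {
    int counts[26] = {0};
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        unsigned char ch = static_cast<unsigned char>(text[i]);
        if (std::isalpha(ch)) {
            int idx = std::tolower(ch) - 'a';  // assumes contiguous letters
            if (idx >= 0 && idx < 26)          // skip non-ASCII letters
                ++counts[idx];
        }
    }
    // Pair each count with its letter before sorting, so the letter is not lost.
    std::vector<LetterCount> result;
    for (int i = 0; i < 26; ++i)
        if (counts[i] > 0)
            result.push_back(LetterCount(static_cast<char>('a' + i), counts[i]));
    // stable_sort keeps alphabetical order among letters with equal counts.
    std::stable_sort(result.begin(), result.end(), higherCount);
    if (result.size() > n)
        result.resize(n);
    return result;
}
```

    Calling topLetters(text, 10) and printing each pair would give the ten most frequent letters with their counts.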

    To me, the a in "Elsie is a cow" is a word, just as the letter I is in the sentence "I am hungry." However, the colon in the following sentence would not be considered a word, IMHO: "The vowels are : a, e, i, o, u and sometimes y." Each of the individual vowels would be. Since there is a space on both sides of the colon, it will throw off the word count a little, and would need to be accounted for eventually. However, in a file for beginners to evaluate, that syntax isn't likely to occur. Double spaces between sentences and spaces used in place of tabs may be problematic, though again, for beginners, the simple space count will probably be adequate. More sophisticated analyses can usually be added later; for now, the simple approach is clearly more than adequate given the questions being raised by the OP.

  9. #9
    Registered User hk_mp5kpdw's Avatar
    Join Date
    Jan 2002
    Location
    Northern Virginia/Washington DC Metropolitan Area
    Posts
    3,817
    Quote Originally Posted by drkmarine
    tx for all the help...i have another question...
    How do i make the program print out the ten most frequently occurring alphabetic characters and the number of times each occurred, in descending order by frequency of occurrence.
    Well, if I were doing this I would use a map<char,int> container to store the characters and the occurrence of said character. And then a multimap<int,char,greater<int> > container to handle the sorting by occurrence. All of this could be done in around a dozen lines of code. Your level of experience however may dictate other solutions.
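
    A rough sketch of that idea, assuming upper and lower case letters are counted together. The typedef ByFrequency and the function lettersByFrequency are hypothetical names.

```cpp
#include <cctype>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Keys are counts, ordered highest first by greater<int>.
typedef std::multimap<int, char, std::greater<int> > ByFrequency;

ByFrequency lettersByFrequency(const std::string& text) {
    // Step 1: tally each letter in a map<char,int>.
    std::map<char, int> counts;
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        unsigned char ch = static_cast<unsigned char>(text[i]);
        if (std::isalpha(ch))
            ++counts[static_cast<char>(std::tolower(ch))];
    }
    // Step 2: flip each (letter, count) into (count, letter) so the
    // multimap orders the entries by count.
    ByFrequency byFreq;
    for (std::map<char, int>::const_iterator it = counts.begin();
         it != counts.end(); ++it)
        byFreq.insert(std::make_pair(it->second, it->first));
    return byFreq;
}
```

    Iterating over the first ten entries of the result and printing it->second followed by it->first would give the report in descending order.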
