Thread: Files that do not end with newline?

  1. #1
    edw211 (Registered User; joined Jun 2011; Wilkes-Barre, PA; 22 posts)


    Earlier I posted a program as a solution to a K&R exercise that reads a file/text stream and prints a histogram of how often each word length occurs. I think I've worked out nearly all the bugs, and this is what I now have:

    Code:
    /* Corresponding K&R section: 1.6 */
    
    /* Prints vertical histogram of lengths of words in input*/
    
    #include <stdio.h>
    
    /* NOTE: given enough horizontal space, can scale to any two-digit maximum by
       modifying this alone... gets messy with three digits, but I could 
       make the numbers on the x-axis display vertically down to make it infinitely
       scalable. */
    
    #define MIN_WORD_LENGTH 1
    #define MAX_WORD_LENGTH 20 	 
    
    int main(void)
    {
    	//Holds characters
    	int c = 0;			
    	int lastchar = 0;
    
    	//current number of contiguous non-whitespace chars						
    	int currentLength = 0;			
    
    	//holds number of occurrences of each length				
    	int wordLengthFrequencies[MAX_WORD_LENGTH] = {0};
    
    	//highest number of occurrences encountered
    	int maxFrequency = 0;
    
    	/* ---------------- Collect length data --------------- */
    	
    	while((c = getchar()) != EOF)
    	{
    		//Are we currently inside a word?
    		if(currentLength >= MIN_WORD_LENGTH) 
    		{
    			//Are we encountering whitespace?
    			if(c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\v') 
    			{
    				/* We've reached the end of a word. Update array */
    				wordLengthFrequencies[currentLength - 1]++; 				
    				currentLength = 0; //Reset
    			}
    			
    			/* No whitespace, so we're still inside a 
    			   word. Update currentLength if it hasn't 
    			   maxed out*/
    			else if(currentLength < MAX_WORD_LENGTH)
    				currentLength++;
    		}
    		
    		/* Have we just encountered the start of a word? */
    		else if(c != ' ' && c != '\t' && c != '\n' && c != '\r' && c != '\v') 
    			currentLength = 1;
    
    		lastchar = c;			
    	}
    
    	/* Handle case where file does not end on newline */
    	if(lastchar != '\n' && currentLength != 0)
    		wordLengthFrequencies[currentLength - 1]++;	
    	
    	/* --------------- Print Histogram ------------- */
    	
    	printf("\n\n");
    
    	int i = 0;
    
    	/* Determine maximum frequency */
    	for(i = MIN_WORD_LENGTH; i <= MAX_WORD_LENGTH; i++)
    	{
    		if(wordLengthFrequencies[i - 1] > maxFrequency)
    			maxFrequency = wordLengthFrequencies[i- 1];
    	}
    
    
    	/* Start printing the graph starting from the maximum frequency */
    	for(c = maxFrequency; c >= 1; c--)
    	{
    
    		/* Make sure graph will still be aligned even for 7-digit frequencies
    			i.e. if we are operating on a large file */
    		printf("%7d | ", c);	
    
    		for(i = 0; i < MAX_WORD_LENGTH; i++)
    		{
    			if(wordLengthFrequencies[i] >= c)
    				printf("*  ");	//fill in where appropriate
    			else 
    				printf("   ");
    		}
    		printf("\n");
    	}
    	
    	/* print the x-axis and legend */
    	putchar('\t');
    	for(i = MIN_WORD_LENGTH; i <= MAX_WORD_LENGTH; i++)
    		printf("---");
    	
    	printf("\n\t");
    	for(i = MIN_WORD_LENGTH; i <= MAX_WORD_LENGTH; i++)
    		printf("%3d", i);
    	putchar('+');
    	printf("\n\nx-axis: word length\ny-axis: # of occurrences\n\n");
    	return 0;
    }
    I noticed that my program failed to count the last word in the file if the file did not end on a newline and had no trailing whitespace after the last word. So I added this particular snippet to take care of that:

    Code:
    /* Handle case where file does not end on newline */
    if(lastchar != '\n' && currentLength != 0)
    	wordLengthFrequencies[currentLength - 1]++;
    By my logic, if the last character read before EOF was not '\n', the file did not end on a newline. But that alone does not mean I should increment the array element corresponding to what's left in currentLength. It may well be 0, because trailing whitespace after the last word resets currentLength when it is processed. Worse, incrementing at that point would touch index -1, which is out of bounds. Whatever whitespace is trailing, currentLength will be 0 in that case, so I make sure it isn't before incrementing. With that in place, I'm fairly certain my program is solid.
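
    A minimal sketch of the end-of-input check described above. It assumes the text arrives as a string rather than from getchar(), so the cases are easy to try; tally_word_lengths and is_blank are hypothetical names, not part of the posted program. It also suggests that the currentLength test may be enough on its own, since any trailing whitespace (newline included) has already reset the counter to 0 by the time the loop ends.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WORD_LENGTH 20

/* Hypothetical helper: the same whitespace set the posted program tests for. */
int is_blank(int c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\v';
}

/* Hypothetical helper: tallies word lengths from a string instead of stdin
   so the end-of-input case can be exercised directly. */
void tally_word_lengths(const char *text, int freq[MAX_WORD_LENGTH])
{
    int current_length = 0;
    size_t i;

    assert(text != NULL && freq != NULL);

    for (i = 0; text[i] != '\0'; i++) {
        if (is_blank((unsigned char)text[i])) {
            if (current_length != 0)
                freq[current_length - 1]++;   /* a word just ended */
            current_length = 0;
        } else if (current_length < MAX_WORD_LENGTH) {
            current_length++;                 /* stop growing at the last slot */
        }
    }

    /* End of input: a nonzero length means the input stopped mid-word.
       Trailing whitespace of any kind has already reset the counter to 0,
       so this test never touches freq[-1]. */
    if (current_length != 0)
        freq[current_length - 1]++;
}
```

    Calling it on "one two" and on "one two \t" should leave the same counts, which is the property the snippet in the post is after.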

    I might be being anal about this for an exercise out of a programming book, but I guess I see little point in trying to learn C with exercises if I don't make damned sure my solutions are airtight, given how easy it is to proverbially "shoot myself in the foot." I was wondering if anyone else could see any flaws in my logic above or if there's something I overlooked.

  2. #2
    quzah (ATH0; joined Oct 2001; 14,826 posts)
    Keep track of the last character read and if you hit EOF, you can look at your last character. If it is a letter, it's part of a word, so you can just see the current length. If it's not, then you don't need to increment any word count, because the file ended in a space or something not-a-word.
    Code:
    /* requires #include <ctype.h> */
    if( isalpha( lastchar ) )
        wordLengthFrequencies[currentLength - 1]++;   /* ended inside a word: count it */
    else
        ;   /* not a word: nothing to count */
    Alternatively, you can decide whether you want to check for numbers, letters, punctuation, or something else. If all you care about is whether it is white space, you can try swapping the if and else branches and using isspace or a similar function.
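
    A short sketch of the isspace variant mentioned above, assuming the character comes straight from getchar(). The name is_word_char is hypothetical; isspace itself is the standard <ctype.h> function.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: nonzero when c counts as part of a word,
   i.e. it is neither EOF nor whitespace. */
int is_word_char(int c)
{
    /* isspace is only defined for EOF or values that fit in unsigned char,
       which is exactly what getchar() returns. */
    assert(c == EOF || (c >= 0 && c <= UCHAR_MAX));
    return c != EOF && !isspace(c);
}
```

    This could stand in for the chain of comparisons against ' ', '\t', '\n', '\r' and '\v'. Note that in the default "C" locale isspace also treats form feed ('\f') as whitespace, which the posted comparisons do not.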


    Quzah.
    Hope is the first step on the road to disappointment.

  3. #3
    edw211 (Registered User)
    Those functions do look useful for what I'm doing. I think the point of the exercise, though, is to implement a solution using only the tools Kernighan and Ritchie have described so far. I feel like I'd be cheating if I used isspace() or isalpha(), but I see your point, and thanks for the info.

    I was mainly wondering if, given my solution as it stands, there is any conceivable way someone could throw input at it that would screw it up. Should have made that clearer.

  4. #4
    quzah (ATH0)
    Code:
    		if(currentLength >= MIN_WORD_LENGTH) 
    		{
    			//Are we encountering whitespace?
    			if(c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\v') 
    			{
    				/* We've reached the end of a word. Update array */
    				wordLengthFrequencies[currentLength - 1]++; 				
    				currentLength = 0; //Reset
    			}
    			
    			/* No whitespace, so we're still inside a 
    			   word. Update currentLength if it hasn't 
    			   maxed out*/
    			else if(currentLength < MAX_WORD_LENGTH)
    				currentLength++;
    		}
    What happens if you hit your word length? I don't see you covering that scenario. Are you just ignoring the extra letters? So if the only thing in the file is a word of MAX_WORD_LENGTH+1 characters, you will just count it as a single word of MAX_WORD_LENGTH?

    But I think you've pretty much got it covered.


    Last edited by quzah; 06-09-2011 at 01:03 AM.

  5. #5
    edw211 (Registered User)
    Originally posted by quzah:
    What happens if you hit your word length? I don't see you covering that scenario. Are you just ignoring the extra letters? So if the only thing in the file is a word of MAX_WORD_LENGTH+1 characters, you will just count it as a single word of MAX_WORD_LENGTH?
    Well, I realized I don't have an infinite amount of horizontal room in which to display every possible length, so I group all words of 20 or more characters into the same category. When I display the histogram, I label that category as 20+.
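
    The grouping described above can be sketched as a small mapping from word length to array slot. The helper name bucket_for_length is hypothetical; the posted program gets the same effect by refusing to increment currentLength past MAX_WORD_LENGTH.

```c
#include <assert.h>

#define MIN_WORD_LENGTH 1
#define MAX_WORD_LENGTH 20

/* Hypothetical helper: maps a word length to its histogram slot.
   Every length at or above MAX_WORD_LENGTH shares the final "20+" slot. */
int bucket_for_length(int length)
{
    assert(length >= MIN_WORD_LENGTH);   /* a word has at least one character */

    if (length > MAX_WORD_LENGTH)
        length = MAX_WORD_LENGTH;        /* clamp overlong words */

    return length - 1;                   /* slots are zero-based */
}
```

    With this mapping a 21-character word and a 1000-character word both land in slot 19, the one labelled 20+ on the x-axis.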

    Originally posted by quzah:
    But I think you've pretty much got it covered.
    Sick. On to the next exercise, then. Thanks for the help.

    -Evan Williams
