Thread: Another K&R Exercise... Looking for a critique

  1. #1
    Registered User edw211's Avatar
    Join Date
    Jun 2011
    Location
    Wilkes-Barre, PA
    Posts
    22

    Another K&R Exercise... Looking for a critique

    Hello all, I'm on Exercise 1-13 of K&R and am really trying to focus on learning how to come up with a solution that scales well and is easily modifiable. This exercise asks me to implement a program that will read a file/text stream and print out a histogram with the number of occurrences of each length of word up to a certain maximum. I had my choice of either doing a horizontal or vertical histogram, but I decided to go with the prettier, but more difficult vertical option. I'm mainly seeking advice on what to do better by someone more experienced than me.

    Anyway, here's my code:

    Code:
    /* Corresponding K&R section: 1.6 */
    
    /* Prints vertical histogram of lengths of words in input*/
    
    #include <stdio.h>
    
    /* NOTE: given enough horizontal space, can scale to any two-digit maximum by
       modifying this alone... gets messy with three digits, but I could 
       make the numbers on the x-axis display vertically down to make it infinitely
       scalable. */
    
    #define MIN_WORD_LENGTH 1
    #define MAX_WORD_LENGTH 20 	 
    
    int main(void)
    {
    	//Holds characters
    	int c = 0;			
    
    	//current number of contiguous non-whitespace chars						
    	int currentLength = 0;			
    
    	//holds number of occurrences of each length				
    	int wordLengthFrequencies[MAX_WORD_LENGTH] = {0};
    
    	//highest number of occurrences encountered
    	int maxFrequency = 0;
    
    	/* ---------------- Collect length data --------------- */
    	
    	while((c = getchar()) != EOF)
    	{
    		//Are we currently inside a word?
    		if(currentLength >= MIN_WORD_LENGTH) 
    		{
    			//Are we encountering whitespace?
    			if(c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\v') 
    			{
    				/* We've reached the end of a word. Update array */
    				wordLengthFrequencies[currentLength - 1]++; 				
    				currentLength = 0; //Reset
    			}
    			
    			/* No whitespace, so we're still inside a 
    			   word. Update currentLength if it hasn't 
    			   maxed out*/
    			else if(currentLength < MAX_WORD_LENGTH)
    				currentLength++;
    		}
    		
    		/* Have we just encountered the start of a word? */
    		else if(c != ' ' && c != '\t' && c != '\n' && c != '\r' && c != '\v') 
    			currentLength = 1;			
    	}
    
    	/* --------------- Print Histogram ------------- */
    	
    	printf("\n\n");
    
    	int i = 0;
    
    	/* Determine maximum frequency */
    	for(i = MIN_WORD_LENGTH; i <= MAX_WORD_LENGTH; i++)
    	{
    		if(wordLengthFrequencies[i - 1] > maxFrequency)
    			maxFrequency = wordLengthFrequencies[i- 1];
    	}
    
    
    	/* Start printing the graph starting from the maximum frequency */
    	for(c = maxFrequency; c >= 1; c--)
    	{
    
    		/* Make sure graph will still be aligned even for 7-digit frequencies
    			i.e. if we are operating on a large file */
    		printf("%7d | ", c);	
    
    		for(i = 0; i < MAX_WORD_LENGTH; i++)
    		{
    			if(wordLengthFrequencies[i] >= c)
    				printf("*  ");	//fill in where appropriate
    			else 
    				printf("   ");
    		}
    		printf("\n");
    	}
    	
    	/* print the x-axis */
    	putchar('\t');
    	for(i = MIN_WORD_LENGTH; i <= MAX_WORD_LENGTH; i++)
    		printf("---");
    	
    	printf("\n\t");
    	for(i = MIN_WORD_LENGTH; i <= MAX_WORD_LENGTH; i++)
    		printf("%3d", i);
    	putchar('+');
    
    	printf("\n\n");
    	return 0;
    }
    UPDATE: modified code to allow safe modification of MIN_WORD_LENGTH

    UPDATE2: removed frequent and unnecessary comparisons involving maxFrequency... it is now calculated before printing the histogram

    UPDATE3: added a more complete list of escape sequences to whitespace checks
    Last edited by edw211; 06-08-2011 at 07:15 PM.

  2. #2
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    • You can zero out an array at declaration, by putting
      Code:
      ={0}
      as an initializer, rather than doing it later in a loop.
    • You should probably find the max height of your histogram at the end, rather than always checking/updating the max -- just loop through your bins and find the largest.
    • I don't know the text of the exercise, so I have no idea if you are treating punctuation with the respect it deserves (i.e., ignoring it).
    • I also don't know when you get to <ctype.h>, but using isalpha(), or maybe just isspace(), would be a wonderful thing. If you don't/aren't supposed to know ctype, then I think you're still missing a whitespace character (which is \v if I'm not mistaken).

  3. #3
    Registered User edw211's Avatar
    Join Date
    Jun 2011
    Location
    Wilkes-Barre, PA
    Posts
    22
    Quote Originally Posted by tabstop View Post
    • You can zero out an array at declaration, by putting
      Code:
      ={0}
      as an initializer, rather than doing it later in a loop.
    • You should probably find the max height of your histogram at the end, rather than always checking/updating the max -- just loop through your bins and find the largest.
    • I don't know the text of the exercise, so I have no idea if you are treating punctuation with the respect it deserves (i.e., ignoring it).
    • I also don't know when you get to <ctype.h>, but using isalpha(), or maybe just isspace(), would be a wonderful thing. If you don't/aren't supposed to know ctype, then I think you're still missing a whitespace character (which is \v if I'm not mistaken).
    Didn't know I could initialize the array that way, thanks. I haven't gotten up to <ctype.h> yet. I implemented this using only what I learned in the stuff I've read so far. You're right, I should just check my max height at the end. I just realized I'm doing that check unnecessarily for every word it processes. Could you also elaborate on what you mean by punctuation? You mean my placement of braces and spacing? I'm still trying to find a good style to stick with =/ Admittedly, I probably am a bit inconsistent as it stands now.

  4. #4
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    As your program stands right now, it counts commas, apostrophes, and periods as part of the word (so the word "now" earlier in this sentence would have four characters). If that's what you want, then great.

  5. #5
    Registered User edw211's Avatar
    Join Date
    Jun 2011
    Location
    Wilkes-Barre, PA
    Posts
    22
    Ohhhh okay, I gotcha. Yeah, K&R, for this exercise anyway, defines a word loosely as a contiguous group of non-whitespace characters. Not sure what \v is though, will have to look that up or wait till CH 2 when they introduce all the escape sequences. Thanks again.

  6. #6
    Registered User
    Join Date
    Apr 2006
    Posts
    2,149
    Quote Originally Posted by edw211 View Post
    Ohhhh okay, I gotcha. Yeah, K&R, for this exercise anyway, defines a word loosely as a contiguous group of non-whitespace characters. Not sure what \v is though, will have to look that up or wait till CH 2 when they introduce all the escape sequences. Thanks again.
    '\v' is another ascii whitespace character. It's the vertical tab. It's not used much these days, but for completeness you should include it in your check.

    Same with '\r', which is used.
    It is too clear and so it is hard to see.
    A dunce once searched for fire with a lighted lantern.
    Had he known what fire was,
    He could have cooked his rice much sooner.

  7. #7
    Registered User edw211's Avatar
    Join Date
    Jun 2011
    Location
    Wilkes-Barre, PA
    Posts
    22
    Quote Originally Posted by King Mir View Post
    Same with '\r', which is used.
    If I'm not mistaken, '\r' returns the active position to the beginning of the current line. Just curious, what kind of file or situation would a '\r' crop up in as far as this program is concerned? I think what I'm really trying to ask is how a '\r' would get into a file or text stream to be processed. Hopefully that question makes sense.

  8. #8
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    Windows writes newlines to a file as two characters (^J ^M), but of course reads it back as just \n. But, if such a file were to somehow end up on another system (Mac, Linux) the ^J will be interpreted as \r and the ^M as \n.

  9. #9
    Registered User edw211's Avatar
    Join Date
    Jun 2011
    Location
    Wilkes-Barre, PA
    Posts
    22
    Ohhh, so that's probably why I often see CR and LF in writing grouped together as CRLF, then? So CRLF means \r\n located in a file written in Windows that got onto a different OS, where just an \n would normally suffice?

    Is that also why in a lot of example code I see, both ASCII 10 and 13 are checked for when dealing with new lines? To make it work across multiple platforms with different newline mechanisms?

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Critique my StringTokenizer please...
    By maxhavoc in forum C++ Programming
    Replies: 10
    Last Post: 11-24-2004, 09:08 PM
  2. Critique my program
    By Jason Spaceman in forum C Programming
    Replies: 6
    Last Post: 10-26-2004, 05:38 PM
  3. Critique my code?
    By bigbadbowlindud in forum C++ Programming
    Replies: 20
    Last Post: 06-24-2004, 12:54 PM
  4. C 101 critique, please?
    By adobephile in forum C Programming
    Replies: 13
    Last Post: 01-01-2003, 07:05 PM
  5. critique me please
    By ober in forum A Brief History of Cprogramming.com
    Replies: 13
    Last Post: 10-02-2002, 01:59 PM