Another K&R Exercise... Looking for a critique

**edw211** · 06-08-2011

Hello all, I'm on Exercise 1-13 of K&R and am really trying to focus on learning how to come up with a solution that scales well and is easily modifiable. This exercise asks me to implement a program that will read a file/text stream and print out a histogram with the number of occurrences of each length of word up to a certain maximum. I had my choice of either doing a horizontal or vertical histogram, but I decided to go with the prettier, but more difficult vertical option. I'm mainly seeking advice on what to do better by someone more experienced than me.

Anyway, here's my code:

Code:

/* Corresponding K&R section: 1.6 */

/* Prints vertical histogram of lengths of words in input*/

#include <stdio.h>

/* NOTE: given enough horizontal space, can scale to any two-digit maximum by
   modifying this alone... gets messy with three digits, but I could 
   make the numbers on the x-axis display vertically down to make it infinitely
   scalable. */

#define MIN_WORD_LENGTH 1
#define MAX_WORD_LENGTH 20 	 

int main(void)
{
	//Holds characters
	int c = 0;			

	//current number of contiguous non-whitespace chars						
	int currentLength = 0;			

	//holds number of occurrences of each length				
	int wordLengthFrequencies[MAX_WORD_LENGTH] = {0};

	//highest number of occurrences encountered
	int maxFrequency = 0;

	/* ---------------- Collect length data --------------- */
	
	while((c = getchar()) != EOF)
	{
		//Are we currently inside a word?
		if(currentLength >= MIN_WORD_LENGTH) 
		{
			//Are we encountering whitespace?
			if(c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\v') 
			{
				/* We've reached the end of a word. Update array */
				wordLengthFrequencies[currentLength - 1]++; 				
				currentLength = 0; //Reset
			}
			
			/* No whitespace, so we're still inside a 
			   word. Update currentLength if it hasn't 
			   maxed out*/
			else if(currentLength < MAX_WORD_LENGTH)
				currentLength++;
		}
		
		/* Have we just encountered the start of a word? */
		else if(c != ' ' && c != '\t' && c != '\n' && c != '\r' && c != '\v') 
			currentLength = 1;			
	}

	/* --------------- Print Histogram ------------- */
	
	printf("\n\n");

	int i = 0;

	/* Determine maximum frequency */
	for(i = MIN_WORD_LENGTH; i <= MAX_WORD_LENGTH; i++)
	{
		if(wordLengthFrequencies[i - 1] > maxFrequency)
			maxFrequency = wordLengthFrequencies[i- 1];
	}


	/* Start printing the graph starting from the maximum frequency */
	for(c = maxFrequency; c >= 1; c--)
	{

		/* Make sure graph will still be aligned even for 7-digit frequencies
			i.e. if we are operating on a large file */
		printf("%7d | ", c);	

		for(i = 0; i < MAX_WORD_LENGTH; i++)
		{
			if(wordLengthFrequencies[i] >= c)
				printf("*  ");	//fill in where appropriate
			else 
				printf("   ");
		}
		printf("\n");
	}
	
	/* print the x-axis */
	putchar('\t');
	for(i = MIN_WORD_LENGTH; i <= MAX_WORD_LENGTH; i++)
		printf("---");
	
	printf("\n\t");
	for(i = MIN_WORD_LENGTH; i <= MAX_WORD_LENGTH; i++)
		printf("%3d", i);
	putchar('+');

	printf("\n\n");
	return 0;
}

UPDATE: modified code to allow safe modification of MIN_WORD_LENGTH

UPDATE2: removed frequent and unnecessary comparisons involving maxFrequency... it is now calculated before printing the histogram

UPDATE3: added a more complete list of escape sequences to whitespace checks

**tabstop** · 06-08-2011

You can zero out an array at declaration, by putting
Code:
```
={0}
```
as an initializer, rather than doing it later in a loop.
You should probably find the max height of your histogram at the end, rather than always checking/updating the max -- just loop through your bins and find the largest.
I don't know the text of the exercise, so I have no idea if you are treating punctuation with the respect it deserves (i.e., ignoring it).
I also don't know when you get to <ctype.h>, but using isalpha(), or maybe just isspace(), would be a wonderful thing. If you don't/aren't supposed to know ctype, then I think you're still missing a whitespace character (which is \v if I'm not mistaken).

**edw211** · 06-08-2011

Originally Posted by tabstop

You can zero out an array at declaration, by putting
Code:
```
={0}
```
as an initializer, rather than doing it later in a loop.
You should probably find the max height of your histogram at the end, rather than always checking/updating the max -- just loop through your bins and find the largest.
I don't know the text of the exercise, so I have no idea if you are treating punctuation with the respect it deserves (i.e., ignoring it).
I also don't know when you get to <ctype.h>, but using isalpha(), or maybe just isspace(), would be a wonderful thing. If you don't/aren't supposed to know ctype, then I think you're still missing a whitespace character (which is \v if I'm not mistaken).

Didn't know I could initialize the array that way, thanks. I haven't gotten up to <ctype.h> yet. I implemented this using only what I learned in the stuff I've read so far. You're right, I should just check my max height at the end. I just realized I'm doing that check unnecessarily for every word it processes. Could you also elaborate on what you mean by punctuation? You mean my placement of braces and spacing? I'm still trying to find a good style to stick with =/ Admittedly, I probably am a bit inconsistent as it stands now.

**tabstop** · 06-08-2011

As your program stands right now, it counts commas, apostrophes, and periods as part of the word (so the word "now" earlier in this sentence would have four characters). If that's what you want, then great.

**edw211** · 06-08-2011

Ohhhh okay, I gotcha. Yeah, K&R, for this exercise anyway, defines a word loosely as a contiguous group of non-whitespace characters. Not sure what \v is though, will have to look that up or wait till CH 2 when they introduce all the escape sequences. Thanks again.

**King Mir** · 06-08-2011

Originally Posted by edw211

Ohhhh okay, I gotcha. Yeah, K&R, for this exercise anyway, defines a word loosely as a contiguous group of non-whitespace characters. Not sure what \v is though, will have to look that up or wait till CH 2 when they introduce all the escape sequences. Thanks again.

'\v' is another ascii whitespace character. It's the vertical tab. It's not used much these days, but for completeness you should include it in your check.

Same with '\r', which is used.

**edw211** · 06-08-2011

Originally Posted by King Mir

Same with '\r', which is used.

If I'm not mistaken, '\r' returns the active position to the beginning of the current line. Just curious, what kind of file or situation would a '\r' crop up in as far as this program is concerned? I think what I'm really trying to ask is how a '\r' would get into a file or text stream to be processed. Hopefully that question makes sense.

**tabstop** · 06-08-2011

Windows writes newlines to a file as two characters (^J ^M), but of course reads it back as just \n. But, if such a file were to somehow end up on another system (Mac, Linux) the ^J will be interpreted as \r and the ^M as \n.

**edw211** · 06-08-2011

Ohhh, so that's probably why I often see CR and LF in writing grouped together as CRLF, then? So CRLF means \r\n located in a file written in Windows that got onto a different OS, where just an \n would normally suffice?

Is that also why in a lot of example code I see, both ASCII 10 and 13 are checked for when dealing with new lines? To make it work across multiple platforms with different newline mechanisms?

Thread: Another K&R Exercise... Looking for a critique

Thread Tools

Search Thread

Display

Another K&R Exercise... Looking for a critique

Similar Threads

Critique my StringTokenizer please...

Critique my program

Critique my code?

C 101 critique, please?

critique me please