# Thread: Another K&R Exercise... Looking for a critique

1. ## Another K&R Exercise... Looking for a critique

Hello all, I'm on Exercise 1-13 of K&R and am really trying to focus on learning how to come up with a solution that scales well and is easily modifiable. This exercise asks me to implement a program that will read a file/text stream and print out a histogram with the number of occurrences of each length of word up to a certain maximum. I had my choice of either doing a horizontal or vertical histogram, but I decided to go with the prettier, but more difficult vertical option. I'm mainly seeking advice on what to do better by someone more experienced than me.

Anyway, here's my code:

Code:
```/* Corresponding K&R section: 1.6 */

/* Prints vertical histogram of lengths of words in input*/

#include <stdio.h>

/* NOTE: given enough horizontal space, can scale to any two-digit maximum by
modifying this alone... gets messy with three digits, but I could
make the numbers on the x-axis display vertically down to make it infinitely
scalable. */

#define MIN_WORD_LENGTH 1
#define MAX_WORD_LENGTH 20

int main(void)
{
//Holds characters
int c = 0;

//current number of contiguous non-whitespace chars
int currentLength = 0;

//holds number of occurrences of each length
int wordLengthFrequencies[MAX_WORD_LENGTH] = {0};

//highest number of occurrences encountered
int maxFrequency = 0;

/* ---------------- Collect length data --------------- */

while((c = getchar()) != EOF)
{
//Are we currently inside a word?
if(currentLength >= MIN_WORD_LENGTH)
{
//Are we encountering whitespace?
if(c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\v')
{
/* We've reached the end of a word. Update array */
wordLengthFrequencies[currentLength - 1]++;
currentLength = 0; //Reset
}

/* No whitespace, so we're still inside a
word. Update currentLength if it hasn't
maxed out*/
else if(currentLength < MAX_WORD_LENGTH)
currentLength++;
}

/* Have we just encountered the start of a word? */
else if(c != ' ' && c != '\t' && c != '\n' && c != '\r' && c != '\v')
currentLength = 1;
}

/* --------------- Print Histogram ------------- */

printf("\n\n");

int i = 0;

/* Determine maximum frequency */
for(i = MIN_WORD_LENGTH; i <= MAX_WORD_LENGTH; i++)
{
if(wordLengthFrequencies[i - 1] > maxFrequency)
maxFrequency = wordLengthFrequencies[i- 1];
}

/* Start printing the graph starting from the maximum frequency */
for(c = maxFrequency; c >= 1; c--)
{

/* Make sure graph will still be aligned even for 7-digit frequencies
i.e. if we are operating on a large file */
printf("%7d | ", c);

for(i = 0; i < MAX_WORD_LENGTH; i++)
{
if(wordLengthFrequencies[i] >= c)
printf("*  ");	//fill in where appropriate
else
printf("   ");
}
printf("\n");
}

/* print the x-axis */
putchar('\t');
for(i = MIN_WORD_LENGTH; i <= MAX_WORD_LENGTH; i++)
printf("---");

printf("\n\t");
for(i = MIN_WORD_LENGTH; i <= MAX_WORD_LENGTH; i++)
printf("%3d", i);
putchar('+');

printf("\n\n");
return 0;
}```
UPDATE: modified code to allow safe modification of MIN_WORD_LENGTH

UPDATE2: removed frequent and unnecessary comparisons involving maxFrequency... it is now calculated before printing the histogram

UPDATE3: added a more complete list of escape sequences to whitespace checks

• You can zero out an array at declaration, by putting
Code:
`={0}`
as an initializer, rather than doing it later in a loop.
• You should probably find the max height of your histogram at the end, rather than always checking/updating the max -- just loop through your bins and find the largest.
• I don't know the text of the exercise, so I have no idea if you are treating punctuation with the respect it deserves (i.e., ignoring it).
• I also don't know when you get to <ctype.h>, but using isalpha(), or maybe just isspace(), would be a wonderful thing. If you don't/aren't supposed to know ctype, then I think you're still missing a whitespace character (which is \v if I'm not mistaken).

2. Originally Posted by tabstop
• You can zero out an array at declaration, by putting
Code:
`={0}`
as an initializer, rather than doing it later in a loop.
• You should probably find the max height of your histogram at the end, rather than always checking/updating the max -- just loop through your bins and find the largest.
• I don't know the text of the exercise, so I have no idea if you are treating punctuation with the respect it deserves (i.e., ignoring it).
• I also don't know when you get to <ctype.h>, but using isalpha(), or maybe just isspace(), would be a wonderful thing. If you don't/aren't supposed to know ctype, then I think you're still missing a whitespace character (which is \v if I'm not mistaken).
Didn't know I could initialize the array that way, thanks. I haven't gotten up to <ctype.h> yet. I implemented this using only what I learned in the stuff I've read so far. You're right, I should just check my max height at the end. I just realized I'm doing that check unnecessarily for every word it processes. Could you also elaborate on what you mean by punctuation? You mean my placement of braces and spacing? I'm still trying to find a good style to stick with =/ Admittedly, I probably am a bit inconsistent as it stands now.

3. As your program stands right now, it counts commas, apostrophes, and periods as part of the word (so the word "now" earlier in this sentence would have four characters). If that's what you want, then great.

4. Ohhhh okay, I gotcha. Yeah, K&R, for this exercise anyway, defines a word loosely as a contiguous group of non-whitespace characters. Not sure what \v is though, will have to look that up or wait till CH 2 when they introduce all the escape sequences. Thanks again.

5. Originally Posted by edw211
Ohhhh okay, I gotcha. Yeah, K&R, for this exercise anyway, defines a word loosely as a contiguous group of non-whitespace characters. Not sure what \v is though, will have to look that up or wait till CH 2 when they introduce all the escape sequences. Thanks again.
'\v' is another ascii whitespace character. It's the vertical tab. It's not used much these days, but for completeness you should include it in your check.

Same with '\r', which is used.

6. Originally Posted by King Mir
Same with '\r', which is used.
If I'm not mistaken, '\r' returns the active position to the beginning of the current line. Just curious, what kind of file or situation would a '\r' crop up in as far as this program is concerned? I think what I'm really trying to ask is how a '\r' would get into a file or text stream to be processed. Hopefully that question makes sense.

7. Windows writes newlines to a file as two characters (^J ^M), but of course reads it back as just \n. But, if such a file were to somehow end up on another system (Mac, Linux) the ^J will be interpreted as \r and the ^M as \n.

8. Ohhh, so that's probably why I often see CR and LF in writing grouped together as CRLF, then? So CRLF means \r\n located in a file written in Windows that got onto a different OS, where just an \n would normally suffice?

Is that also why in a lot of example code I see, both ASCII 10 and 13 are checked for when dealing with new lines? To make it work across multiple platforms with different newline mechanisms?

Popular pages Recent additions