Hello all, I'm on Exercise 1-13 of K&R and am really trying to focus on learning how to come up with solutions that scale well and are easy to modify. This exercise asks me to implement a program that reads a file/text stream and prints a histogram of the number of occurrences of each word length, up to a certain maximum. I had my choice of a horizontal or a vertical histogram, and I decided to go with the prettier but more difficult vertical option. I'm mainly seeking advice on what I could do better from someone more experienced than me.
Anyway, here's my code:
Code:
/* Corresponding K&R section: 1.6 */
/* Prints a vertical histogram of the lengths of words in the input */
#include <stdio.h>

/* NOTE: given enough horizontal space, this can scale to any two-digit
   maximum by modifying MAX_WORD_LENGTH alone... it gets messy with three
   digits, but I could make the numbers on the x-axis display vertically
   down to make it infinitely scalable. */
#define MIN_WORD_LENGTH 1
#define MAX_WORD_LENGTH 20

int main(void)
{
    //holds the character most recently read
    int c = 0;
    //current number of contiguous non-whitespace chars
    int currentLength = 0;
    //holds number of occurrences of each length: index k counts words of
    //length k + 1, and the last slot also counts any longer words
    int wordLengthFrequencies[MAX_WORD_LENGTH] = {0};
    //highest number of occurrences encountered
    int maxFrequency = 0;
    /* ---------------- Collect length data --------------- */
    while((c = getchar()) != EOF)
    {
        //Are we currently inside a word? (any positive length means yes,
        //independent of MIN_WORD_LENGTH)
        if(currentLength > 0)
        {
            //Are we encountering whitespace?
            if(c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\v' || c == '\f')
            {
                /* We've reached the end of a word. Update array */
                wordLengthFrequencies[currentLength - 1]++;
                currentLength = 0; //Reset
            }
            /* No whitespace, so we're still inside a
               word. Update currentLength if it hasn't
               maxed out */
            else if(currentLength < MAX_WORD_LENGTH)
                currentLength++;
        }
        /* Have we just encountered the start of a word? */
        else if(c != ' ' && c != '\t' && c != '\n' && c != '\r' && c != '\v' && c != '\f')
            currentLength = 1;
    }
    /* The input may end mid-word (no trailing whitespace), so count
       a word that is still in progress at EOF */
    if(currentLength > 0)
        wordLengthFrequencies[currentLength - 1]++;
    /* --------------- Print Histogram ------------- */
    printf("\n\n");
    int i = 0;
    int row = 0; //frequency level of the row being printed
    /* Determine maximum frequency */
    for(i = MIN_WORD_LENGTH; i <= MAX_WORD_LENGTH; i++)
    {
        if(wordLengthFrequencies[i - 1] > maxFrequency)
            maxFrequency = wordLengthFrequencies[i - 1];
    }
    /* Print the graph row by row, starting from the maximum frequency */
    for(row = maxFrequency; row >= 1; row--)
    {
        /* Make sure graph will still be aligned even for 7-digit frequencies
           i.e. if we are operating on a large file */
        printf("%7d | ", row);
        for(i = MIN_WORD_LENGTH; i <= MAX_WORD_LENGTH; i++)
        {
            /* Every cell is 3 characters wide so the columns line up
               with the "%3d" labels on the x-axis */
            if(wordLengthFrequencies[i - 1] >= row)
                printf("*  "); //fill in where appropriate
            else
                printf("   ");
        }
        printf("\n");
    }
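    /* print the x-axis; 8 explicit spaces of padding instead of a tab,
       so the alignment does not depend on the terminal's tab width */
    printf("%8s", "");
    for(i = MIN_WORD_LENGTH; i <= MAX_WORD_LENGTH; i++)
        printf("---");
    printf("\n%8s", "");
    for(i = MIN_WORD_LENGTH; i <= MAX_WORD_LENGTH; i++)
        printf("%3d", i);
    putchar('+'); //the last column also counts longer words
    printf("\n\n");
    return 0;
}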
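The NOTE at the top of the code mentions printing the x-axis numbers vertically so that three-digit maximums stay readable. A minimal sketch of that idea follows. The names `digit_count`, `label_char`, and `format_vertical_labels` are hypothetical helpers, not part of the program above, and for simplicity each column here is one character wide rather than the three used by the histogram.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of decimal digits in a positive integer. */
static int digit_count(int n)
{
    int digits = 1;
    while (n >= 10) {
        n /= 10;
        digits++;
    }
    return digits;
}

/* Hypothetical helper: character shown for label n on a given row,
   where row 0 holds the most significant digit. Labels with fewer
   digits than rows are padded with spaces at the bottom. */
static char label_char(int n, int row)
{
    int digits = digit_count(n);
    int shift;

    if (row >= digits)
        return ' ';
    for (shift = digits - 1 - row; shift > 0; shift--)
        n /= 10;
    return (char)('0' + n % 10);
}

/* Hypothetical helper: writes the labels first..last into out, one
   digit row per line, so each label reads top to bottom in its own
   column. Output is truncated to fit out_size and NUL-terminated.
   Returns the number of characters written, excluding the NUL. */
static size_t format_vertical_labels(char *out, size_t out_size,
                                     int first, int last)
{
    int rows;
    int row, n;
    size_t pos = 0;

    assert(out != NULL && first >= 1 && first <= last);
    rows = digit_count(last);

    for (row = 0; row < rows; row++) {
        for (n = first; n <= last; n++) {
            if (pos + 1 < out_size)
                out[pos++] = label_char(n, row);
        }
        if (pos + 1 < out_size)
            out[pos++] = '\n';
    }
    if (out_size > 0)
        out[pos] = '\0';
    return pos;
}
```

For example, calling `format_vertical_labels(buf, sizeof buf, 8, 12)` fills `buf` with two lines, `89111` and then two spaces followed by `012`, so 10, 11, and 12 each read downward in their own column.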
UPDATE: modified code to allow safe modification of MIN_WORD_LENGTH
UPDATE2: removed frequent and unnecessary comparisons involving maxFrequency... it is now calculated before printing the histogram
UPDATE3: added a more complete list of escape sequences to whitespace checks
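The hand-written whitespace comparisons could probably be replaced with `isspace` from `<ctype.h>`, which in the default "C" locale is true for space, `\t`, `\n`, `\v`, `\f`, and `\r`. The sketch below shows one way to do that. `count_word_lengths` is a hypothetical helper, not part of the program above, and it works on a string instead of `getchar` input so that it is easy to test. It also counts a word that runs right up to the end of the input.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: tallies word lengths in a NUL-terminated string.
   freq must have max_len elements, zeroed by the caller. freq[k] counts
   words of length k + 1, and words longer than max_len are counted in
   the last bucket. */
static void count_word_lengths(const char *text, int *freq, int max_len)
{
    int current = 0;   /* length of the word in progress, 0 if none */
    size_t i;

    assert(text != NULL && freq != NULL && max_len > 0);

    for (i = 0; ; i++) {
        /* isspace expects a value representable as unsigned char */
        unsigned char ch = (unsigned char)text[i];

        if (ch == '\0' || isspace(ch)) {
            /* A word ends at whitespace or at the end of the input */
            if (current > 0) {
                freq[current - 1]++;
                current = 0;
            }
            if (ch == '\0')
                break;
        } else if (current < max_len) {
            current++;   /* still inside a word and not yet capped */
        }
    }
}
```

With a five-element `freq` array zeroed beforehand, `count_word_lengths("a bb\tccc\n\fdddddddd bb", freq, 5)` leaves `freq` as `{1, 2, 1, 0, 1}`: the eight-letter word lands in the last bucket, and the final `bb` is counted even though nothing follows it.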