Thread: c program to count words

    Apr 2011


    I'm trying to write a program that reads a text file, lists each unique word, and shows how many times each one appears.

    I believe I need to sort the words first and then count them, but I'm not sure how to do this.

    This is what I have so far:

    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>
    #define MAXWORD 100
    #define MAXLINE 256
    int main() {
    	FILE *infile;
    	char *pos;
    	char buffer[MAXLINE];
    	char string[MAXLINE];
    	char *words[MAXWORD];
    	int counter=0;
    	int len;
    	int i;
    	buffer[0] = '\0';
    	infile = fopen("C:\\Users\\Owner\\Documents\\c\\filetocount.txt", "rt");
    	if (infile == NULL) {
    		printf("Cannot open file\n");
    	} else {
    		/* read the file one line at a time */
    		while ((pos=fgets(buffer, MAXLINE, infile)) != NULL) {
    			/* %n stores how many characters sscanf consumed, leading whitespace included */
    			while (counter < MAXWORD && sscanf(pos, "%s%n", string, &len) > 0) {
    				words[counter] = (char *) malloc(strlen(string)+1);
    				if (words[counter] == NULL) {
    					printf("Out of memory\n");
    					fclose(infile);
    					return 1;
    				} else {
    					strcpy(words[counter], string);
    					counter++;
    				}
    				/* step past the word just read */
    				pos += len;
    			}
    		}
    		fclose(infile);
    	}
    	/* print every word that was stored */
    	for (i=0; i<counter; i++)
    		printf("%s\n",  words[i]);
    	/* release the memory allocated for each word */
    	for (i=0; i<counter; i++)
    		free(words[i]);
    	return 0;
    }

    Sorting might make the search more efficient, but you don't need to sort the words you have just to see whether you've already read the same word.

    A simple linear search will suffice for now.

    That's a good effort for a first attempt.
    I'm guessing you aren't a beginning C student, as you're allocating and freeing memory in this program. Have you learned to use structures yet? You can use a structure that contains the word and its count. After reading each word, you can do a linear search to locate it in the array. If it exists already in the array, you can increment its count. If it doesn't exist, you can add the word to the array and set its count to 1.
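
    The structure-plus-linear-search idea described above could look roughly like this. It is only a sketch: the names wordcount, find_word, add_word and the MAXLEN limit are made up for illustration and are not from the original program.

```c
#include <string.h>

#define MAXWORD 100   /* same table size as the original program */
#define MAXLEN  256   /* hypothetical per-word limit (assumption) */

/* One entry per unique word: the text and how many times it was seen. */
struct wordcount {
    char word[MAXLEN];
    int count;
};

/* Linear search: return the index of word in table[0..n-1], or -1. */
int find_word(const struct wordcount table[], int n, const char *word)
{
    int i;
    for (i = 0; i < n; i++)
        if (strcmp(table[i].word, word) == 0)
            return i;
    return -1;
}

/*
 * Record one occurrence of word. *n is the number of entries in use.
 * Returns 0 on success, -1 if the table is full or the word is too long.
 */
int add_word(struct wordcount table[], int *n, const char *word)
{
    int i = find_word(table, *n, word);
    if (i >= 0) {
        table[i].count++;          /* seen before: bump the count */
        return 0;
    }
    if (*n >= MAXWORD || strlen(word) >= MAXLEN)
        return -1;
    strcpy(table[*n].word, word);  /* new word: append with count 1 */
    table[*n].count = 1;
    (*n)++;
    return 0;
}
```

    In the reading loop you would call add_word(table, &n, string) for each word instead of storing a separate copy every time, then print table[i].word and table[i].count for i from 0 to n-1.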

    Sorting the list before printing it is easy enough. To compare the strings, you'll want to use strcmp() from <string.h>.
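
    One possible way to do the sort is the standard qsort() from <stdlib.h> with a small strcmp() wrapper. This sketch assumes an array of char pointers like words[] in the original post; compare_words and sort_words are hypothetical helper names.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Comparison callback for qsort() on an array of char *.
 * qsort() passes pointers to the array elements, so each
 * argument is really a pointer to a char *.
 */
int compare_words(const void *a, const void *b)
{
    const char *wa = *(const char * const *)a;
    const char *wb = *(const char * const *)b;
    return strcmp(wa, wb);
}

/* Sort the first n pointers in words[] into strcmp() order. */
void sort_words(char *words[], int n)
{
    qsort(words, (size_t)n, sizeof words[0], compare_words);
}
```

    Calling sort_words(words, counter) before the print loop would list the words in order, with identical words next to each other. Note that strcmp() compares character codes, so "Apple" and "apple" count as different words.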



    May 2010
    Use a hash table or a binary tree.
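
    As a rough illustration of the binary tree option, here is a minimal unbalanced binary search tree keyed on the word. The names node, tree_add, tree_count and tree_free are assumptions for this sketch, and a hash table would be structured differently.

```c
#include <stdlib.h>
#include <string.h>

/* One tree node per unique word. */
struct node {
    char *word;
    int count;
    struct node *left, *right;
};

/*
 * Add one occurrence of word to the tree and return the root.
 * If memory runs out, the word is dropped and the tree is unchanged.
 */
struct node *tree_add(struct node *root, const char *word)
{
    int cmp;
    if (root == NULL) {
        struct node *n = malloc(sizeof *n);
        if (n == NULL)
            return NULL;
        n->word = malloc(strlen(word) + 1);
        if (n->word == NULL) {
            free(n);
            return NULL;
        }
        strcpy(n->word, word);
        n->count = 1;
        n->left = n->right = NULL;
        return n;
    }
    cmp = strcmp(word, root->word);
    if (cmp == 0)
        root->count++;                           /* already present */
    else if (cmp < 0)
        root->left = tree_add(root->left, word);
    else
        root->right = tree_add(root->right, word);
    return root;
}

/* Return how many times word was added, or 0 if it is not in the tree. */
int tree_count(const struct node *root, const char *word)
{
    while (root != NULL) {
        int cmp = strcmp(word, root->word);
        if (cmp == 0)
            return root->count;
        root = (cmp < 0) ? root->left : root->right;
    }
    return 0;
}

/* Free every node and its word. */
void tree_free(struct node *root)
{
    if (root == NULL)
        return;
    tree_free(root->left);
    tree_free(root->right);
    free(root->word);
    free(root);
}
```

    Each word read from the file would go through root = tree_add(root, string). Walking the tree in order (left subtree, node, right subtree) visits the words in sorted order, so no separate sort is needed before printing.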

