Thread: Frequencies of characters in a file

  1. #1
    Registered User
    Join Date
    Sep 2006
    Posts
    230

    Frequencies of characters in a file

    Hi. I just wrote this program that's supposed to read from a file, count the frequency of each character, sort them, then output any characters in a file along with their frequencies:
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #define FILEPATH_MAX FILENAME_MAX*15
    
    int main(int argc, char * argv[])
    {
    	char filePath[FILEPATH_MAX];
    	FILE *file;
    	int i, k, maxchar;
    	char charcount[256] = {0};				/* array to know how many of a specific character have been seen */
    	char charcount2[256] = {0};				/* for sorting */
    	char charcount2_identifiers[256] = {0};			/* for sorting (identifiers for characters in charcount2 */
    
    	if (argc == 2)
    		strcpy(filePath, argv[1]);
    	else
    	{
    		fputs("Wrong arguments. Press enter to terminate...", stderr);
    		getchar();
    		exit(EXIT_FAILURE);
    	}
    
    	if ((file = fopen(filePath, "rt")) == '\0')
    	{
    		fputs("Error opening file. Press enter to terminate...", stderr);
    		getchar();
    		exit(EXIT_FAILURE);
    	}
    	printf("File opened successfully\n");
    
    	/* start counting */
    	while ((i=fgetc(file)) != EOF)
    		++charcount[i];
    	
    	fclose(file);
    
    	/* sort */
    	for (i=0; i<256; i++)
    	{
    		for (k=0, maxchar=0; k<256; k++)
    			if (charcount[k] > maxchar)
    			{
    				maxchar = charcount[k];
    				charcount[k] = 0;
    				charcount2_identifiers[i] = k;
    			}
    
    		charcount2[i] = maxchar;
    	}
    
    
    
    	/* output results */
    	for (i=0; i<256; i++)
    	{
    		if (charcount2[i] != 0)
    		{
    			if (charcount2_identifiers[i] >= 33 && charcount2_identifiers[i] <= 126)
    				printf("%c: %d\n", charcount2_identifiers[i], charcount2[i]);				/* print the identifier */
    			else
    				printf("0x%x: %d\n", charcount2_identifiers[i], charcount2[i]);
    		}
    	}
    
    	printf("Finished. Press enter to continue...");
    	getchar();
    
    	return 0;
    }
    The problem is, not all the characters in the file are output.
    E.g. I tried the following text in a file:
    UJTUQJ FQQ FWTZSI YMJ BTWQI QNPJ YT XTQAJ UZEEQJX
    When I run the program, it does not display the letter J or its frequency.

    I think the problem is in the sorting. I'm pretty sure the output part is right.

    Thanks,
    Last edited by Abda92; 12-13-2007 at 05:33 AM.

  2. #2
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    I'm a bit confused about your sorting algorithm. Have you tried printing the values BEFORE you sort them [so you print in alphabetical order]?

    Edit: I see the problem. You are setting charcount[k] = 0 whenever you find something that is greater than the current maxchar, but at that point you only know it is the highest so far, not the HIGHEST OVERALL. You need to just remember the position of the highest one, _THEN_, once the inner loop has finished, set it to zero at the same time as you set charcount2[i] to maxchar.
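
    A minimal sketch of that idea, written as a standalone function. The names sort_by_frequency and NCHARS are made up for illustration, and the unsigned long counters are an assumption on my part (the char counters in the posted program would overflow once a character occurs more than CHAR_MAX times):

```c
#include <stddef.h>

#define NCHARS 256  /* hypothetical name; the thread uses a bare 256 */

/*
 * Repeatedly pick the largest remaining count. During the scan only the
 * position of the best candidate is remembered; the entry is cleared
 * after the scan has finished, so no count is thrown away too early.
 * Note that counts[] is consumed (zeroed) by this function.
 * Returns the number of distinct byte values that occurred.
 */
size_t sort_by_frequency(unsigned long counts[NCHARS],
                         unsigned char ids[NCHARS],
                         unsigned long sorted[NCHARS])
{
    size_t n = 0;

    for (;;) {
        int best = -1;          /* position of the highest count so far */
        unsigned long max = 0;
        int k;

        for (k = 0; k < NCHARS; k++) {
            if (counts[k] > max) {
                max = counts[k];
                best = k;
            }
        }
        if (best < 0)
            break;              /* every remaining count is zero */

        ids[n] = (unsigned char)best;
        sorted[n] = max;
        counts[best] = 0;       /* clear only the true maximum */
        n++;
    }
    return n;
}
```

    The caller would then print ids[0..n-1] next to sorted[0..n-1], most frequent first.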

    --
    Mats
    Last edited by matsp; 12-13-2007 at 05:39 AM.
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  3. #3
    Registered User
    Join Date
    Sep 2006
    Posts
    230
    Yes, it worked fine. I'm sorry about the sorting algorithm; it's the first time I've ever tried writing one. Here's the code without the sorting (outputs in alphabetical order):
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #define FILEPATH_MAX FILENAME_MAX*15
    
    int main(int argc, char * argv[])
    {
    	char filePath[FILEPATH_MAX];
    	FILE *file;
    	int i, k, maxchar;
    	char charcount[256] = {0};				/* array to know how many of a specific character have been seen */
    	char charcount2[256] = {0};				/* for sorting */
    	char charcount2_identifiers[256] = {0};			/* for sorting (identifiers for characters in charcount2 */
    
    	if (argc == 2)
    		strcpy(filePath, argv[1]);
    	else
    	{
    		fputs("Wrong arguments. Press enter to terminate...", stderr);
    		getchar();
    		exit(EXIT_FAILURE);
    	}
    
    	if ((file = fopen(filePath, "rt")) == '\0')
    	{
    		fputs("Error opening file. Press enter to terminate...", stderr);
    		getchar();
    		exit(EXIT_FAILURE);
    	}
    	printf("File opened successfully\n");
    
    	/* start counting */
    	while ((i=fgetc(file)) != EOF)
    		++charcount[i];
    	
    	fclose(file);
    
    
    	/* output results */
    	for (i=0; i<256; i++)
    	{
    		if (charcount[i] != 0)
    		{
    			if (i >= 33 && i <= 126)
    				printf("%c: %d\n", i, charcount[i]);				/* print the identifier */
    			else
    				printf("0x%x: %d\n", i, charcount[i]);
    		}
    	}
    
    	printf("Finished. Press enter to continue...");
    	getchar();
    
    	return 0;
    }

  4. #4
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Did you see my edit?

    --
    Mats

  5. #5
    Registered User
    Join Date
    Sep 2006
    Posts
    230
    Oh sorry, I just saw it. I'll try implementing it now (shouldn't take any time).
    Do you have any better sorting algorithms that I can use (just a simple one), or does mine seem OK?

    EDIT: I just implemented it, but I don't like the look of this statement:
    Code:
    /* sort */
    	for (i=0; i<256; i++)
    	{
    		for (k=0, maxchar=0; k<256; k++)
    			if (charcount[k] > maxchar)
    			{
    				maxchar = charcount[k];
    				charcount2_identifiers[i] = k;
    			}
    		
    		charcount[charcount2_identifiers[i]] = 0;
    		charcount2[i] = maxchar;
    	}
    Is it illegal in C? I ask because it works completely fine.
    Last edited by Abda92; 12-13-2007 at 05:58 AM.

  6. #6
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    That's perfectly legal, if a bit complex.

    I would probably just store maxchar's "k" in a temp variable, and set both charcount2_identifiers[i] and the charcount entry using that temp variable.
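
    On the earlier question about a simpler sort: one option this thread does not otherwise cover is qsort() from <stdlib.h>, sorting an array of (character, count) pairs. This is only a sketch; struct entry, by_count_desc and rank_counts are hypothetical names, and the unsigned long counters are an assumption rather than what the posted program uses:

```c
#include <stdlib.h>

#define NCHARS 256  /* hypothetical name; the thread uses a bare 256 */

struct entry {
    int ch;                 /* byte value, 0..255 */
    unsigned long count;    /* how often it occurred */
};

/* qsort comparator: higher counts first, ties broken by byte value */
static int by_count_desc(const void *a, const void *b)
{
    const struct entry *x = a;
    const struct entry *y = b;

    if (x->count != y->count)
        return (x->count < y->count) ? 1 : -1;
    return x->ch - y->ch;
}

/* Fill out[] with one entry per byte value and sort it by frequency. */
void rank_counts(const unsigned long counts[NCHARS],
                 struct entry out[NCHARS])
{
    int i;

    for (i = 0; i < NCHARS; i++) {
        out[i].ch = i;
        out[i].count = counts[i];
    }
    qsort(out, NCHARS, sizeof out[0], by_count_desc);
}
```

    Unlike the loop above, this leaves the original counts untouched, and the entries with a zero count simply end up at the back of out[].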

    --
    Mats

  7. #7
    Registered User VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,607
    Are you allowed to use C++? More specifically, the standard library?

  8. #8
    Registered User
    Join Date
    Sep 2006
    Posts
    230
    Well, I'm allowed to but I don't know how (and I'd rather not).
    I got it working fine now with the following code:
    Code:
    /* sort */
    	for (i=0; i<256; i++)
    	{
    		for (k=0, maxchar=0; k<256; k++)
    			if (charcount[k] > maxchar)
    			{
    				maxchar = charcount[k];
    				maxchar_identifier = k;
    			}
    		
    		
    		charcount2_identifiers[i] = maxchar_identifier;
    		charcount[maxchar_identifier] = 0;
    		charcount2[i] = maxchar;
    	}

  9. #9
    Registered User VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,607
    Ok that's fine. std::map has a very nice feature for this but I'm glad you got it working.

  10. #10
    Frequently Quite Prolix dwks's Avatar
    Join Date
    Apr 2005
    Location
    Canada
    Posts
    8,057
    This is the C forum, and as I don't see anything C++-ish about that code -- it looks like plain, vanilla C -- you probably shouldn't be recommending std::map.

    Code:
    if ((file = fopen(filePath, "rt")) == '\0')
    '\0' is usually used as a character. When you're dealing with pointers, which is what fopen() returns, it's standard practise to use NULL instead -- or just plain 0. Or even
    Code:
    if (!(file = fopen(filePath, "rt")))
    Code:
    			if (i >= 33 && i <= 126)
    				printf("%c: %d\n", i, charcount[i]);
    Consider isprint() from <ctype.h>. It's more portable and easier on the eyes, too.
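
    As a rough sketch of what that could look like (format_count is a hypothetical helper, and the unsigned long count is an assumption). One difference to be aware of: isprint() also accepts the space character, which the 33..126 test excludes; isgraph() would match the original range more closely.

```c
#include <ctype.h>
#include <stdio.h>

/*
 * Format one result line into buf: the character itself when it is
 * printable, otherwise its value in hexadecimal.
 * Returns what snprintf() returns (the length of the formatted text).
 */
int format_count(char *buf, size_t size, int ch, unsigned long count)
{
    /* the <ctype.h> functions expect a value in unsigned char range */
    if (isprint((unsigned char)ch))
        return snprintf(buf, size, "%c: %lu", ch, count);

    return snprintf(buf, size, "0x%x: %lu", (unsigned)ch, count);
}
```

    The output loop would then call format_count() for every non-zero count and print the resulting buffer.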

    Instead of using the "magic number" 256 everywhere, you could use UCHAR_MAX from <limits.h>. Or not; 256 is pretty common.

    Rather than declare a large array with FILENAME_MAX*15 elements, which still might not be enough, you could just declare filePath as a pointer and have it point to argv[1]. That wouldn't work if you wanted to add support for fgets()-ing the filename from the user at some point in the future, however.
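
    Something along these lines, as a sketch only. open_input is a made-up helper name, and I've used the standard "r" mode here on the assumption that the "t" in "rt" (a common extension, not part of standard C) isn't needed:

```c
#include <stdio.h>

/*
 * Open the file named by the single command-line argument.
 * Returns NULL when the argument count is wrong or fopen() fails.
 */
FILE *open_input(int argc, char *argv[])
{
    const char *filePath;

    if (argc != 2)
        return NULL;

    filePath = argv[1];     /* no copy, so no fixed-size buffer to overflow */
    return fopen(filePath, "r");
}
```

    main() would then compare the result against NULL, as suggested above, and print whichever error message applies.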
    dwk

    Seek and ye shall find. quaere et invenies.

    "Simplicity does not precede complexity, but follows it." -- Alan Perlis
    "Testing can only prove the presence of bugs, not their absence." -- Edsger Dijkstra
    "The only real mistake is the one from which we learn nothing." -- John Powell


    Other boards: DaniWeb, TPS
    Unofficial Wiki FAQ: cpwiki.sf.net

    My website: http://dwks.theprogrammingsite.com/
    Projects: codeform, xuni, atlantis, nort, etc.

  11. #11
    Chinese pâté foxman's Avatar
    Join Date
    Jul 2007
    Location
    Canada
    Posts
    404
    Actually, UCHAR_MAX is 255 (on the usual systems with 8-bit chars), not 256.

    I know you knew it, dwks (or maybe it was a small moment of inattention), but I just wanted to point out that if he uses UCHAR_MAX instead of 256, he'll have to use the "<=" operator instead of "<".

  12. #12
    Frequently Quite Prolix dwks's Avatar
    Join Date
    Apr 2005
    Location
    Canada
    Posts
    8,057
    Of course, I forgot about that. Maybe something like this, then:
    Code:
    #define CHARS (UCHAR_MAX + 1)
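
    For illustration, a counting routine using that macro might look like the sketch below. count_bytes is a hypothetical name, and it counts bytes from a buffer rather than from a FILE * purely to keep the example short; the unsigned long counters are likewise an assumption.

```c
#include <limits.h>
#include <stddef.h>

#define CHARS (UCHAR_MAX + 1)

/* Count how often each byte value occurs in data[0..len-1]. */
void count_bytes(const unsigned char *data, size_t len,
                 unsigned long counts[CHARS])
{
    size_t i;
    int k;

    for (k = 0; k < CHARS; k++)     /* plain '<' works again with CHARS */
        counts[k] = 0;

    for (i = 0; i < len; i++)
        counts[data[i]]++;          /* index is always within 0..UCHAR_MAX */
}
```

    Every loop that currently runs to 256 can run to CHARS the same way, and the arrays can be declared with CHARS elements.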
    dwk

