Thread: optimizing loop (frequency counting)... HELP

  1. #1
    Registered User
    Join Date
    Sep 2001
    Location
    pacific northwest
    Posts
    37

    optimizing loop (frequency counting)... HELP

    okay. i am trying to make this loop run as quick as possible.

    i am trying to collect all occurances of each ASCII character and put into table. i don't really care if ASCII char value equals it's position in the table.

    Code:
    for (;;) {
    	c = fgetc(input_file);
    
    	/* If symbol is already in table then increase frequency
    	 * by one, otherwise put symbol into the table.
    	 */
    	for (i = 0; i < 256; i++) {
    		if(c == i) {
    			if (symbol[i].achar == c) {
    				symbol[i].count++;
    				break;
    			}
    			symbol[i].achar = c;
    		}
    	}
    
    	nc++;
    
    	if (feof(input_file))
    		break;
    }
    "No, I am not wise, but I am a lover of wisdom." --Pythagoras

  2. #2
    & the hat of GPL slaying Thantos's Avatar
    Join Date
    Sep 2001
    Posts
    5,681
    Why it's bad to use feof() to control a loop

    My suggestion:
    Code:
    char freq[256];
    /* Initalize the array so everything is 0 */
    int c;
    for (;;)
    {
      c = fgetc(input_file);
      if ( c == EOF )
        break;
      freq[c]++;
    }
    easist way IMO if you aren't worried about wasted space.

  3. #3
    Registered User
    Join Date
    Sep 2001
    Location
    pacific northwest
    Posts
    37
    nope... no performance improvement from that. anything else? maybe hash tables, linked lists?
    "No, I am not wise, but I am a lover of wisdom." --Pythagoras

  4. #4
    & the hat of GPL slaying Thantos's Avatar
    Join Date
    Sep 2001
    Posts
    5,681
    To get the most improvement you need to load as much of the file into memory as possible and then do it. Most of the loop's time is from reading the harddrive.

    Example:

    Code:
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>
    
    void init (int *, int);
    
    int main (void)
    {
      int size_read, fid, count;
      char buffer[1048576];
      int freq[256];
    
    
      init(freq, sizeof freq/sizeof freq[0]);
    
      fid=open("test.txt", O_RDONLY);
      if ( fid == -1 )
      {
        puts("Could not open file");
        return 1;
      }
    
      while ( (size_read = read(fid, buffer, sizeof buffer)) != 0 )
      {
        for(count=0; count<size_read; count++)
          freq[(int)buffer[count]]++;
      }
    
      for (count=0; count<sizeof freq/sizeof freq[0]; count++)
        printf("ASCII %3d:  %3d\n", count, freq[count]);
      return 0;
    }
    
    void init (int *x, int y)
    {
      int count;
      for (count=0; count < y; count++)
        x[count] = 0;
    }
    Last edited by Thantos; 05-20-2004 at 10:30 PM.

  5. #5
    Registered User
    Join Date
    Sep 2001
    Location
    pacific northwest
    Posts
    37
    okay... overnight i looked at the loop, and came up with this. running it with the old code on a 65MB file took 10 mintues (on a 233Mhz Cyrix).

    now, on the same machine i get 1.5 minutes with this:

    Code:
    for (;;) {
    	c = fgetc(input_file);
    
    	if (feof(input_file))
    		break;
    
    	if (symbol[c].count > 0) {
    		symbol[c].count++;
    		continue;
    	}
    	symbol[c].count++;
    
    	nc++;
    }
    haven't tried loading into fread(), but i think this is the best it's gunna get. but if anyone has any other suggestions... bring it on.
    "No, I am not wise, but I am a lover of wisdom." --Pythagoras

  6. #6
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    Well calling feof() every time is gonna hurt
    Especially since fgetc also returns EOF
    Code:
    int c;
    while ( (c = fgetc(input_file)) != EOF ) {
    	if (symbol[c].count > 0) {
    		symbol[c].count++;
    		continue;
    	}
    	symbol[c].count++;
    
    	nc++;
    }
    Using fread() with
    char buff[BUFSIZ];
    is likely to be a better bet
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  7. #7
    End Of Line Hammer's Avatar
    Join Date
    Apr 2002
    Posts
    6,231
    Slightly improved:
    Code:
    int c;
    while ( (c = fgetc(input_file)) != EOF ) {
    	if (symbol[c].count++ > 0) {
    		continue;
    	}
    	nc++;
    }
    Seeing as you increment whatever the result of the test, might as well only access that variable once.
    When all else fails, read the instructions.
    If you're posting code, use code tags: [code] /* insert code here */ [/code]

  8. #8
    Registered User
    Join Date
    Sep 2001
    Location
    pacific northwest
    Posts
    37
    whoa! good stuff...
    "No, I am not wise, but I am a lover of wisdom." --Pythagoras

  9. #9
    Registered User Codeplug's Avatar
    Join Date
    Mar 2003
    Posts
    4,981
    Skip the nc++ as well, since that's known by the file size (unless it's a text file with DOS newlines).

    Other suggestions:
    - use your OS's "raw" file I/O API
    - read into a memory-page sized buffer at a time

    gg
    Last edited by Codeplug; 05-21-2004 at 02:51 PM.

  10. #10
    End Of Line Hammer's Avatar
    Join Date
    Apr 2002
    Posts
    6,231
    >>Skip the nc++ as well, since that's known by the file size (unless it's a text file with DOS newlines).<<
    Based on the snippet last posted, nc looks like its counting the number of different characters in the file, not the total file size.
    When all else fails, read the instructions.
    If you're posting code, use code tags: [code] /* insert code here */ [/code]

  11. #11
    Registered User Codeplug's Avatar
    Join Date
    Mar 2003
    Posts
    4,981
    Allow me to continue my OS specific suggestions (in order to redeem myself)....

    Under Windows and using the Win32 API, you could open the file with FILE_FLAG_NO_BUFFERING and FILE_FLAG_OVERLAPPED. Use a buffer size that is some multiple of the hard disk's sector size, but less than or equal to the systems memory page size. While the OS is working on a read operation, crunch the data of the previous read operation.

    You can achieve similar results using asynchronous I/O under *nix (or a couple of threads, one to read and one to crunch).

    gg

  12. #12
    & the hat of GPL slaying Thantos's Avatar
    Join Date
    Sep 2001
    Posts
    5,681
    Using the method I posted on a 25 meg file it took about 6.5 seconds to read and print off the results.

  13. #13
    Registered User
    Join Date
    Sep 2001
    Location
    pacific northwest
    Posts
    37
    i was omitting what "nc" was for...

    [CODE}
    while ( (c = fgetc(input_file)) != EOF ) {
    symbol[i].position = symbol[i].position + nc; /* this is what the nc is for */
    if (symbol[c].count++ > 0) {
    continue;
    }
    nc++;
    }
    [/CODE]
    "No, I am not wise, but I am a lover of wisdom." --Pythagoras

  14. #14
    Registered User
    Join Date
    Sep 2001
    Location
    pacific northwest
    Posts
    37
    oops.

    Code:
    while ( (c = fgetc(input_file)) != EOF ) {
    	symbol[c].position = symbol[c].position + nc;
    	if (symbol[c].count++ > 0) {
    		continue;
    	}
    	nc++;
    }
    "No, I am not wise, but I am a lover of wisdom." --Pythagoras

  15. #15
    Registered User
    Join Date
    Sep 2001
    Location
    pacific northwest
    Posts
    37
    if just counting frequency... i got it down to just this!!!

    Code:
    while ((c = fgetc(input_file)) != EOF ) {
    	symbol[c].count++;
    }
    "No, I am not wise, but I am a lover of wisdom." --Pythagoras

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. My loop within loop won't work
    By Ayreon in forum C Programming
    Replies: 3
    Last Post: 03-18-2009, 10:44 AM
  2. Visual Studio Express / Windows SDK?
    By cyberfish in forum C++ Programming
    Replies: 23
    Last Post: 01-22-2009, 02:13 AM
  3. syntax question
    By cyph1e in forum C Programming
    Replies: 19
    Last Post: 03-31-2006, 12:59 AM
  4. Trouble with a counting loop
    By RedZippo in forum C++ Programming
    Replies: 4
    Last Post: 03-21-2004, 03:12 PM
  5. How to change recursive loop to non recursive loop
    By ooosawaddee3 in forum C Programming
    Replies: 1
    Last Post: 06-24-2002, 08:15 AM