Thread: optimizing loop (frequency counting)... HELP

  1. #16
    & the hat of GPL slaying Thantos's Avatar
    Join Date
    Sep 2001
    Posts
    5,681
    Ok its a short loop but is it really that much faster?

  2. #17
    Registered User
    Join Date
    Sep 2001
    Location
    pacific northwest
    Posts
    37
    1 second gain
    "No, I am not wise, but I am a lover of wisdom." --Pythagoras

  3. #18
    & the hat of GPL slaying Thantos's Avatar
    Join Date
    Sep 2001
    Posts
    5,681
    Now try do it by reading a chunk into memory and see if that is any faster.

  4. #19
    Registered User
    Join Date
    Sep 2001
    Location
    pacific northwest
    Posts
    37
    Thantos:

    okay... but my C skills are a little rusty. i am not getting your code to work. sure, it compiles, but seg faults on the fread().

    here's my current code, and the fread() is working, but i can't process files over 10k without segfaults. should i be using malloc()?

    Code:
    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>
    
    struct data {
    	int count;
    	int position;
    };
    
    struct data symbol[255];
    
    /*
     * init_symbol_array(): Kinda obvious, but zero out array
     */
    void
    init_symbol_array(void)
    {
    	int i;
    
    	for (i = 0; i < 256; i++) {
    		symbol[i].count = 0;
    		symbol[i].position = 0;
    	}
    }
    
    /*
     * my_fopen(): Custom fopen routine
     */
    FILE
    *my_fopen(char *file, char *type)
    {
    	FILE *fp;
    
    	if((fp = fopen(file, type)) == NULL) {
    		(void)fprintf(stderr, "\nerror: %s: Can't open '%s'\n",
    		    strerror(ENOENT), file);
    		exit(EXIT_FAILURE);
    	}
    
    	return fp;
    }
    
    /*
     * symbol_stats(): Get frequency count and position of each character
     */
    void
    symbol_stats(char *file)
    {
    	FILE *i_file;
    	clock_t start_cpu, now_cpu;	/* timing var */
    	char buffer[80000];
    	int c, i, nc;
    
    	i_file = my_fopen(file, "rb");
    
    	printf("\nGathering statistics... ");
    
    	nc = 1;
    	start_cpu = clock();
    
    	while((c = fread(buffer, 1, sizeof(buffer), i_file)) > 0) {
    		for (i = 0; i < c; i++) {
    			symbol[(int)buffer[i]].count++;
    			symbol[(int)buffer[i]].position = symbol[(int)buffer[i]].position + nc;
    			nc++;
    		}
    	}
    
    	now_cpu = clock();
    	printf("%lf seconds\n",(now_cpu-start_cpu)/(double)CLOCKS_PER_SEC);
    
    	fclose(i_file);
    }
    
    /*
     * Print symbol stats to output file
     */
    void
    print_symbol_stats(char *file)
    {
    	int i;
    	FILE *o_file;
    
    	o_file = my_fopen(file, "wb");
    
    	for (i = 0; i < 256; i++) {
    		if (symbol[i].count > 0) {
    			fprintf(o_file, "%c\t%d\t%d\n", i,
    			    symbol[i].count, symbol[i].position);
    		}
    	}
    
    	fclose(o_file);
    }
    
    /*
     * The main function; where it all happens.
     */
    int
    main(int argc, char **argv)
    {
    	if (argv[2] == NULL) {
    		argv[2] = "compressed.txt";
    	}
    
    	init_symbol_array();
    	symbol_stats(argv[1]);
    	print_symbol_stats(argv[2]);
    
    	return 0;
    }
    "No, I am not wise, but I am a lover of wisdom." --Pythagoras

  5. #20
    & the hat of GPL slaying Thantos's Avatar
    Join Date
    Sep 2001
    Posts
    5,681
    Ok I ran into a similar problem when using your code. It was fun trying to figure it out but I got it. The seg fault wasn't on the fread but in the array access.
    change
    Code:
    [(int)buffer[i]]
    to
    Code:
    [(unsigned int)( (unsigned char)buffer[i])
    What was happening was it was getting a negative signed character value and converting it to a negative integer value which was outside of your array. Just casting it to (unsigned int) didn't fix it because it changed it to a value too large which was also outside of the array. In reality you shouldn't need the (unsigned int) cast because it should be auto promoted but I put it in there just to silence a compiler warning.

  6. #21
    & the hat of GPL slaying Thantos's Avatar
    Join Date
    Sep 2001
    Posts
    5,681
    Just realized an even more intelligent way.

    Change
    Code:
    char buffer[80000];
    to
    Code:
    unsigned char buffer[80000];
    and leave the rest be.

  7. #22
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    > struct data symbol[255];
    Your loops are < 256
    So this array is 1 element short
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  8. #23
    Registered User
    Join Date
    Sep 2001
    Location
    pacific northwest
    Posts
    37
    okay... using fread() has really made an improvement over the final fgetc() method. a 60MB file now takes only 30 seconds instead of 90. not bad. thanks Thantos... and everyone else.
    "No, I am not wise, but I am a lover of wisdom." --Pythagoras

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. My loop within loop won't work
    By Ayreon in forum C Programming
    Replies: 3
    Last Post: 03-18-2009, 10:44 AM
  2. Visual Studio Express / Windows SDK?
    By cyberfish in forum C++ Programming
    Replies: 23
    Last Post: 01-22-2009, 02:13 AM
  3. syntax question
    By cyph1e in forum C Programming
    Replies: 19
    Last Post: 03-31-2006, 12:59 AM
  4. Trouble with a counting loop
    By RedZippo in forum C++ Programming
    Replies: 4
    Last Post: 03-21-2004, 03:12 PM
  5. How to change recursive loop to non recursive loop
    By ooosawaddee3 in forum C Programming
    Replies: 1
    Last Post: 06-24-2002, 08:15 AM