It is too clear and so it is hard to see.
A dunce once searched for fire with a lighted lantern.
Had he known what fire was,
He could have cooked his rice much sooner.
There's not a possibility of running into overflow -- there's essentially a certainty of it (assuming ASCII text and say four characters). You'll have to store the sum elsewhere. (EDIT: I'm assuming you're intending storing the sum in an unsigned char, for whatever reason. If you're thinking of an int, then no worries.)
If an unsigned integer is used to accumulate the total sum of all bytes, and even if every byte were 0xFF, the file could still be about 16 MB before an overflow would occur. Past that point, a float or double would suffice just fine.
Mainframe assembler programmer by trade. C coder when I can.
Using float would be worse than a long unsigned int.
Double has 52 fraction bits, which means 53 bits of precision. That is a significant gain over an unsigned long int's 32 bits. So use double if you think the file may be larger than 16.8 MB.
Here's a sample...you'll have to modify it, use at your own risk ^^
Code:
/*
 * @author bvkim
 * @date 19 August, 2009
 * @file sample.c
 * @note
 */
#define DEBUG

#include <stdio.h>
#include <stdlib.h>

typedef struct _DATA {
    unsigned int sum;
    unsigned int count;
} DATA;

int average_bytes( const char *file, DATA *data )
{
    FILE *fp;
    int c;  /* int, not unsigned int: fgetc() returns int and signals EOF with -1 */

    fp = fopen( file, "rb" );  /* "rb": read the bytes as-is, no text translation */
    if( fp == NULL ) {
        return 0;
    }

    while( (c = fgetc(fp)) != EOF ) {
#ifdef DEBUG
        printf("%d - ", c);
#endif
        data->sum += c;
        data->count++;
    }
#ifdef DEBUG
    printf("\n");
#endif
    fclose( fp );
    return 1;
}

int main( int argc, char *argv[] )
{
    DATA *data;
    float avg;

    data = malloc( sizeof *data );  /* sizeof *data, not sizeof data (that's just the pointer) */
    if( data == NULL ) {
        printf("[error] out of memory\n");
        exit(1);
    }
    data->sum = 0;
    data->count = 0;

    if( average_bytes("test", data) == 0 || data->count == 0 ) {
        printf("[error] error\n");
        free( data );
        exit(1);
    }
#ifdef DEBUG
    printf("Sum %u - Count %u\n", data->sum, data->count);
#endif
    avg = (float) data->sum / data->count;  /* cast first, or the division truncates */
    printf("byte average: %.2f \n", avg );
    free( data );
    return 0;
}
Yes, but you want integer precision, since the final result is expected to be in the range 0-255. For that, the total you accumulate must fit within the number of significant bits the format can store exactly, which is one plus the number of fraction bits of the IEEE floating-point significand.
Otherwise you'll have rounding errors. Specifically, when you try to add a number to your running total, that number can be rounded off of its least significant bits.
I doubt you solved the problem correctly. Assume BLOCK_SIZE = 3; the average should be 3/1 = 3. Why add the values of the bytes and divide by 3? It should be adding the number of bytes and dividing by the number of blocks. Assuming the size of the given file in bytes is no greater than a multiple of 256 bytes, how could we solve this problem? Change to BLOCK_SIZE = 256 instead.
I'm not sure I follow you. The only opportunity for a fractional value comes at divide time at the end, just as it would when using an int for the accumulator. Then the average can be cast to an int (or unsigned char) and you're done. While accumulating, there is no rounding and no fractional portion of the total -- it's a whole number.
In terms of working with whole numbers, a float is just like an int, but a lot bigger bucket in the same amount of space (both are 4 bytes).
What are you saying that I'm not understanding?
When adding a small number to a large number, the smaller number can be rounded off.
For example, imagine a format with only four bits of significand, and adding 1.0000x2^5 (32) and 1.1000x2^1 (3). The result should be 1.00011x2^5 (35), but the last digit does not fit in the encoding, so it must be rounded off, either up or down, giving either 36 or 34.
IEEE floating point formats are similar and suffer from the outlined problem.
Normally, you don't need 24 bits of precision, so this is not a problem. But in this case you want integer precision, no matter how high the accumulated total gets.
BLOCK_SIZE in the fread() has nothing to do with the average of all bytes in a file. With BLOCK_SIZE, you're just reading chunks of the file at a time.
Now, if your task was to calculate how many blocks were in a file of a given size, then you wouldn't even have to read the file to figure that out.