Problems with Summing lots of doubles

• 07-14-2008
firetheGlazer
Hi,
I'm working with a very large number of doubles, each with a value between 0 and around 10. I'm not sure of the best way to get around floating-point precision limits so that I can take a good average over all of the values. I was thinking of sorting the values and then breaking them up into chunks, but I don't know a good way to tell roughly how large the float has become before my computer spits out inf.

This is what I've tried so far:
Code:

```
void *func = &dComp;
qsort(moment, sSize, sizeof(double), func);

//compute average
unsigned long long int sum = 0;
for (i = 0; i < sSize; i++)
{
    sum = sum + rint(100 * moment[i]);
}

double fSum = (1.0 * sum) / 100.0;
```
sSize can hopefully be around 10,001 and is the same as 1 + the number of points generated. When I exceed 64 bits with accuracy only to the thousandths, I get 00000000 for everything. I was thinking of chunking every 100 or 1000 values or so into an array and then averaging those by dividing each entry by the total number of entries, but I feel like there is a more portable and flexible solution.
• 07-14-2008
matsp
Floating point would give you BETTER precision than scaling the values into a 64-bit integer.

Are you actually seeing your average being wrong because of this, or are you just trying to prevent a problem that isn't actually there?

Doubles are precise to about 15 digits, even on repeated calculations like this.

10001 * 10 is the largest value you can expect from your sum, so that takes up about 6 digits; you still have another 9 or so digits for the remainder of the precision. That is much more precise than calculating an integer value of the float value times 100.
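
To put rough numbers on that estimate, here is a minimal sketch. The helper names `max_possible_sum` and `spacing_bound` are made up for illustration and do not come from the code in this thread. It assumes IEEE 754 doubles, where adjacent values near a positive x differ by at most x * DBL_EPSILON.

```c
#include <assert.h>
#include <float.h>   /* DBL_EPSILON */

/* Largest sum possible for n values that are each at most max_value.
   (Hypothetical helper, for illustration only.) */
double max_possible_sum(unsigned long n, double max_value)
{
    return (double)n * max_value;
}

/* Upper bound on the gap between adjacent doubles near x, for x >= 0.
   Assumes IEEE 754 doubles: the gap near x is at most x * DBL_EPSILON. */
double spacing_bound(double x)
{
    assert(x >= 0.0);
    return x * DBL_EPSILON;
}
```

With 10001 values of at most 10.0, `max_possible_sum(10001, 10.0)` is 100010, and `spacing_bound(100010.0)` comes out at roughly 2.2e-11. Each addition can only lose something on that order, compared with the 0.01 granularity of the times-100 integer approach.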

--
Mats
• 07-14-2008
firetheGlazer
My average is actually non-existent when I run this. Using an unsigned long long int yields a bunch of 0s. When I use doubles, I get 'inf'. I need the average later to get a nice histogram, but even the simplest solution runs into precision problems. I tried using the code below, but it still spits out inf.
Code:

```
double average = 0.0, aveBins[400];

j = 0; z = 0;
for (i = 0; i < sSize; i++)
{
    assert(j != 400);
    aveBins[j] += moment[i];
    if (z > 99 && j < 400)
    {
        j++;
        z = 0;
    }
    z++;
}

for (j = 0; j < 400; j++)
{
    aveBins[j] /= sSize;
    average += aveBins[j];
}

printf("Average Field: %lf", average);
```
I thought that calculating the best bin size would be more suitable, but I'm not sure if there's a good way to find the size of an array in C. This is what I would use, but when I implemented it, my averages were smaller than expected. For example:
Code:

`for(i=0 ; i < sizeof(aveBins)/sizeof(aveBins[0]) ; i++)`
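
For what it's worth, the sizeof idiom only counts elements when it is applied to a true array whose declaration is in scope. Once the array has been passed to a function it is just a pointer, and the same expression yields something much smaller. A minimal sketch, with `ARRAY_LEN` and `sum_values` as made-up names for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Element count of a true array. Only valid where the array's own
   declaration is visible; it gives the wrong answer for a pointer. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Inside a function the parameter is a pointer, so the element count
   has to be passed in explicitly. (Hypothetical helper.) */
double sum_values(const double *values, size_t count)
{
    double sum = 0.0;
    size_t i;

    assert(values != NULL || count == 0);
    for (i = 0; i < count; i++)
        sum += values[i];
    return sum;
}
```

A call site that declares `double aveBins[400]` can pass `ARRAY_LEN(aveBins)` as the count; code that only holds a `double *` cannot.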
• 07-14-2008
tabstop
This code doesn't initialize the aveBins array, so who knows what's already in there. You know that each aveBin shouldn't get above 4000 or so, so if you're willing to write all those out to a file, you'll see what you get.
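
A minimal sketch of the same binning idea with the bins zeroed first. `binned_average`, `NUM_BINS`, and `BIN_SIZE` are names assumed for illustration. The bin index is computed from the position instead of the j/z counters, which is a swapped-in simplification and not the original logic.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_BINS 400   /* same bin count as the posted code */
#define BIN_SIZE 100   /* assumed: 100 values per bin */

/* Average of moment[0..sSize-1], accumulated through partial sums.
   (Hypothetical helper, for illustration only.) */
double binned_average(const double *moment, size_t sSize)
{
    double aveBins[NUM_BINS] = {0.0};   /* every bin starts at zero */
    double average = 0.0;
    size_t i, j;

    assert(moment != NULL);
    assert(sSize > 0 && sSize <= (size_t)NUM_BINS * BIN_SIZE);

    /* Value i goes into bin i / BIN_SIZE. */
    for (i = 0; i < sSize; i++)
        aveBins[i / BIN_SIZE] += moment[i];

    /* Each bin contributes its share of the overall average;
       untouched bins contribute exactly zero. */
    for (j = 0; j < NUM_BINS; j++)
        average += aveBins[j] / (double)sSize;

    return average;
}
```

Because the array is zero-initialized, bins that never receive a value add nothing to the result instead of whatever happened to be on the stack.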
• 07-14-2008
matsp
Something is wrong in what you originally stated. This works just fine:
Code:

```
#include <stdio.h>
#include <stdlib.h>

#define SIZE 10001

double frand(double maxval)
{
#if 0
        // Set above to 1 for testing the max value possible.
        return maxval;
#else
        return ((double)rand() / RAND_MAX) * maxval;
#endif
}

int main()
{
        double a[SIZE];
        double sum = 0;
        int i;

        for(i = 0; i < SIZE; i++)
        {
                a[i] = frand(10.0);
        }
        for(i = 0; i < SIZE; i++)
        {
                sum += a[i];
        }
        printf("sum = %f, avg = %f\n", sum, sum / SIZE);
        return 0;
}
```
The sum, when using random values, is about 50000, and the average 5.0-something, which is exactly what one would expect.

If you get other values, I'd suggest you add a bit of code like this:
Code:

```
        double max = -100000.0, min = 1000000.0;
        for(i = 0; i < SIZE; i++)
        {
                if (max < a[i]) max = a[i];
                if (min > a[i]) min = a[i];
        }
```
Print max and min after the loop to see if they fall in the expected range. I suspect you have "garbage" in your array.

--
Mats
• 07-14-2008
firetheGlazer
Right on the money there! Sure enough, after looking through my data (quicksorted for ease of use), there were 7 or 8 'inf's. Is there an explicit way to test whether a value is inf? I'm going to use something along these lines, but something akin to isnan() would be nice.
Code:

```
if (value == value + 1)
{
    /* ignore the value */
}
```
Thanks, and good catch.
• 07-14-2008
tabstop
C99 has isfinite() and isinf() in math.h.
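
A minimal sketch of how isfinite() could be used to skip the bad entries while averaging. `mean_of_finite` and its `skipped` out-parameter are made up for illustration. Note that a `value == value + 1` test is also true for large finite doubles such as 1e300, so it would throw away legitimate values in general.

```c
#include <assert.h>
#include <math.h>    /* isfinite (C99) */
#include <stddef.h>

/* Mean of the finite entries in values[0..count-1]. Infinities and NaNs
   are skipped and counted in *skipped. Returns 0.0 if nothing is finite.
   (Hypothetical helper, for illustration only.) */
double mean_of_finite(const double *values, size_t count, size_t *skipped)
{
    double sum = 0.0;
    size_t used = 0;
    size_t i;

    assert(values != NULL && skipped != NULL);
    *skipped = 0;

    for (i = 0; i < count; i++) {
        if (isfinite(values[i])) {
            sum += values[i];
            used++;
        } else {
            (*skipped)++;
        }
    }
    return used > 0 ? sum / (double)used : 0.0;
}
```

Reporting the skipped count makes it obvious when the input contained garbage instead of quietly hiding it.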
• 07-15-2008
matsp
By what method is your application acquiring the numbers? Are you reading from a file, or using some other method?

If you are reading numbers from a file or similar, my suggestion would be to check the result of each read and reject the input if it's out of range.

If that's not the method you are using, then you need to figure out "why are the values #INF" - because that's clearly not a valid value.
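
For the reading-from-a-file case, a minimal sketch of that kind of check might look like this. `parse_in_range` is a hypothetical helper, and the caller-supplied bounds are an assumption (the thread only says the values lie between 0 and around 10). It relies on standard strtod() and the C99 isfinite() macro.

```c
#include <assert.h>
#include <math.h>    /* isfinite (C99) */
#include <stdlib.h>  /* strtod */

/* Parses a number from text. Returns 1 and stores it in *out only if a
   number was found, it is finite, and it lies in [min, max]. Returns 0
   otherwise and leaves *out untouched. (Hypothetical helper.) */
int parse_in_range(const char *text, double min, double max, double *out)
{
    char *end;
    double v;

    assert(text != NULL && out != NULL);

    v = strtod(text, &end);
    if (end == text)
        return 0;              /* no number at the start of the text */
    if (!isfinite(v))
        return 0;              /* "inf", "nan", or overflow such as 1e999 */
    if (v < min || v > max)
        return 0;              /* outside the expected range */

    *out = v;
    return 1;
}
```

Called as `parse_in_range(line, 0.0, 10.0, &value)`, it rejects an "inf" token at read time, before it can reach the sum.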

--
Mats