# Thread: Math Gone Kablooie

1. ## Math Gone Kablooie

I've got this one little function that calculates standard deviation. All the separate peices work, but together it blows up. I simply cannot figure out why.

I'm trying to get this formula (its hard to show in ascii): Sx = sqrt( (#SUM OF FOR I TO N# Xi^2 - n(average of X)^2) / N)

Here's the function. FYI I put in a zillion extra ()'s to make sure it did things right...
Code:
```double standev( void ) {
//standard deviation

int i;
double top=0;

for (i=0;i<N;i++) {
top += ((x[i]*x[i]) - (((double)(N))*(avgx*avgx)));
if (DEBUG) printf("[+] top += %f %f*%f=%f - %f * %f * %f = %f\n",(x[i]*x[i]) - (N*(avgx*avgx)),x[i],x[i],(x[i]*x[i]),(double)(N),avgx,avgx,(avgx*avgx));
}

if (DEBUG) printf("[+] standard dev: %f\n[+] top: %f, divide by N : %f\n",sqrt(top/(double)(N)),top,(double)(N));
return sqrt(top/(double)(N));
}```
See, the output with DEBUG set is as follows:

Code:
```[+] top += -3252.785233 3.000000*3.000000=9.000000 - 20.000000 * 12.770641 * 12.770641 = 163.089262
[+] top += -3245.785233 4.000000*4.000000=16.000000 - 20.000000 * 12.770641 * 12.770641 = 163.089262
[+] top += -3236.785233 5.000000*5.000000=25.000000 - 20.000000 * 12.770641 * 12.770641 = 163.089262
[+] top += -3225.785233 6.000000*6.000000=36.000000 - 20.000000 * 12.770641 * 12.770641 = 163.089262
[+] top += -3212.785233 7.000000*7.000000=49.000000 - 20.000000 * 12.770641 * 12.770641 = 163.089262
[+] top += -3197.785233 8.000000*8.000000=64.000000 - 20.000000 * 12.770641 * 12.770641 = 163.089262
[+] top += -3180.785233 9.000000*9.000000=81.000000 - 20.000000 * 12.770641 * 12.770641 = 163.089262
[+] top += -3161.785233 10.000000*10.000000=100.000000 - 20.000000 * 12.770641 * 12.770641 = 163.089262
[+] top += -3140.785233 11.000000*11.000000=121.000000 - 20.000000 * 12.770641 * 12.770641 = 163.089262
[+] top += -3117.785233 12.000000*12.000000=144.000000 - 20.000000 * 12.770641 * 12.770641 = 163.089262
[+] top += -3092.785233 13.000000*13.000000=169.000000 - 20.000000 * 12.770641 * 12.770641 = 163.089262
[+] top += -3065.785233 14.000000*14.000000=196.000000 - 20.000000 * 12.770641 * 12.770641 = 163.089262
[+] top += -3036.785233 15.000000*15.000000=225.000000 - 20.000000 * 12.770641 * 12.770641 = 163.089262
[+] top += -3005.785233 16.000000*16.000000=256.000000 - 20.000000 * 12.770641 * 12.770641 = 163.089262
[+] top += -2972.785233 17.000000*17.000000=289.000000 - 20.000000 * 12.770641 * 12.770641 = 163.089262
[+] top += -2937.785233 18.000000*18.000000=324.000000 - 20.000000 * 12.770641 * 12.770641 = 163.089262
[+] top += -2900.785233 19.000000*19.000000=361.000000 - 20.000000 * 12.770641 * 12.770641 = 163.089262
[+] top += -2861.785233 20.000000*20.000000=400.000000 - 20.000000 * 12.770641 * 12.770641 = 163.089262
[+] top += -2820.785233 21.000000*21.000000=441.000000 - 20.000000 * 12.770641 * 12.770641 = 163.089262
[+] top += -2777.785233 22.000000*22.000000=484.000000 - 20.000000 * 12.770641 * 12.770641 = 163.089262```
See, all the various peices hold the right values, but the final is crazy way off.

Here's the full source:
Code:
```#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define N 20
#define DEBUG 1

double x[N],y[N],avgx,avgy,avgxy,r;

void getvals( void ) {
int i;

for (i=0;i<20;i++) {
x[i]=(double)(i) + 3.0;
y[i]=(double)(i);
if (DEBUG) printf("[+] (%f,%f)\n",x[i],y[i]);
}
}

void getavg( void ) {
int i;
double sumx, sumy, sumxy;

//before here, validate so that sizeof(x)=sizeof(y)

for (i=0;i<N;i++) {
sumx += x[i];
sumy += y[i];
sumxy += x[i]*y[i];
}

avgx = sumx/(double)(N);
avgy = sumy/(double)(N);
avgxy = sumxy/(double)(N);

if (DEBUG) printf("[+] avgx %f avgy %f avgxy %f\n",avgx,avgy,avgxy);
}

double standev( void ) {
//standard deviation

int i;
double top=0;

for (i=0;i<N;i++) {
top += ((x[i]*x[i]) - (((double)(N))*(avgx*avgx)));
if (DEBUG) printf("[+] top += %f %f*%f=%f - %f * %f * %f = %f\n",(x[i]*x[i]) - (N*(avgx*avgx)),x[i],x[i],(x[i]*x[i]),(double)(N),avgx,avgx,(avgx*avgx));
}

if (DEBUG) printf("[+] standard dev: %f\n[+] top: %f, divide by N : %f\n",sqrt(top/(double)(N)),top,(double)(N));
return sqrt(top/(double)(N));
}

void findlinear( void ) {
double slope,yint,sx;

//here we need to find the slope, then change the
//y-int so that the line runs through (avgx,avgy)

getvals();
getavg();

sx = standev();
slope = (avgxy - (avgx*avgy))/(sx*sx);
yint = avgy - (slope*avgx);

printf("y = %fx + %f\n");
}

int main(int argc, char *argv[]) {

findlinear();
return 0;
}```
Thanks a lot, I'm pulling my hair out.

2. You might want to initialize these since you later increment them.
Code:
`  double sumx, sumy, sumxy;`
You might want to provide information to printf.
Code:
`  printf("y = %fx + %f\n");`
How may times do you do this: avgx*avgx? A temporary variable seems appropriate.

3. Also, there are two standard deviations -- one that divides by n and one that divides by (n - 1). One is the "population" standard deviation and the other is the "sample" standard deviation. Don't ask me why. Don't ask me which is which. If you're comparing numerical results with something and they're slightly off, this is the reason.

4. Originally Posted by Rashakil Fol
Also, there are two standard deviations -- one that divides by n and one that divides by (n - 1). One is the "population" standard deviation and the other is the "sample" standard deviation. Don't ask me why. Don't ask me which is which.
http://en.wikipedia.org/wiki/Standard_deviation
http://mathworld.wolfram.com/StandardDeviation.html

5. Originally Posted by crepincdotcom

I've got this one little function that calculates standard deviation. All the separate peices work, but together it blows up. I simply cannot figure out why.

I'm trying to get this formula (its hard to show in ascii): Sx = sqrt( (#SUM OF FOR I TO N# Xi^2 - n(average of X)^2) / N)
This is incorrect. In the definition, the terms to be summed are given by this:

(Xi - xavg) squared.

Then divide by N (or, maybe, N-1) and take the square root.

The calculation can be simplified a number of ways, but the formula that you give is not one of them.

What I should have said is that the calculations in the program are incorrect, and are not consistent with the definition or with the formula of your narration.

You could calculate the SUM of Xi*Xi in a loop, then
sqrt((SUM - N*xavg*xavg)/N)

This is not what your program does.
[/edit]

Regards,

Dave

6. Dave, the formula I provided is not the proper one. It is correctly inplemented in the code, I check that many times over.

The line
Code:
`double top=0;`
Does that need to be
Code:
`double top=0.0;`
by chance? It seems that the initial "top" value being incremented is funky.

Dave_Sinkula, you can see in the debug out that that is not the issue. All the little parts are there correctly, avgx, avgy, etc, but the final incrementation of top is incorrect.

Ha BTW yes arguments to prontf would help. Still, thats not the main issue.

Thanks

7. Simpler code is easier to follow.
Code:
```#include <stdio.h>
#include <math.h>

double average(const double *array, size_t size)
{
double sum = 0;
size_t i;
for ( i = 0; i < size; ++i )
{
sum += *array++;
}
return sum / size;
}

double stddev(const double *array, size_t size)
{
double sum = 0, mean = average(array, size);
size_t i;
for ( i = 0; i < size; ++i )
{
double temp = *array++ - mean;
sum += temp * temp;
}
if ( size > 1 )
{
return sqrt(sum / (size - 1));
}
return -1;
}

int main(void)
{
double array[] = {1,2,3,4,5,6,7,8,9,10};
printf("average() = %g\n", average(array, sizeof array / sizeof *array));
printf("stddev()  = %g\n", stddev(array,  sizeof array / sizeof *array));
return 0;
}

/* my output
average() = 5.5
stddev()  = 3.02765
*/```

8. Originally Posted by crepincdotcom
Dave, the formula I provided is not the proper one. It is correctly inplemented in the code, I check that many times over.
You can check it as many times as you want to. It's still wrong.

You could implement it the way I suggested, by putting the following block just after your loop in standev:

Code:
```  {
double top2;
top2 = 0;
for (i = 0; i < N; i++) {
top2 += x[i]*x[i];
}
top2 = sqrt((top2 - (double)N*avgx*avgx)/(double)N);
printf("By corrected formula: std dev = %f\n", top2);
}```
For your data, I got

By corrected formula: std dev = 5.766281
Note that this is (probably) not the way I would have implemented it, but I wanted to use terminology similar to yours so that you could compare your code with your description and then compare with this.

Originally Posted by crepincdotcom
Dave_Sinkula, you can see in the debug out that that is not the issue. All the little parts are there correctly, avgx, avgy, etc, but the final incrementation of top is incorrect.
Variables sumx, sumy and sumxy in your getavg() function should be initialized. This might have no bearing on your present problem, but somehow, someday you will have other problems due to uninitialized automatic variables. When I compiled your code with gcc, I found that the values are not initialized and all calculations in getavg() are incorrect. Unlike global variables, compilers aren't required to initialize these to zero.

Originally Posted by crepincdotcom
The line
Code:
`double top=0;`
Does that need to be
Code:
`double top=0.0;`
Some people might recommend that as a matter of style, but in fact it makes no difference. The assignment automatically promotes an integer 0 to a double 0.0.

Regards,

Dave

9. Thanks all... I think I'll go cry now.

Just kidding, but my code is pretty bad isn't it.

Dave_Sinkula, your code is very pretty and I will test it tonight. But your use of pointers and (not sure of terminology) "dynamic" or "un-predefined" arrays is somewhat beyond me. Is there a good place I can read about these? For instance, ultimatly I'd like to read in the numbers from a file, but in that case I would have to change the size of the array, since it couldn't be compiled in...

Thanks all

10. If there is some maximum size, you could do this.
Code:
```int main(void)
{
static const char filename[] = "file.csv";
FILE *file = fopen(filename, "r");
if ( file )
{
double x [ 35 ], y [ sizeof x / sizeof *x ];
size_t i;
for ( i = 0; i < sizeof x / sizeof *x; ++i )
{
if ( fscanf(file, "%lf,%lf", &x[i], &y[i]) != 2 )
{
break;
}
}
fclose(file);
printf("average() = %g\n", average(x, i));
printf("stddev()  = %g\n", stddev (x, i));
printf("average() = %g\n", average(y, i));
printf("stddev()  = %g\n", stddev (y, i));
}
else
{
perror(filename);
}
return 0;
}

/* file.csv
3,0
4,1
5,2
6,3
7,4
8,5
9,6
10,7
11,8
12,9
13,10
14,11
15,12
16,13
17,14
18,15
19,16
20,17
21,18
22,19
*/

/* my output
average() = 12.5
stddev()  = 5.91608
average() = 9.5
stddev()  = 5.91608
*/```

11. Thanks

Popular pages Recent additions