# Thread: Standard Deviation Calculation Error?

1. ## Standard Deviation Calculation Error?

So I've been given an assignment to create a table of sums and averages given some numbers, and to get the sums, averages and standard deviations for the columns. For some reason the standard deviation calculation seems wrong (I most likely got the formula wrong). Anyone know what part of the formula is wrong or how I could change it?

For the numbers given, it's just a block of numbers 16 rows by 3 columns (16 down, 3 across) and I can't use an array.
Code:
``` while(count<=16)
{
fscanf(inp, "%lf%lf%lf", &n1,&n2,&n3);
sum = n1 + n2 + n3;
sum1 = sum1 + n1;
sum2 = sum2 + n2;
sum3 = sum3 + n3;
avg = sum/3;
sum4 = sum4 + sum;
sum5 = sum5 + avg;
avg1 = sum1/16;
avg2 = sum2/16;
avg3 = sum3/16;
avg4 = sum4/16;
avg5 = sum5/16;
sd1 = sqrt((pow((n1-avg1),2))/16);
sd2 = sqrt((pow((n2-avg2),2))/16);
sd3 = sqrt((pow((n3-avg3),2))/16);
sd4 = sqrt((pow((sum-avg4),2))/16);
sd5 = sqrt((pow((avg-avg5),2))/16);
count++;
}```
I'll be happy to show the whole code if needed

2. Whole code might be useful, though your calculations for standard deviation are definitely wrong. You need to do the square root as the last step, after you've added all the squared errors together and divided by the number of values. It would be something like:
Code:
`std_dev = sqrt((pow((n1 - avg), 2) + pow((n2 - avg), 2) + pow((n3 - avg), 2)) / 3);`
It's a little hard to tell exactly, because your variable names are not very clear. You should try something like row_1_sum, row_1_avg, row_1_sd, ..., row_16_sum, row_16_avg, row_16_sd. Do likewise for columns.

Also, you will want to look into the rewind function to go back to the beginning of the file to calculate standard deviation. You need the average before you can start calculating the standard deviation, and you can't have it until you've read the file once to calculate the mean.

3. Most of the code
Code:
```#include <stdio.h>
#include <math.h>

int main(void)
{
//Declares the variables and the sums and averages
double count = 1, sum, avg, n1, n2, n3, sum1 = 0, sum2 = 0, sum3 = 0, sum4 = 0, sum5 = 0,
avg1, avg2, avg3, avg4, avg5, sd1, sd2, sd3, sd4, sd5;
//Open the input and output files
FILE *inp, *outp;

inp = fopen("input.dat", "r");
outp = fopen("result.out", "w");

//Displays the format of the table
printf("Count     #1        #2        #3        Sum       Avg\n");
printf("_____     _____     _____     _____     _____     _____\n");
fprintf(outp, "Count     #1        #2        #3        Sum       Avg\n");
fprintf(outp, "_____     _____     _____     _____     _____     _____\n");

//Loops to read and display all numbers from the file
while(count<=16)
{
fscanf(inp, "%lf%lf%lf", &n1,&n2,&n3);
sum = n1 + n2 + n3;
sum1 = sum1 + n1;
sum2 = sum2 + n2;
sum3 = sum3 + n3;
avg = sum/3;
sum4 = sum4 + sum;
sum5 = sum5 + avg;
avg1 = sum1/16;
avg2 = sum2/16;
avg3 = sum3/16;
avg4 = sum4/16;
avg5 = sum5/16;
sd1 = sqrt((pow((n1-avg1),2))/16);
sd2 = sqrt((pow((n2-avg2),2))/16);
sd3 = sqrt((pow((n3-avg3),2))/16);
sd4 = sqrt((pow((sum-avg4),2))/16);
sd5 = sqrt((pow((avg-avg5),2))/16);
printf("%5.0f    %6.3f    %6.3f    %6.3f    %6.3f    %6.3f\n", count,n1,n2,n3,sum,avg);
count++;
}

printf("_____     _____     _____     _____     _____     _____\n");
printf("Sum      %6.3f    %6.3f    %6.3f    %6.3f    %6.3f\n", sum1,sum2,sum3,sum4,sum5);
printf("_____     _____     _____     _____     _____     _____\n");
printf("Average  %6.3f    %6.3f    %6.3f    %6.3f    %6.3f\n", avg1,avg2,avg3,avg4,avg5);
printf("_____     _____     _____     _____     _____     _____\n");
printf("Standard\nDeviation %6.3f    %6.3f    %6.3f    %6.3f    %6.3f\n", sd1,sd2,sd3,sd4,sd5);```
The reason I didn't name the variables well is because we've reused the same program but modified it several times and I was too lazy to rename everything every time.
I would also just run it and view the table to check what goes where.

I have no idea what the rewind function is; this is just an intro to C class.

-Also, that formula would use all 3 columns to calculate one standard deviation, when there are 5 different columns to use

-This is the input file in case someone didn't understand my explanation
Code:
```-0.043200 -0.003471 0.000000
-0.040326 -0.004851 -0.000737
-0.018204 -0.004246 -0.001530
0.022249 0.008891 0.004870
0.074892 0.044237 0.032171
0.129600 0.100233 0.089016
0.174747 0.160100 0.161792
0.200242 0.199106 0.214417
0.200242 0.199106 0.214417
0.174747 0.160100 0.161792
0.129600 0.100233 0.089016
0.074892 0.044237 0.032171
0.022249 0.008891 0.004870
-0.018204 -0.004246 -0.001530
-0.040326 -0.004851 -0.000737
-0.043200 -0.003471 0.000000```

4. Code:
```      while(count<=16)
{
fscanf(inp, "%lf%lf%lf", &n1,&n2,&n3);
sum = n1 + n2 + n3;
sum1 = sum1 + n1;
sum2 = sum2 + n2;
sum3 = sum3 + n3;
avg = sum/3;
sum4 = sum4 + sum;
sum5 = sum5 + avg;
avg1 = sum1/16;
avg2 = sum2/16;
avg3 = sum3/16;
avg4 = sum4/16;
avg5 = sum5/16;
x1 = (pow((n1-avg1),2)) + x1;
x2 = (pow((n2-avg2),2)) + x2;
x3 = (pow((n3-avg3),2)) + x3;
x4 = (pow((sum-avg4),2)) + x4;
x5 = (pow((avg-avg5),2)) + x5;
sd1 = sqrt((x1)/16);
sd2 = sqrt((x2)/16);
sd3 = sqrt((x3)/16);
sd4 = sqrt((x4)/16);
sd5 = sqrt((x5)/16);
printf("%5.0f    %6.3f    %6.3f    %6.3f    %6.3f    %6.3f\n", count,n1,n2,n3,sum,avg);
fprintf(outp, "%5.0f    %6.3f    %6.3f    %6.3f    %6.3f    %6.3f\n", count,n1,n2,n3,sum,avg);
count++;
}```

5. Originally Posted by faulerwulf
My guess is no.

Tim S.

6. Originally Posted by stahta01
My guess is no.

Tim S.
Why not? (and who is Tim S.)
I believe it would work because it sums the (number-average)^2 then divides it by 16 (the number of numbers) and square roots it... I think
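
Written out, the formula described here is the population standard deviation. This is only a restatement under the assumption that N = 16 values per column and that dividing by N is what the assignment expects; if it asks for the sample standard deviation, the divisor would be N - 1 instead:

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2}, \qquad \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$$

Note that $\bar{x}$ is the mean of the whole column, so it has to be known before the first squared difference is computed.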

7. Originally Posted by faulerwulf
The reason I didn't name the variables well is because we've reused the same program but modified it several times and I was too lazy to rename everything every time.
I would also just run it and view the table to check what goes where.
Not a good excuse or habit to get into. At least I have column headers now so I know what each one is.

I have no idea what the rewind function is; this is just an intro to C class.
rewind is not some super-elite function only for really advanced C users. It's part of the C standard, declared in the ubiquitous stdio.h header, not far from printf or scanf. Besides, there is no excuse or reason you couldn't at least have looked it up: fseek(3) - Linux manual page. Without storing the data, the straightforward way to calculate the standard deviation of a column is to read through it a first time to determine the average, then read through it a second time to determine how far each data point is from the average. The alternative is to store each data point in a variable like data_11_2 for the data point in row 11, column 2. But that requires 48 variables just for the data in the file, plus all the others for sums, averages and standard deviations. Quite a mess to keep track of.

-Also, that formula would use all 3 columns to calculate one standard deviation, when there are 5 different columns to use
I know, that was the standard deviation for one row, which I didn't know if you needed, since you didn't specify. It was to show you the general idea, that all the calls to pow are added up then square-rooted exactly once per standard deviation. I was hoping you would be able to extrapolate from my example how to fix your code.

Originally Posted by faulerwulf
Why not? (and who is Tim S.)
I believe it would work because it sums the (number-average)^2 then divides it by 16 (the number of numbers) and square roots it... I think
It won't work because when you read your first line, avg1 is not the average of the whole column yet, so pow(n1 - avg1, 2) is not the squared error of that data point.

You need to use rewind to go back to the beginning of the file, then reread each line and add up the squared errors for each column. At the end, divide each total by 16 and take the square root to get your standard deviations.

Tim S. is probably stahta01's real name.

8. Okay, so I would need to use rewind (I'm assuming that means to rewind the file to the beginning) and then do the loop again, doing almost the same thing (excluding the averages) so I could use the final averages in the equations. I looked at that page and I don't really understand how I would use it. Anyone want to help explain how I would use it? (Other pages may be very helpful.)

9. rewind - C++ Reference
fseek - C++ Reference

Tim S.

10. Yep, it takes you back to the beginning of the file. From the documentation, the prototype looks like
Code:
`void rewind(FILE *stream);`
That means the function takes a FILE * to the file you want to rewind (in your case the input file pointer, inp), and returns nothing, so you just do:
Code:
`rewind(inp);`
And you're back to the beginning of the file, like you just opened it. You can do essentially the same loop again with the same fscanf call to read each line.
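
A tiny sketch of what that looks like in practice. The helper `read_first_twice` is hypothetical and exists only to show the effect of the call: the same value is read again after rewinding.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reads the first number in the stream, rewinds,
   and reads it again. Returns 1 if both reads succeeded, 0 otherwise. */
int read_first_twice(FILE *inp, double *first, double *again)
{
    assert(inp != NULL);

    if (fscanf(inp, "%lf", first) != 1)
        return 0;

    /* Back to the start; this also clears the stream's
       end-of-file and error indicators. */
    rewind(inp);

    if (fscanf(inp, "%lf", again) != 1)
        return 0;
    return 1;
}
```

Because rewind returns nothing, there is no return value to check; any read problem shows up in the fscanf result instead.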

11. Okay, I think I have it now.
Would this have solved the problem?
Code:
```      while(count<=16)
{
fscanf(inp, "%lf%lf%lf", &n1,&n2,&n3);
sum = n1 + n2 + n3;
sum1 = sum1 + n1;
sum2 = sum2 + n2;
sum3 = sum3 + n3;
avg = sum/3;
sum4 = sum4 + sum;
sum5 = sum5 + avg;
avg1 = sum1/16;
avg2 = sum2/16;
avg3 = sum3/16;
avg4 = sum4/16;
avg5 = sum5/16;
printf("%5.0f    %6.3f    %6.3f    %6.3f    %6.3f    %6.3f\n", count,n1,n2,n3,sum,avg);
fprintf(outp, "%5.0f    %6.3f    %6.3f    %6.3f    %6.3f    %6.3f\n", count,n1,n2,n3,sum,avg);
count++;
}
//Rewinds the input file so I can have the final average to use while calculating the standard deviation
rewind(inp); count = 1;
while(count<=16)
{
fscanf(inp, "%lf%lf%lf", &n1,&n2,&n3);
sum = n1 + n2 + n3;
avg = sum/3;
x1 = (pow((n1-avg1),2)) + x1;
x2 = (pow((n2-avg2),2)) + x2;
x3 = (pow((n3-avg3),2)) + x3;
x4 = (pow((sum-avg4),2)) + x4;
x5 = (pow((avg-avg5),2)) + x5;
sd1 = sqrt((x1)/16);
sd2 = sqrt((x2)/16);
sd3 = sqrt((x3)/16);
sd4 = sqrt((x4)/16);
sd5 = sqrt((x5)/16);
count++;
}```
-Or should the sd1 = sqrt((x1)/16); be outside of the loop? If so, would that then solve the whole problem?

12. A set of numbers has a single sum and a single average!!

Tim S.

13. Oops, haha. Better now?
-Actually wait, I included the sum and avg again so they could feed the x4 and x5 calculations, not to change the results; the results were already printed in the first loop anyway.

14. Your code will work the way it is, but it does unnecessary calculations. The avg1...avg5 and sd1...sd5 can be outside the loop. The other calculations have to be inside the loops. Also, instead of x = x + y, you can use the shorthand version x += y, e.g.
Code:
```sum1 += n1;
...
x1 += pow((n1 - avg1), 2);
...```
Other than that, it looks good. I'll leave the final testing to you.

15. Originally Posted by anduril462
Your code will work the way it is, but it does unnecessary calculations. The avg1...avg5 and sd1...sd5 can be outside the loop. The other calculations have to be inside the loops. Also, instead of x = x + y, you can use the shorthand version x += y, e.g.
Code:
```sum1 += n1;
...
x1 += pow((n1 - avg1), 2);
...```
Other than that, it looks good. I'll leave the final testing to you.
Oh, I get your statement now. Either way, it doesn't hurt to leave them in the loop. I have been taught the shorthand way, but we never learned it in class, so I left it the longhand way for when I present it. That makes it easier to read for people who haven't seen it before.