Thread: Standard Deviation Calculation Error?

  1. #1
    Registered User
    Join Date
    Nov 2011
    Posts
    8

    Standard Deviation Calculation Error?

    So I've been given an assignment to create a table of sums and averages from some numbers, and to get the sums, averages and standard deviations for the columns. For some reason the standard deviation calculation seems wrong (I most likely got the formula wrong). Does anyone know which part of the formula is wrong or how I could fix it?

    For the numbers given, it's just a block of numbers 16 rows by 3 columns (16 down, 3 across) and I can't use an array.
    Code:
     while(count<=16)
          { 
            fscanf(inp, "%lf%lf%lf", &n1,&n2,&n3);
            sum = n1 + n2 + n3;
            sum1 = sum1 + n1;
            sum2 = sum2 + n2;
            sum3 = sum3 + n3;
            avg = sum/3;
            sum4 = sum4 + sum;
            sum5 = sum5 + avg;
            avg1 = sum1/16;
            avg2 = sum2/16;
            avg3 = sum3/16;
            avg4 = sum4/16;
            avg5 = sum5/16;
            sd1 = sqrt((pow((n1-avg1),2))/16);
            sd2 = sqrt((pow((n2-avg2),2))/16);
            sd3 = sqrt((pow((n3-avg3),2))/16);
            sd4 = sqrt((pow((sum-avg4),2))/16);
            sd5 = sqrt((pow((avg-avg5),2))/16);
            count++;
            }
    I'll be happy to show the whole code if needed

  2. #2
    Registered User
    Join Date
    Nov 2010
    Location
    Long Beach, CA
    Posts
    5,909
    Whole code might be useful, though your calculations for standard deviation are definitely wrong. You need to do the square root as the last step, after you've added all the squared errors together and divided by the number of values. It would be something like:
    Code:
    std_dev = sqrt((pow(n1 - avg, 2) + pow(n2 - avg, 2) + pow(n3 - avg, 2)) / 3);
    It's a little hard to tell exactly, because your variable names are not very clear. You should try something like row_1_sum, row_1_avg, row_1_sd, ..., row_16_sum, row_16_avg, row_16_sd. Do likewise for columns.
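    For reference, this is the population standard deviation the thread is working toward, written for a column of N values x_1, ..., x_N (N = 16 for each column of input.dat, or 3 for a single row). It is the usual textbook definition rather than something stated in the thread; if the assignment actually wants the sample standard deviation, the 1/N inside the square root becomes 1/(N-1).

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i,
\qquad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left(x_i - \mu\right)^2}
```

    The square root is applied once, to the whole averaged sum, and \mu has to be the final mean of the column, not a running value.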

    Also, you will want to look into the rewind function (declared in stdio.h) to go back to the beginning of the file to calculate the standard deviation. You need to know the average before you can start calculating the standard deviation, and you can't know that until you've read the file once to calculate the mean.

  3. #3
    Registered User
    Join Date
    Nov 2011
    Posts
    8
    Most of the code
    Code:
    #include <stdio.h>
    #include <math.h>
    
    main()
    {
          //Declares the variables and the sums and averages
          double count = 1, sum, avg, n1, n2, n3, sum1 = 0, sum2 = 0, sum3 = 0, sum4 = 0, sum5 = 0,
                 avg1, avg2, avg3, avg4, avg5, sd1, sd2, sd3, sd4, sd5;
          //Open the input and output files
          FILE *inp, *outp;
          
          inp = fopen("input.dat", "r");
          outp = fopen("result.out", "w");
          
          //Displays the format of the table
          printf("Count     #1        #2        #3        Sum       Avg\n");
          printf("_____     _____     _____     _____     _____     _____\n");
          fprintf(outp, "Count     #1        #2        #3        Sum       Avg\n");
          fprintf(outp, "_____     _____     _____     _____     _____     _____\n");
          
          //Loops to read and display all numbers from the file
          while(count<=16)
          { 
            fscanf(inp, "%lf%lf%lf", &n1,&n2,&n3);
            sum = n1 + n2 + n3;
            sum1 = sum1 + n1;
            sum2 = sum2 + n2;
            sum3 = sum3 + n3;
            avg = sum/3;
            sum4 = sum4 + sum;
            sum5 = sum5 + avg;
            avg1 = sum1/16;
            avg2 = sum2/16;
            avg3 = sum3/16;
            avg4 = sum4/16;
            avg5 = sum5/16;
            sd1 = sqrt((pow((n1-avg1),2))/16);
            sd2 = sqrt((pow((n2-avg2),2))/16);
            sd3 = sqrt((pow((n3-avg3),2))/16);
            sd4 = sqrt((pow((sum-avg4),2))/16);
            sd5 = sqrt((pow((avg-avg5),2))/16);
            printf("%5.0f    %6.3f    %6.3f    %6.3f    %6.3f    %6.3f\n", count,n1,n2,n3,sum,avg);
            count++;
            }
          
          printf("_____     _____     _____     _____     _____     _____\n");
          printf("Sum      %6.3f    %6.3f    %6.3f    %6.3f    %6.3f\n", sum1,sum2,sum3,sum4,sum5);
          printf("_____     _____     _____     _____     _____     _____\n");
          printf("Average  %6.3f    %6.3f    %6.3f    %6.3f    %6.3f\n", avg1,avg2,avg3,avg4,avg5);
          printf("_____     _____     _____     _____     _____     _____\n");
          printf("Standard\nDeviation %6.3f    %6.3f    %6.3f    %6.3f    %6.3f\n", sd1,sd2,sd3,sd4,sd5);
    The reason I didn't name the variables well is because we've reused the same program but modified it several times and I was too lazy to rename everything every time.
    I would also just run it and view the table to check what goes where.

    I have no idea what the frewind function is; this is just an intro to C class.


    -Also, that formula would use all 3 columns to calculate one standard deviation, when there are 5 different columns to use

    -This is the input file in case someone didn't understand my explanation
    Code:
    -0.043200 -0.003471 0.000000
    -0.040326 -0.004851 -0.000737
    -0.018204 -0.004246 -0.001530
    0.022249 0.008891 0.004870
    0.074892 0.044237 0.032171
    0.129600 0.100233 0.089016
    0.174747 0.160100 0.161792
    0.200242 0.199106 0.214417
    0.200242 0.199106 0.214417
    0.174747 0.160100 0.161792
    0.129600 0.100233 0.089016
    0.074892 0.044237 0.032171
    0.022249 0.008891 0.004870
    -0.018204 -0.004246 -0.001530
    -0.040326 -0.004851 -0.000737
    -0.043200 -0.003471 0.000000
    Last edited by faulerwulf; 11-30-2011 at 12:01 PM.

  4. #4
    Registered User
    Join Date
    Nov 2011
    Posts
    8
    What do y'all think about this? Think it works correctly?
    Code:
          while(count<=16)
          { 
            fscanf(inp, "%lf%lf%lf", &n1,&n2,&n3);
            sum = n1 + n2 + n3;
            sum1 = sum1 + n1;
            sum2 = sum2 + n2;
            sum3 = sum3 + n3;
            avg = sum/3;
            sum4 = sum4 + sum;
            sum5 = sum5 + avg;
            avg1 = sum1/16;
            avg2 = sum2/16;
            avg3 = sum3/16;
            avg4 = sum4/16;
            avg5 = sum5/16;
            x1 = (pow((n1-avg1),2)) + x1;
            x2 = (pow((n2-avg2),2)) + x2;
            x3 = (pow((n3-avg3),2)) + x3;
            x4 = (pow((sum-avg4),2)) + x4;
            x5 = (pow((avg-avg5),2)) + x5;
            sd1 = sqrt((x1)/16);
            sd2 = sqrt((x2)/16);
            sd3 = sqrt((x3)/16);
            sd4 = sqrt((x4)/16);
            sd5 = sqrt((x5)/16);
            printf("%5.0f    %6.3f    %6.3f    %6.3f    %6.3f    %6.3f\n", count,n1,n2,n3,sum,avg);
            fprintf(outp, "%5.0f    %6.3f    %6.3f    %6.3f    %6.3f    %6.3f\n", count,n1,n2,n3,sum,avg);
            count++;
            }

  5. #5
    Registered User
    Join Date
    May 2009
    Posts
    4,183
    Quote Originally Posted by faulerwulf
    What do y'all think about this? Think it works correctly?
    My guess is no.

    Tim S.

  6. #6
    Registered User
    Join Date
    Nov 2011
    Posts
    8
    Quote Originally Posted by stahta01
    My guess is no.

    Tim S.
    Why not? (and who is Tim S.)
    I believe it would work because it sums the (number-average)^2 then divides it by 16 (the number of numbers) and square roots it... I think

  7. #7
    Registered User
    Join Date
    Nov 2010
    Location
    Long Beach, CA
    Posts
    5,909
    Quote Originally Posted by faulerwulf
    The reason I didn't name the variables well is because we've reused the same program but modified it several times and I was too lazy to rename everything every time.
    I would also just run it and view the table to check what goes where.
    Not a good excuse or habit to get into. At least I have column headers now so I know what each one is.

    I have no idea what the frewind function is; this is just an intro to C class.
    The function is actually called rewind (no f). It is not some super-elite function only for really advanced C users. It's part of the C standard, declared in the ubiquitous stdio.h header alongside printf and scanf. Besides, there is no reason you couldn't at least have looked it up: fseek(3) - Linux manual page (that page documents rewind as well).

    Without an array, the simplest correct way to calculate the standard deviation of a column is to read through it a first time to determine the average, then read through it a second time to determine how far each data point is from that average. (A single pass is possible if you also accumulate the sum of the squares of the values, but that shortcut can lose precision.) The alternative is to store each data point in a variable like data_11_2 for the data point in row 11, column 2. But that requires 48 variables just for the data in the file, plus all the others for sums, averages and standard deviations. Quite a mess to keep track of.

    -Also, that formula would use all 3 columns to calculate one standard deviation, when there are 5 different columns to use
    I know, that was the standard deviation for one row, which I didn't know whether you needed, since you didn't specify. It was to show you the general idea: all the pow results are added up, divided by the count, and then square-rooted exactly once per standard deviation. I was hoping you would be able to extrapolate from my example how to fix your code.

    Quote Originally Posted by faulerwulf
    Why not? (and who is Tim S.)
    I believe it would work because it sums the (number-average)^2 then divides it by 16 (the number of numbers) and square roots it... I think
    It won't work because when you read your first line, avg1 is not yet the average of the whole column (it is only the first value divided by 16), so pow(n1 - avg1, 2) is not the squared error of that data point.

    You need to use rewind to go back to the beginning of the file, then reread each line and add up the squared errors for each column. At the end, divide each total by 16 and take the square root to get your standard deviations.

    Tim S. is probably stahta01's real name.

  8. #8
    Registered User
    Join Date
    Nov 2011
    Posts
    8
    Okay, so I would need to use rewind (I'm assuming that means going back to the beginning of the file), then run the loop again and do almost the same thing (minus the averages) so I can use the final averages in the equations. I looked at that page and I don't really understand how to use it. Could anyone explain how I would use it? (Other pages may be very helpful.)

  9. #9
    Registered User
    Join Date
    May 2009
    Posts
    4,183
    Some other documentation links:

    rewind - C++ Reference
    fseek - C++ Reference

    Tim S.

  10. #10
    Registered User
    Join Date
    Nov 2010
    Location
    Long Beach, CA
    Posts
    5,909
    Yep, it takes you back to the beginning of the file. From the documentation, the prototype looks like
    Code:
    void rewind(FILE *stream);
    That means the function takes a FILE * to the file you want to rewind (in your case the input file pointer, inp), and returns nothing, so you just do:
    Code:
    rewind(inp);
    And you're back at the beginning of the file, as if you had just opened it. You can run essentially the same loop again, with the same fscanf call, to read each line.
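    As a quick check of that behaviour, here is a small sketch (the function name first_value_after_rewind is hypothetical) that reads every row, calls rewind, and then reads the first number of the first line again.

```c
#include <stdio.h>

/* Read nrows lines of three numbers, call rewind(), then read the first
 * number of the first line again into *first.
 * Returns 0 on success, -1 if any fscanf call fails. */
int first_value_after_rewind(FILE *in, int nrows, double *first)
{
    double n1, n2, n3;
    int count;

    /* First pass: consume every row, like the first loop in the thread. */
    for (count = 1; count <= nrows; count++) {
        if (fscanf(in, "%lf%lf%lf", &n1, &n2, &n3) != 3)
            return -1;
    }

    /* Back to the beginning, as if the file had just been opened
     * (rewind also clears the end-of-file indicator). */
    rewind(in);

    /* The next read starts at row 1 again. */
    if (fscanf(in, "%lf%lf%lf", first, &n2, &n3) != 3)
        return -1;
    return 0;
}
```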

  11. #11
    Registered User
    Join Date
    Nov 2011
    Posts
    8
    Okay, I think I have it now.
    Would this have solved the problem?
    Code:
          while(count<=16)
          { 
            fscanf(inp, "%lf%lf%lf", &n1,&n2,&n3);
            sum = n1 + n2 + n3;
            sum1 = sum1 + n1;
            sum2 = sum2 + n2;
            sum3 = sum3 + n3;
            avg = sum/3;
            sum4 = sum4 + sum;
            sum5 = sum5 + avg;
            avg1 = sum1/16;
            avg2 = sum2/16;
            avg3 = sum3/16;
            avg4 = sum4/16;
            avg5 = sum5/16;
            printf("%5.0f    %6.3f    %6.3f    %6.3f    %6.3f    %6.3f\n", count,n1,n2,n3,sum,avg);
            fprintf(outp, "%5.0f    %6.3f    %6.3f    %6.3f    %6.3f    %6.3f\n", count,n1,n2,n3,sum,avg);
            count++;
            }
          //Rewinds the input file so I can have the final average to use while calculating the standard deviation
          rewind(inp); count = 1;
          while(count<=16)
          { 
            fscanf(inp, "%lf%lf%lf", &n1,&n2,&n3);
            sum = n1 + n2 + n3;
            avg = sum/3;
            x1 = (pow((n1-avg1),2)) + x1;
            x2 = (pow((n2-avg2),2)) + x2;
            x3 = (pow((n3-avg3),2)) + x3;
            x4 = (pow((sum-avg4),2)) + x4;
            x5 = (pow((avg-avg5),2)) + x5;
            sd1 = sqrt((x1)/16);
            sd2 = sqrt((x2)/16);
            sd3 = sqrt((x3)/16);
            sd4 = sqrt((x4)/16);
            sd5 = sqrt((x5)/16);
            count++;}
    -Or should the sd1 = sqrt((x1)/16); be outside of the loop? If so, would that then solve the whole problem?
    Last edited by faulerwulf; 11-30-2011 at 02:59 PM. Reason: fixed code

  12. #12
    Registered User
    Join Date
    May 2009
    Posts
    4,183
    A set of numbers has a single sum and a single average!!

    Tim S.

  13. #13
    Registered User
    Join Date
    Nov 2011
    Posts
    8
    oops, haha. better now?
    -Actually wait, I included the sum and avg calculations again because the x4 and x5 calculations need each row's sum and average, not to change the results; the results are printed in the first loop anyway.
    Last edited by faulerwulf; 11-30-2011 at 03:00 PM.

  14. #14
    Registered User
    Join Date
    Nov 2010
    Location
    Long Beach, CA
    Posts
    5,909
    Your code will work the way it is, but it does unnecessary calculations. The avg1...avg5 and sd1...sd5 can be outside the loop. The other calculations have to be inside the loops. Also, instead of x = x + y, you can use the shorthand version x += y, e.g.
    Code:
    sum1 += n1;
    ...
    x1 += pow((n1 - avg1), 2);
    ...
    Other than that, it looks good. I'll leave the final testing to you.

  15. #15
    Registered User
    Join Date
    Nov 2011
    Posts
    8
    Quote Originally Posted by anduril462
    Your code will work the way it is, but it does unnecessary calculations. The avg1...avg5 and sd1...sd5 can be outside the loop. The other calculations have to be inside the loops. Also, instead of x = x + y, you can use the shorthand version x += y, e.g.
    Code:
    sum1 += n1;
    ...
    x1 += pow((n1 - avg1), 2);
    ...
    Other than that, it looks good. I'll leave the final testing to you.
    Oh, I get your point now. Either way, it doesn't hurt to leave them in the loop. I've learned the shorthand way, but we never covered it in class, so I left it the longhand way for when I present it. That makes it easier to read for people who haven't seen it before.
