Thread: Math Gone Kablooie

  1. #1
    Registered User crepincdotcom's Avatar
    Join Date
    Oct 2003
    Posts
    94

    Unhappy Math Gone Kablooie



    I've got this one little function that calculates standard deviation. All the separate peices work, but together it blows up. I simply cannot figure out why.

    I'm trying to get this formula (its hard to show in ascii): Sx = sqrt( (#SUM OF FOR I TO N# Xi^2 - n(average of X)^2) / N)

    Here's the function. FYI I put in a zillion extra ()'s to make sure it did things right...
    Code:
    double standev( void ) {
      //standard deviation
    
      int i;
      double top=0;
    
      for (i=0;i<N;i++) {
        top += ((x[i]*x[i]) - (((double)(N))*(avgx*avgx)));
        if (DEBUG) printf("[+] top += %f %f*%f=%f - %f * %f * %f = %f\n",(x[i]*x[i]) - (N*(avgx*avgx)),x[i],x[i],(x[i]*x[i]),(double)(N),avgx,avgx,(avgx*avgx));
      }
    
      if (DEBUG) printf("[+] standard dev: %f\n[+] top: %f, divide by N : %f\n",sqrt(top/(double)(N)),top,(double)(N));
      return sqrt(top/(double)(N));
    }
    See, the output with DEBUG set is as follows:

    Code:
    [+] top += -3252.785233 3.000000*3.000000=9.000000 - 20.000000 * 12.770641 * 12.770641 = 163.089262
    [+] top += -3245.785233 4.000000*4.000000=16.000000 - 20.000000 * 12.770641 * 12.770641 = 163.089262
    [+] top += -3236.785233 5.000000*5.000000=25.000000 - 20.000000 * 12.770641 * 12.770641 = 163.089262
    [+] top += -3225.785233 6.000000*6.000000=36.000000 - 20.000000 * 12.770641 * 12.770641 = 163.089262
    [+] top += -3212.785233 7.000000*7.000000=49.000000 - 20.000000 * 12.770641 * 12.770641 = 163.089262
    [+] top += -3197.785233 8.000000*8.000000=64.000000 - 20.000000 * 12.770641 * 12.770641 = 163.089262
    [+] top += -3180.785233 9.000000*9.000000=81.000000 - 20.000000 * 12.770641 * 12.770641 = 163.089262
    [+] top += -3161.785233 10.000000*10.000000=100.000000 - 20.000000 * 12.770641 * 12.770641 = 163.089262
    [+] top += -3140.785233 11.000000*11.000000=121.000000 - 20.000000 * 12.770641 * 12.770641 = 163.089262
    [+] top += -3117.785233 12.000000*12.000000=144.000000 - 20.000000 * 12.770641 * 12.770641 = 163.089262
    [+] top += -3092.785233 13.000000*13.000000=169.000000 - 20.000000 * 12.770641 * 12.770641 = 163.089262
    [+] top += -3065.785233 14.000000*14.000000=196.000000 - 20.000000 * 12.770641 * 12.770641 = 163.089262
    [+] top += -3036.785233 15.000000*15.000000=225.000000 - 20.000000 * 12.770641 * 12.770641 = 163.089262
    [+] top += -3005.785233 16.000000*16.000000=256.000000 - 20.000000 * 12.770641 * 12.770641 = 163.089262
    [+] top += -2972.785233 17.000000*17.000000=289.000000 - 20.000000 * 12.770641 * 12.770641 = 163.089262
    [+] top += -2937.785233 18.000000*18.000000=324.000000 - 20.000000 * 12.770641 * 12.770641 = 163.089262
    [+] top += -2900.785233 19.000000*19.000000=361.000000 - 20.000000 * 12.770641 * 12.770641 = 163.089262
    [+] top += -2861.785233 20.000000*20.000000=400.000000 - 20.000000 * 12.770641 * 12.770641 = 163.089262
    [+] top += -2820.785233 21.000000*21.000000=441.000000 - 20.000000 * 12.770641 * 12.770641 = 163.089262
    [+] top += -2777.785233 22.000000*22.000000=484.000000 - 20.000000 * 12.770641 * 12.770641 = 163.089262
    See, all the various peices hold the right values, but the final is crazy way off.

    Here's the full source:
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>
    
    #define N 20
    #define DEBUG 1
    
    double x[N],y[N],avgx,avgy,avgxy,r;
    
    void getvals( void ) {
      //read from file...
      int i;
    
      for (i=0;i<20;i++) {
        x[i]=(double)(i) + 3.0;
        y[i]=(double)(i);
        if (DEBUG) printf("[+] (%f,%f)\n",x[i],y[i]);
      }
    }
    
    void getavg( void ) {
      int i;
      double sumx, sumy, sumxy;
    
      //before here, validate so that sizeof(x)=sizeof(y)
    
      for (i=0;i<N;i++) {
        sumx += x[i];
        sumy += y[i];
        sumxy += x[i]*y[i];
      }
    
      avgx = sumx/(double)(N);
      avgy = sumy/(double)(N);
      avgxy = sumxy/(double)(N);
    
      if (DEBUG) printf("[+] avgx %f avgy %f avgxy %f\n",avgx,avgy,avgxy);
    }
    
    double standev( void ) {
      //standard deviation
    
      int i;
      double top=0;
    
      for (i=0;i<N;i++) {
        top += ((x[i]*x[i]) - (((double)(N))*(avgx*avgx)));
        if (DEBUG) printf("[+] top += %f %f*%f=%f - %f * %f * %f = %f\n",(x[i]*x[i]) - (N*(avgx*avgx)),x[i],x[i],(x[i]*x[i]),(double)(N),avgx,avgx,(avgx*avgx));
      }
    
      if (DEBUG) printf("[+] standard dev: %f\n[+] top: %f, divide by N : %f\n",sqrt(top/(double)(N)),top,(double)(N));
      return sqrt(top/(double)(N));
    }
    
    void findlinear( void ) {
      double slope,yint,sx;
    
      //here we need to find the slope, then change the
      //y-int so that the line runs through (avgx,avgy)
    
      getvals();
      getavg();
    
      sx = standev();
      slope = (avgxy - (avgx*avgy))/(sx*sx);
      yint = avgy - (slope*avgx);
    
      printf("y = %fx + %f\n");
    }
    
    int main(int argc, char *argv[]) {
    
      findlinear();
      return 0;
    }
    Thanks a lot, I'm pulling my hair out.
    -Jack C
    jack {at} crepinc.com
    http://www.crepinc.com

  2. #2
    Just Lurking Dave_Sinkula's Avatar
    Join Date
    Oct 2002
    Posts
    5,005
    You might want to initialize these since you later increment them.
    Code:
      double sumx, sumy, sumxy;
    You might want to provide information to printf.
    Code:
      printf("y = %fx + %f\n");
    How may times do you do this: avgx*avgx? A temporary variable seems appropriate.
    Last edited by Dave_Sinkula; 07-18-2005 at 08:49 AM. Reason: Typos -- I just hate 'em.
    7. It is easier to write an incorrect program than understand a correct one.
    40. There are two ways to write error-free programs; only the third one works.*

  3. #3
    aoeuhtns
    Join Date
    Jul 2005
    Posts
    581
    Also, there are two standard deviations -- one that divides by n and one that divides by (n - 1). One is the "population" standard deviation and the other is the "sample" standard deviation. Don't ask me why. Don't ask me which is which. If you're comparing numerical results with something and they're slightly off, this is the reason.

  4. #4
    Just Lurking Dave_Sinkula's Avatar
    Join Date
    Oct 2002
    Posts
    5,005
    Quote Originally Posted by Rashakil Fol
    Also, there are two standard deviations -- one that divides by n and one that divides by (n - 1). One is the "population" standard deviation and the other is the "sample" standard deviation. Don't ask me why. Don't ask me which is which.
    http://en.wikipedia.org/wiki/Standard_deviation
    http://mathworld.wolfram.com/StandardDeviation.html
    7. It is easier to write an incorrect program than understand a correct one.
    40. There are two ways to write error-free programs; only the third one works.*

  5. #5
    Registered User
    Join Date
    Mar 2004
    Posts
    536
    Quote Originally Posted by crepincdotcom


    I've got this one little function that calculates standard deviation. All the separate peices work, but together it blows up. I simply cannot figure out why.

    I'm trying to get this formula (its hard to show in ascii): Sx = sqrt( (#SUM OF FOR I TO N# Xi^2 - n(average of X)^2) / N)
    This is incorrect. In the definition, the terms to be summed are given by this:

    (Xi - xavg) squared.

    Then divide by N (or, maybe, N-1) and take the square root.

    The calculation can be simplified a number of ways, but the formula that you give is not one of them.

    [edit]

    What I should have said is that the calculations in the program are incorrect, and are not consistent with the definition or with the formula of your narration.

    You could calculate the SUM of Xi*Xi in a loop, then
    sqrt((SUM - N*xavg*xavg)/N)

    This is not what your program does.
    [/edit]

    Regards,

    Dave
    Last edited by Dave Evans; 07-18-2005 at 02:20 PM.

  6. #6
    Registered User crepincdotcom's Avatar
    Join Date
    Oct 2003
    Posts
    94
    Dave, the formula I provided is not the proper one. It is correctly inplemented in the code, I check that many times over.

    The line
    Code:
    double top=0;
    Does that need to be
    Code:
    double top=0.0;
    by chance? It seems that the initial "top" value being incremented is funky.

    Dave_Sinkula, you can see in the debug out that that is not the issue. All the little parts are there correctly, avgx, avgy, etc, but the final incrementation of top is incorrect.

    Ha BTW yes arguments to prontf would help. Still, thats not the main issue.

    Thanks
    -Jack C
    jack {at} crepinc.com
    http://www.crepinc.com

  7. #7
    Just Lurking Dave_Sinkula's Avatar
    Join Date
    Oct 2002
    Posts
    5,005
    Simpler code is easier to follow.
    Code:
    #include <stdio.h>
    #include <math.h>
    
    double average(const double *array, size_t size)
    {
       double sum = 0;
       size_t i;
       for ( i = 0; i < size; ++i )
       {
          sum += *array++;
       }
       return sum / size;
    }
    
    double stddev(const double *array, size_t size)
    {
       double sum = 0, mean = average(array, size);
       size_t i;
       for ( i = 0; i < size; ++i )
       {
          double temp = *array++ - mean;
          sum += temp * temp;
       }
       if ( size > 1 )
       {
          return sqrt(sum / (size - 1));
       }
       return -1;
    }
    
    int main(void)
    {
       double array[] = {1,2,3,4,5,6,7,8,9,10};
       printf("average() = %g\n", average(array, sizeof array / sizeof *array));
       printf("stddev()  = %g\n", stddev(array,  sizeof array / sizeof *array));
       return 0;
    }
    
    /* my output
    average() = 5.5
    stddev()  = 3.02765
    */
    7. It is easier to write an incorrect program than understand a correct one.
    40. There are two ways to write error-free programs; only the third one works.*

  8. #8
    Registered User
    Join Date
    Mar 2004
    Posts
    536
    Quote Originally Posted by crepincdotcom
    Dave, the formula I provided is not the proper one. It is correctly inplemented in the code, I check that many times over.
    You can check it as many times as you want to. It's still wrong.

    You could implement it the way I suggested, by putting the following block just after your loop in standev:

    Code:
      {
        double top2;
        top2 = 0;
        for (i = 0; i < N; i++) {
          top2 += x[i]*x[i];
        }
        top2 = sqrt((top2 - (double)N*avgx*avgx)/(double)N);
        printf("By corrected formula: std dev = %f\n", top2);
      }
    For your data, I got

    By corrected formula: std dev = 5.766281
    Note that this is (probably) not the way I would have implemented it, but I wanted to use terminology similar to yours so that you could compare your code with your description and then compare with this.


    Quote Originally Posted by crepincdotcom
    Dave_Sinkula, you can see in the debug out that that is not the issue. All the little parts are there correctly, avgx, avgy, etc, but the final incrementation of top is incorrect.
    Variables sumx, sumy and sumxy in your getavg() function should be initialized. This might have no bearing on your present problem, but somehow, someday you will have other problems due to uninitialized automatic variables. When I compiled your code with gcc, I found that the values are not initialized and all calculations in getavg() are incorrect. Unlike global variables, compilers aren't required to initialize these to zero.

    Quote Originally Posted by crepincdotcom
    The line
    Code:
    double top=0;
    Does that need to be
    Code:
    double top=0.0;
    Some people might recommend that as a matter of style, but in fact it makes no difference. The assignment automatically promotes an integer 0 to a double 0.0.




    Regards,

    Dave

  9. #9
    Registered User crepincdotcom's Avatar
    Join Date
    Oct 2003
    Posts
    94
    Thanks all... I think I'll go cry now.

    Just kidding, but my code is pretty bad isn't it.

    Dave_Sinkula, your code is very pretty and I will test it tonight. But your use of pointers and (not sure of terminology) "dynamic" or "un-predefined" arrays is somewhat beyond me. Is there a good place I can read about these? For instance, ultimatly I'd like to read in the numbers from a file, but in that case I would have to change the size of the array, since it couldn't be compiled in...

    Thanks all
    -Jack C
    jack {at} crepinc.com
    http://www.crepinc.com

  10. #10
    Just Lurking Dave_Sinkula's Avatar
    Join Date
    Oct 2002
    Posts
    5,005
    If there is some maximum size, you could do this.
    Code:
    int main(void)
    {
       static const char filename[] = "file.csv";
       FILE *file = fopen(filename, "r");
       if ( file )
       {
          double x [ 35 ], y [ sizeof x / sizeof *x ];
          size_t i;
          for ( i = 0; i < sizeof x / sizeof *x; ++i )
          {
             if ( fscanf(file, "%lf,%lf", &x[i], &y[i]) != 2 )
             {
                break;
             }
          }
          fclose(file);
          printf("average() = %g\n", average(x, i));
          printf("stddev()  = %g\n", stddev (x, i));
          printf("average() = %g\n", average(y, i));
          printf("stddev()  = %g\n", stddev (y, i));
       }
       else
       {
          perror(filename);
       }
       return 0;
    }
    
    /* file.csv
    3,0
    4,1
    5,2
    6,3
    7,4
    8,5
    9,6
    10,7
    11,8
    12,9
    13,10
    14,11
    15,12
    16,13
    17,14
    18,15
    19,16
    20,17
    21,18
    22,19
    */
    
    /* my output
    average() = 12.5
    stddev()  = 5.91608
    average() = 9.5
    stddev()  = 5.91608
    */
    Last edited by Dave_Sinkula; 07-19-2005 at 11:59 AM.
    7. It is easier to write an incorrect program than understand a correct one.
    40. There are two ways to write error-free programs; only the third one works.*

  11. #11
    Registered User crepincdotcom's Avatar
    Join Date
    Oct 2003
    Posts
    94
    Thanks
    -Jack C
    jack {at} crepinc.com
    http://www.crepinc.com

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Math
    By knightjp in forum A Brief History of Cprogramming.com
    Replies: 16
    Last Post: 04-01-2009, 05:36 PM
  2. Help with C++ Math
    By aonic in forum C++ Programming
    Replies: 4
    Last Post: 01-29-2005, 04:40 AM
  3. Basic Math Problem. Undefined Math Functions
    By gsoft in forum C Programming
    Replies: 1
    Last Post: 12-28-2004, 03:14 AM
  4. Math Header?
    By Rune Hunter in forum C++ Programming
    Replies: 26
    Last Post: 09-17-2004, 06:39 AM
  5. toughest math course
    By axon in forum A Brief History of Cprogramming.com
    Replies: 12
    Last Post: 10-28-2003, 10:06 PM