Calculating Variance

This is a discussion on Calculating Variance within the C++ Programming forums, part of the General Programming Boards category.

  1. #1
    Registered User
    Join Date
    Dec 2008
    Posts
    65

    Question Calculating Variance

    Hey All,

    I am stuck! Why in the world does this calculate the estimated (population) variance rather than the actual variance:

    Code:
    double calculateVariance(StudentsGrades obj[], double meanAvg)
    {
        double variance;
    
        for(int index = 0; index < MAX_NUM_STUDENTS; index++)
        {
            if((obj[index].getStudentAvg() == 0) || (obj[index].getStudentAvg() == meanAvg))
                variance += 0.0;
            else variance += pow(obj[index].getStudentAvg() - meanAvg, 2);
        }
        variance = variance / TOTAL_STUDENTS; // TOTAL_STUDENTS is one less than
                                              // MAX_NUM_STUDENTS because
                                              // the data file has titles for each column
        return variance;
    }
    Any help would be appreciated. I use this function to calculate the standard deviation.
    I have googled everywhere and can't find the solution. I have also used a calculator and
    MS Excel to check the numbers, and they are off.

    Cheers!

  2. #2
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    21,628
    I would initialise variance to 0.0.

    By the way, the if statement is pointless. You could make it useful by negating the condition and doing what's in the current else branch without doing what's in the current if branch, but that does not seem necessary anyway. You probably wanted to avoid the cost of the pow(), but you can do that simply by multiplication, e.g.,
    Code:
    double diff = obj[index].getStudentAvg() - meanAvg;
    variance += diff * diff;
    EDIT:
    Oh, sorry, the if statement is not pointless. In that case, negate the condition and do what is in the current else branch. There simply is no point in adding 0.0 to variance.
    Last edited by laserlight; 01-04-2009 at 02:59 AM.
    C + C++ Compiler: MinGW port of GCC
    Version Control System: Bazaar

    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  3. #3
    Registered User
    Join Date
    Dec 2008
    Posts
    65

    Okay, the code now reads as such:

    Code:
    double calculateVariance(StudentsGrades obj[], double meanAvg)
    {
        double variance = 0.0;
    
        for(int index = 0; index < MAX_NUM_STUDENTS; index++)
        {
            if(!(obj[index].getStudentAvg() == 0) || 
               !(obj[index].getStudentAvg() == meanAvg))
                    variance += (obj[index].getStudentAvg() - meanAvg) *
                                (obj[index].getStudentAvg() - meanAvg);
        }
        variance = variance / TOTAL_STUDENTS; // TOTAL_STUDENTS is one less than
                                              // MAX_NUM_STUDENTS because
                                              // the data file has titles for each column
        return variance;
    }
    The numbers are closer, but they're still off by about 0.0043. I think I am going to call it good enough, since these numbers are just approximations anyway.

    Thanks for your help, laserlight!

    Cheers!

    Why in the world did changing the code like this change the outcome at all, when there is essentially no difference between what my initial post does and what the changes do?
    Now I am almost totally baffled.
    Last edited by Phyxashun; 01-04-2009 at 03:30 AM. Reason: Forgot something:

  4. #4
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    21,628
    You have a logic error though: !(A || B) = !A && !B, but you transformed it to !A || !B. The use of temporary variables can make the code much easier to read, e.g.,
    Code:
    double calculateVariance(StudentsGrades obj[], double meanAvg)
    {
        double variance = 0.0;
    
        for(int index = 0; index < MAX_NUM_STUDENTS; index++)
        {
            double studentAvg = obj[index].getStudentAvg();
            if(studentAvg != 0 && studentAvg != meanAvg)
            {
                double diff = studentAvg - meanAvg;
                variance += diff * diff;
            }
        }
    
        // TOTAL_STUDENTS is one less than MAX_NUM_STUDENTS
        // because the data file has titles for each column
        return variance / TOTAL_STUDENTS;
    }
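To see why the two conditions are not equivalent, here is a small sketch that compares them directly. The helper names `skipOriginal`, `keepWrong`, and `keepRight` are made up for this illustration and do not appear in the original program.

```cpp
// Original intent (post #1): skip the entry when the average is 0
// or when it equals the mean.
bool skipOriginal(double avg, double mean)
{
    return avg == 0 || avg == mean;
}

// Post #3's rewrite: !A || !B. This is NOT the negation of (A || B);
// it is true unless avg is both 0 and equal to the mean.
bool keepWrong(double avg, double mean)
{
    return !(avg == 0) || !(avg == mean);
}

// De Morgan's law: !(A || B) is the same as !A && !B.
bool keepRight(double avg, double mean)
{
    return avg != 0 && avg != mean;
}
```

With a mean of 75, an entry whose average is 0 is skipped by the original condition, but `keepWrong(0.0, 75.0)` is true, so that entry would add 75 * 75 to the sum. `keepRight(0.0, 75.0)` is false, which matches the original intent.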
    Quote Originally Posted by Phyxashun
    The numbers are closer, but they're still off by about 0.0043. I think I am going to call it good enough, since these numbers are just approximations anyway.
    It occurs to me that you are dealing with doubles, hence direct comparison with == or != can lead to unexpected results due to inaccuracy. You would need to check the number within a small range if you want to be really accurate.
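A minimal sketch of such a range check follows. The function name `nearlyEqual` and the default tolerance are assumptions chosen for illustration; a suitable tolerance depends on the scale of the data.

```cpp
#include <cmath>

// Hypothetical helper: returns true when a and b differ by at most
// `tolerance`. The default of 1e-9 is an arbitrary choice for
// grade-sized values, not a universal constant.
bool nearlyEqual(double a, double b, double tolerance = 1e-9)
{
    return std::fabs(a - b) <= tolerance;
}
```

In the loop above, the test could then be written as `!nearlyEqual(studentAvg, 0.0) && !nearlyEqual(studentAvg, meanAvg)`.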

    EDIT:
    Quote Originally Posted by Phyxashun
    Why in the world did changing the code like this change the outcome at all, when there is essentially no difference between what my initial post does and what the changes do?
    Since variance was not initialised, it would have started with an indeterminate (garbage) value that might not be 0, thus leading to an error in your calculations.
    Last edited by laserlight; 01-04-2009 at 03:35 AM.

  5. #5
    Registered User
    Join Date
    Dec 2008
    Posts
    65
    My wife always says that my logic is off! Thanks, that last note almost completely solved the problem, at least by my standards because now it is only off by 0.003.

    Thanks again!

  6. #6
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    21,628
    No problem

    By the way, I would suggest that you use a std::vector<StudentsGrades> instead of a fixed size array. This would make calculateVariance() more reusable (and in the future it would be even more reusable if calculateVariance() was a function template and had an iterator pair as parameters, but that may be too advanced at the moment).

  7. #7
    Registered User
    Join Date
    Dec 2008
    Posts
    65
    It's funny that you mention that. I was fooling around with it as a vector but started to get frustrated dealing with iterators. I'll continue looking into that modification though, as well as the template application.

    As a side note, is there any way besides sqrt() to get the square root of the variance?

    I am told, and have read, that cmath has a lot of overhead.

    Cheers!

    EDIT: I guess a better question would be:
    Are the math libraries in Boost any better in terms of speed, memory consumption, and portability?
    Last edited by Phyxashun; 01-04-2009 at 03:56 AM. Reason: After thought

  8. #8
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    21,628
    Quote Originally Posted by Phyxashun
    As a side note, is there any way besides sqrt() to get the square root of the variance?

    I am told, and have read, that cmath has a lot of overhead.
    Suppose there is such an overhead: does it affect you? In other words, does the use of sqrt() cause your program to fail its performance requirements?

    Quote Originally Posted by Phyxashun
    Are the math libraries in Boost any better in terms of speed, memory consumption, and portability?
    Better than your own? Probably. Better than a standard library component? Well, the only valid comparison would be in cases where it provides alternatives to the standard library.
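For reference, the standard deviation step with the standard library is a single call to std::sqrt from <cmath>. The wrapper name `standardDeviation` is hypothetical and is used only to keep the sketch self-contained.

```cpp
#include <cmath>

// The standard deviation is the square root of the variance.
double standardDeviation(double variance)
{
    return std::sqrt(variance);
}
```

Whether this call matters for performance is best settled by measuring the program, as suggested above, rather than by replacing it up front.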

  9. #9
    Registered User
    Join Date
    Dec 2008
    Posts
    65
    No, the inclusion of cmath does not affect my computer. I think the gist of what I meant was: is there a function available that can do the sqrt()?
