Calculating Variance

• 01-04-2009
Phyxashun
Hey All,

I am stuck! Why in the world does this calculate the estimated (population) variance rather than the actual variance:

Code:

```
double calculateVariance(StudentsGrades obj[], double meanAvg)
{
    double variance;
    for(int index = 0; index < MAX_NUM_STUDENTS; index++)
    {
        if((obj[index].getStudentAvg() == 0) || (obj[index].getStudentAvg() == meanAvg))
            variance += 0.0;
        else
            variance += pow(obj[index].getStudentAvg() - meanAvg, 2);
    }
    variance = variance / TOTAL_STUDENTS; // TOTAL_STUDENTS is one less than
                                          // MAX_NUM_STUDENTS because
                                          // the data file has titles for each column
    return variance;
}
```
Any help would be appreciated. I use this function to calculate the standard deviation.
I have googled everywhere and can't find the solution. I have also used a calculator and
MS Excel to check the numbers, and they are off.

Cheers!
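
For anyone comparing results against a calculator or spreadsheet, it may help to see the two usual definitions side by side, since they differ only in the divisor. This is a reference sketch, not a diagnosis of the code above. The symbols are assumptions for illustration: x_i are the student averages, N is the number of students, and the mean is written \mu or \bar{x}. In Excel, VARP divides by N while VAR divides by N - 1, so picking the other function is one possible source of a small mismatch.

```latex
% Population variance: divide by N
\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2

% Sample (estimated) variance: divide by N - 1
s^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2
```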
• 01-04-2009
laserlight
I would initialise variance to 0.0.

By the way, the if statement is pointless. You could make it useful by negating the condition and doing what's in the current else branch without doing what's in the current if branch, but that does not seem necessary anyway. You probably wanted to avoid the cost of the pow(), but you can do that simply by multiplication, e.g.,
Code:

```
double diff = obj[index].getStudentAvg() - meanAvg;
variance += diff * diff;
```
EDIT:
Oh, sorry, the if statement is not pointless. In that case, negate the condition and do what is in the current else branch. There simply is no point in adding 0.0 to variance.
• 01-04-2009
Phyxashun
Okay, the code now reads as such:

Code:

```
double calculateVariance(StudentsGrades obj[], double meanAvg)
{
    double variance = 0.0;
    for(int index = 0; index < MAX_NUM_STUDENTS; index++)
    {
        if(!(obj[index].getStudentAvg() == 0) ||
           !(obj[index].getStudentAvg() == meanAvg))
                variance += (obj[index].getStudentAvg() - meanAvg) *
                            (obj[index].getStudentAvg() - meanAvg);
    }
    variance = variance / TOTAL_STUDENTS; // TOTAL_STUDENTS is one less than
                                          // MAX_NUM_STUDENTS because
                                          // the data file has titles for each column
    return variance;
}
```
The numbers are closer but they're still off by about 0.0043. I think I am going to call it good enough, since these numbers are just approximations anyway.

Cheers!

Why in the world did changing the code this way change the outcome at all? There is essentially no difference between what my initial post does and what the changes do.
Now I am almost totally baffled.
• 01-04-2009
laserlight
You have a logic error though: !(A || B) = !A && !B, but you transformed it to !A || !B. The use of temporary variables can make the code much easier to read, e.g.,
Code:

```
double calculateVariance(StudentsGrades obj[], double meanAvg)
{
    double variance = 0.0;
    for(int index = 0; index < MAX_NUM_STUDENTS; index++)
    {
        double studentAvg = obj[index].getStudentAvg();
        if(studentAvg != 0 && studentAvg != meanAvg)
        {
            double diff = studentAvg - meanAvg;
            variance += diff * diff;
        }
    }
    // TOTAL_STUDENTS is one less than MAX_NUM_STUDENTS
    // because the data file has titles for each column
    return variance / TOTAL_STUDENTS;
}
```
Quote:

Originally Posted by Phyxashun
The numbers are closer but they're still off by about 0.0043. I think I am going to call it good enough, since these numbers are just approximations anyway.

It occurs to me that you are dealing with doubles, hence direct comparison with == or != can lead to unexpected results due to floating-point inaccuracy. If you want to be really accurate, you would need to check whether the number lies within a small range of the value you are comparing against.
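
A range check of that kind could be sketched as follows. The helper name nearlyEqual and both tolerance defaults are hypothetical choices for illustration, not something from the code above, and suitable tolerances depend on the data.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper (not part of the original program): treats a and b
// as equal when they differ by no more than a tolerance. The tolerance is
// the larger of an absolute floor (absTol) and a fraction (relTol) of the
// larger magnitude, so it works for values near zero and for large values.
bool nearlyEqual(double a, double b,
                 double relTol = 1e-9, double absTol = 1e-12)
{
    double diff = std::fabs(a - b);
    double scale = std::max(std::fabs(a), std::fabs(b));
    return diff <= std::max(absTol, relTol * scale);
}
```

With this in place, a test such as studentAvg != meanAvg would become !nearlyEqual(studentAvg, meanAvg). For example, 0.1 + 0.2 == 0.3 is false for IEEE-754 doubles, while nearlyEqual(0.1 + 0.2, 0.3) is true.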

EDIT:
Quote:

Originally Posted by Phyxashun
Why in the world did changing the code this way change the outcome at all? There is essentially no difference between what my initial post does and what the changes do.

Since variance was not initialised, it would have held an indeterminate (garbage) value that might not be 0, thus leading to an error in your calculations.
• 01-04-2009
Phyxashun
My wife always says that my logic is off! Thanks, that last note almost completely solved the problem, at least by my standards because now it is only off by 0.003.

Thanks again!
• 01-04-2009
laserlight
No problem :)

By the way, I would suggest that you use a std::vector<StudentsGrades> instead of a fixed size array. This would make calculateVariance() more reusable (and in the future it would be even more reusable if calculateVariance() was a function template and had an iterator pair as parameters, but that may be too advanced at the moment).
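
A minimal sketch of that idea, under stated assumptions: the StudentsGrades class below is a hypothetical stand-in that keeps only the getStudentAvg() accessor seen in this thread, the divisor is the number of elements in the range rather than TOTAL_STUDENTS, and the check that skips zero averages is left out for brevity.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the real StudentsGrades class; only
// getStudentAvg() is taken from the thread.
class StudentsGrades
{
public:
    explicit StudentsGrades(double avg) : avg_(avg) {}
    double getStudentAvg() const { return avg_; }
private:
    double avg_;
};

// Population variance over the range [first, last): the sum of squared
// differences from meanAvg divided by the element count.
// Returns 0.0 for an empty range to avoid dividing by zero.
template <typename InputIt>
double calculateVariance(InputIt first, InputIt last, double meanAvg)
{
    double sumSquares = 0.0;
    std::size_t count = 0;
    for (; first != last; ++first)
    {
        double diff = first->getStudentAvg() - meanAvg;
        sumSquares += diff * diff;
        ++count;
    }
    return count == 0 ? 0.0 : sumSquares / count;
}

// Convenience overload so callers holding a vector need not spell out
// the iterators themselves.
double calculateVariance(const std::vector<StudentsGrades>& students,
                         double meanAvg)
{
    return calculateVariance(students.begin(), students.end(), meanAvg);
}
```

Because the template only needs something it can increment, compare and dereference, the same function also accepts a plain array via pointers, e.g. calculateVariance(obj, obj + TOTAL_STUDENTS, meanAvg).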
• 01-04-2009
Phyxashun
It's funny that you mention that. I was fooling around with it as a vector but started to get frustrated dealing with iterators. I'll continue looking into that modification though, as well as the template application.

As a side note, is there any way besides sqrt() to get the square root of the variance?

I am told, and have read, that cmath has a lot of overhead.

Cheers!

EDIT: I guess a better question would be:
Are the math libraries in Boost any better in terms of speed, memory consumption, and portability?
• 01-04-2009
laserlight
Quote:

Originally Posted by Phyxashun
As a side note, is there any way besides sqrt() to get the square root of the variance?

I am told, and have read, that cmath has a lot of overhead.

Suppose there is such an overhead: does it affect you? In other words, does the use of sqrt() cause your program to fail its performance requirements?

Quote:

Originally Posted by Phyxashun
Are the math libraries in Boost any better in terms of speed, memory consumption, and portability?

Better than your own? Probably. Better than a standard library component? Well, the only valid comparison would be in cases where it provides alternatives to the standard library.
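
For reference, getting the standard deviation from the variance with the standard library is a single call to std::sqrt from <cmath>. The wrapper below is a hypothetical illustration (the name standardDeviation is not from this thread); whether its cost matters is something only measurement can tell you.

```cpp
#include <cmath>

// Hypothetical wrapper: the standard deviation is the square root of the
// variance. A negative argument yields NaN, as std::sqrt does.
double standardDeviation(double variance)
{
    return std::sqrt(variance);
}
```

For example, standardDeviation(4.0) returns 2.0.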
• 01-04-2009
Phyxashun
No, the inclusion of cmath does not affect my computer. I think the gist of what I meant was, is there a function available that can do the sqrt().