# calculating the mean

• 08-27-2007
bartleby84
calculating the mean
I need an algorithm to calculate the mean of a list of values. The values are CPU times, and some of them are very different from the rest. I want to calculate the mean without these values. Does anyone know of an algorithm to do this?

Thank you.
• 08-27-2007
Moony
What do you mean by very different?
• 08-27-2007
zacs7
Something like:
Code:

```
#include <stdio.h>

#define CPUMAX 60
#define CPUMIN 40

int main(void)
{
    int mylist[] = {55, 27, 4, 11, 47, 59};
    size_t i;
    unsigned long total = 0;
    size_t counted = 0;

    for (i = 0; i < sizeof(mylist) / sizeof(mylist[0]); i++)
    {
        /* check if mylist[i] is within the bounds of your acceptable values */
        if (mylist[i] >= CPUMIN && mylist[i] <= CPUMAX)
        {
            total += mylist[i];
            counted++;
        }
    }

    /* avoid dividing by zero when every value was rejected */
    if (counted > 0)
        printf("The mean is %f\n", (double)total / counted);
    else
        printf("No values within bounds\n");

    return 0;
}
```
Or you can use variance (Stats), I dunno I'm not that great with statistics.
• 08-27-2007
bartleby84
I mean, for example, that all the values are around 0.0001 and a few are 0.0008 or bigger. I think I'm going to calculate the variance for each value and try to maintain the variance very close to 0. Thanks for the answers!
• 08-27-2007
JFonseka
He means outliers. It's been a while since I did statistical maths, but you don't really need it; you just need to set some if-statement exceptions, I guess?

Code:

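```
/* x is the cutoff: anything at or above it is treated as an outlier */
if (value < x)
{
    sum += value;   /* include this value in the mean calculation */
    count++;
}
```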
• 08-27-2007
bartleby84
Yes, that's it! But the main problem is to set the limit when you don't know it a priori. I'm searching for a method based on minimal variance. ;) Thanks for your attention!
• 08-27-2007
grumpy
The way to minimise your variance is to only count one value ;) If you do that, the variance is zero by definition.

Seriously, you need to specify some criterion by which you can identify an outlier. Minimising variance of the set of values you retain is not a suitable criterion.
• 08-27-2007
JFonseka
Well following from what grumpy said, define the bounds for which the values become outliers based on the greatest and least of the values that are not outliers.

Suppose we have a list of values:

2,50,75,98,101,467

We can see that 2 and 467 are outliers

So we can estimate the range in which we expect outliers to fall.

Suppose we say that a value is an outlier if it is greater than double the greatest number that is not an outlier, which makes 202 the upper bound, or less than half of the least number that is not an outlier, which means anything less than 25.

This is of course dependent on what the values really mean; here it's just arbitrary. I don't know what CPU times look like exactly, but I'm guessing they are small, so you must take a close look at the values that are generated, run a series of tests to see the different values you get, and based on that estimate what the outliers could be.
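As a rough illustration of the bounds idea above, here is a minimal sketch in C. The function name `bounded_mean` and its signature are made up for this example; the bounds themselves (25 and 202) are just the arbitrary ones from the list above, not a recommendation for real CPU times.

```c
#include <stddef.h>

/* Hypothetical helper: mean of the values that lie in [lo, hi].
 * Values outside the bounds are treated as outliers and skipped.
 * Returns 0 and stores the mean in *mean on success, or -1 if no
 * value falls inside the bounds (nothing to average). */
int bounded_mean(const double *values, size_t n,
                 double lo, double hi, double *mean)
{
    double sum = 0.0;
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        if (values[i] >= lo && values[i] <= hi) {
            sum += values[i];
            kept++;
        }
    }

    if (kept == 0)
        return -1;

    *mean = sum / (double)kept;
    return 0;
}
```

Called on 2, 50, 75, 98, 101, 467 with `lo = 25` and `hi = 202`, this skips 2 and 467 and averages the remaining four values, giving 81.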

There is probably a much better way to do this if you look at the equations for standard deviation and some other equations of statistical maths. They aren't very hard to understand, though I don't know about implementing them.
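For what it's worth, one common way to use the standard deviation here is to drop every value that lies more than k standard deviations from the mean and then average what is left. The sketch below is only one possible reading of that idea, not something anyone in this thread wrote: the name `sigma_clipped_mean` and the choice of k are assumptions, and it uses the population variance and a single pass of rejection.

```c
#include <stddef.h>

/* Hypothetical helper: mean of the values within k standard deviations
 * of the overall mean. Squared distances are compared against
 * k*k*variance, so no sqrt() call is needed.
 * Returns 0 on success, -1 if the input is empty or nothing is kept. */
int sigma_clipped_mean(const double *values, size_t n,
                       double k, double *result)
{
    if (n == 0)
        return -1;

    /* overall mean */
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += values[i];
    double mean = sum / (double)n;

    /* population variance */
    double sq = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = values[i] - mean;
        sq += d * d;
    }
    double var = sq / (double)n;

    /* average only the values close enough to the mean */
    double kept_sum = 0.0;
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        double d = values[i] - mean;
        if (d * d <= k * k * var) {
            kept_sum += values[i];
            kept++;
        }
    }

    if (kept == 0)
        return -1;

    *result = kept_sum / (double)kept;
    return 0;
}
```

One caveat: the outliers themselves inflate the standard deviation. On the list 2, 50, 75, 98, 101, 467 with k = 2, only 467 is rejected and 2 is kept, so the result is about 65.2 rather than 81. The threshold k still has to be chosen by looking at real data.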
• 08-27-2007
zacs7
Or just do what JFonseka said (which my example outlines)...
• 08-27-2007
brewbuck
Quote:

Originally Posted by bartleby84
I need an algorithm to calculate the mean of a list of values. The values are CPU times, and some of them are very different from the rest. I want to calculate the mean without these values. Does anyone know of an algorithm to do this?

Thank you.

Can I ask why you want to compute the mean of a bunch of CPU times? It's a common fallacy to think that you should run a task N times and then average the times -- what you should really do is simply take the smallest.
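A minimal sketch of the take-the-smallest approach, assuming the usual reasoning that interference from other processes can only add time to a run, never remove it. The names `min_time` and `best_cpu_time` are invented for this example, and `clock()` is used only because it is the standard C way to read CPU time; its resolution may be too coarse for very short tasks.

```c
#include <stddef.h>
#include <time.h>

/* Smallest element of times[0..n-1]; n must be at least 1. */
double min_time(const double *times, size_t n)
{
    double best = times[0];
    for (size_t i = 1; i < n; i++) {
        if (times[i] < best)
            best = times[i];
    }
    return best;
}

/* Hypothetical harness: run task() `runs` times and return the fastest
 * CPU time in seconds, or -1.0 if runs is 0. */
double best_cpu_time(void (*task)(void), size_t runs)
{
    double best = -1.0;
    for (size_t i = 0; i < runs; i++) {
        clock_t start = clock();
        task();
        double elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
        if (best < 0.0 || elapsed < best)
            best = elapsed;
    }
    return best;
}
```

If the times are already collected in an array, `min_time` alone is enough; no outlier threshold has to be chosen at all.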