Bug #1270 Calculation difference
Submitted: 12 Sep 2003 23:19 Modified: 13 Sep 2003 7:08
Reporter: Charles Schaffner Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: User-defined functions ( UDF ) Severity:S3 (Non-critical)
Version:4.0.14b OS:Windows (Windows)
Assigned to: CPU Architecture:Any

[12 Sep 2003 23:19] Charles Schaffner
Description:
The Standard deviation function appears to be using "N" in the denominator when I beleive it should be using "N-1". I have verified the results using both manual computaion and also Excel. 

Thanks for the support and such a great product
Chuck Schaffner
cas7@cox.net

How to repeat:
any random set of numbers. calculate standard deviation and the verify the resut with excel.

Suggested fix:
Change function
[13 Sep 2003 4:23] Antony Curtis
Variance is defined as the square of the mean minus the mean of the squares.
So where does this "N-1" come from?
[13 Sep 2003 4:25] Antony Curtis
Oops, I meant the mean of the squares minus the square of the means.
[13 Sep 2003 7:08] Guilhem Bichot
Quoting some dictionary:
"The standard deviation of a random variable or list of numbers (the lowercase greek sigma) is the square of the variance. The standard deviation of the list x1, x2, x3...xn is given by the formula:

 sigma = sqrt(((x1-(avg(x)))^2 + (x1-(avg(x)))^2 +
              ...  + (xn(avg(x)))^2)/n)

The formula is used when all of the values in the population are known. If the values x1...xn are a random sample chosen from the population, then the sample Standard Deviation is calculated with same formula, except that (n-1) is used as the denominator."

When you have a random sample, the mean of the rows is not exactly the mean of the population, so easy calculus shows that you need 1/(n-1) for the variance (calculated on the sample) to be an "unbiased estimate" of the variance of the population.
In MySQL, VARIANCE() (which is the square of STDDEV()) has 1/n, it considers the rows as the population (I'm documenting this now). Excel has VAR() (sample variance) and VARP() (population variance).
[13 Sep 2003 13:09] Charles Schaffner
Thanks for the very fast response. We use it for lab results and need the n-1 value so I will make a small function that will take your result and square it and mulitply it by n/(n-1) and then take the square root.

Thanks again
cas7@cox.net
[28 Feb 2005 19:30] Paul DuBois
VAR_SAMP() and STDDEV_SAMP() functions for
calculating sample variance and standard deviation
values have been added to MySQL 5.0.3.