Bug #1270 Calculation difference
Submitted: 13 Sep 2003 1:19 Modified: 13 Sep 2003 9:08
Reporter: Charles Schaffner
Status: Not a Bug
Category:Server: UDF Severity:S3 (Non-critical)
Version:4.0.14b OS:Microsoft Windows (Windows)
Assigned to: Target Version:

[13 Sep 2003 1:19] Charles Schaffner
Description:
The Standard deviation function appears to be using "N" in the denominator when I beleive
it should be using "N-1". I have verified the results using both manual computaion and
also Excel. 

Thanks for the support and such a great product
Chuck Schaffner
cas7@cox.net

How to repeat:
any random set of numbers. calculate standard deviation and the verify the resut with
excel.

Suggested fix:
Change function
[13 Sep 2003 6:23] Antony Curtis
Variance is defined as the square of the mean minus the mean of the squares.
So where does this "N-1" come from?
[13 Sep 2003 6:25] Antony Curtis
Oops, I meant the mean of the squares minus the square of the means.
[13 Sep 2003 9:08] Guilhem Bichot
Quoting some dictionary:
"The standard deviation of a random variable or list of numbers (the lowercase greek
sigma) is the square of the variance. The standard deviation of the list x1, x2, x3...xn
is given by the formula:

 sigma = sqrt(((x1-(avg(x)))^2 + (x1-(avg(x)))^2 +
              ...  + (xn(avg(x)))^2)/n)

The formula is used when all of the values in the population are known. If the values
x1...xn are a random sample chosen from the population, then the sample Standard
Deviation is calculated with same formula, except that (n-1) is used as the
denominator."

When you have a random sample, the mean of the rows is not exactly the mean of the
population, so easy calculus shows that you need 1/(n-1) for the variance (calculated on
the sample) to be an "unbiased estimate" of the variance of the population.
In MySQL, VARIANCE() (which is the square of STDDEV()) has 1/n, it considers the rows as
the population (I'm documenting this now). Excel has VAR() (sample variance) and VARP()
(population variance).
[13 Sep 2003 15:09] Charles Schaffner
Thanks for the very fast response. We use it for lab results and need the n-1 value so I
will make a small function that will take your result and square it and mulitply it by
n/(n-1) and then take the square root.

Thanks again
cas7@cox.net
[28 Feb 2005 20:30] Paul DuBois
VAR_SAMP() and STDDEV_SAMP() functions for
calculating sample variance and standard deviation
values have been added to MySQL 5.0.3.