Bug #1270 | Calculation difference | ||
---|---|---|---|
Submitted: | 12 Sep 2003 23:19 | Modified: | 13 Sep 2003 7:08 |
Reporter: | Charles Schaffner | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server: User-defined functions ( UDF ) | Severity: | S3 (Non-critical) |
Version: | 4.0.14b | OS: | Windows (Windows) |
Assigned to: | CPU Architecture: | Any |
[12 Sep 2003 23:19]
Charles Schaffner
[13 Sep 2003 4:23]
Antony Curtis
Variance is defined as the square of the mean minus the mean of the squares. So where does this "N-1" come from?
[13 Sep 2003 4:25]
Antony Curtis
Oops, I meant the mean of the squares minus the square of the means.
[13 Sep 2003 7:08]
Guilhem Bichot
Quoting some dictionary: "The standard deviation of a random variable or list of numbers (the lowercase greek sigma) is the square of the variance. The standard deviation of the list x1, x2, x3...xn is given by the formula: sigma = sqrt(((x1-(avg(x)))^2 + (x1-(avg(x)))^2 + ... + (xn(avg(x)))^2)/n) The formula is used when all of the values in the population are known. If the values x1...xn are a random sample chosen from the population, then the sample Standard Deviation is calculated with same formula, except that (n-1) is used as the denominator." When you have a random sample, the mean of the rows is not exactly the mean of the population, so easy calculus shows that you need 1/(n-1) for the variance (calculated on the sample) to be an "unbiased estimate" of the variance of the population. In MySQL, VARIANCE() (which is the square of STDDEV()) has 1/n, it considers the rows as the population (I'm documenting this now). Excel has VAR() (sample variance) and VARP() (population variance).
[13 Sep 2003 13:09]
Charles Schaffner
Thanks for the very fast response. We use it for lab results and need the n-1 value so I will make a small function that will take your result and square it and mulitply it by n/(n-1) and then take the square root. Thanks again cas7@cox.net
[28 Feb 2005 19:30]
Paul DuBois
VAR_SAMP() and STDDEV_SAMP() functions for calculating sample variance and standard deviation values have been added to MySQL 5.0.3.