| Bug #1270 | Calculation difference | ||
|---|---|---|---|
| Submitted: | 12 Sep 2003 23:19 | Modified: | 13 Sep 2003 7:08 | 
| Reporter: | Charles Schaffner | Email Updates: | |
| Status: | Not a Bug | Impact on me: | |
| Category: | MySQL Server: User-defined functions ( UDF ) | Severity: | S3 (Non-critical) | 
| Version: | 4.0.14b | OS: | Windows (Windows) | 
| Assigned to: | CPU Architecture: | Any | |
   [12 Sep 2003 23:19]
   Charles Schaffner        
  
 
   [13 Sep 2003 4:23]
   Antony Curtis        
  Variance is defined as the square of the mean minus the mean of the squares. So where does this "N-1" come from?
   [13 Sep 2003 4:25]
   Antony Curtis        
  Oops, I meant the mean of the squares minus the square of the means.
   [13 Sep 2003 7:08]
   Guilhem Bichot        
  Quoting some dictionary:
"The standard deviation of a random variable or list of numbers (the lowercase greek sigma) is the square of the variance. The standard deviation of the list x1, x2, x3...xn is given by the formula:
 sigma = sqrt(((x1-(avg(x)))^2 + (x1-(avg(x)))^2 +
              ...  + (xn(avg(x)))^2)/n)
The formula is used when all of the values in the population are known. If the values x1...xn are a random sample chosen from the population, then the sample Standard Deviation is calculated with same formula, except that (n-1) is used as the denominator."
When you have a random sample, the mean of the rows is not exactly the mean of the population, so easy calculus shows that you need 1/(n-1) for the variance (calculated on the sample) to be an "unbiased estimate" of the variance of the population.
In MySQL, VARIANCE() (which is the square of STDDEV()) has 1/n, it considers the rows as the population (I'm documenting this now). Excel has VAR() (sample variance) and VARP() (population variance).
 
   [13 Sep 2003 13:09]
   Charles Schaffner        
  Thanks for the very fast response. We use it for lab results and need the n-1 value so I will make a small function that will take your result and square it and mulitply it by n/(n-1) and then take the square root. Thanks again cas7@cox.net
   [28 Feb 2005 19:30]
   Paul DuBois        
  VAR_SAMP() and STDDEV_SAMP() functions for calculating sample variance and standard deviation values have been added to MySQL 5.0.3.

