Bug #3190 Standard Deviation STDDEV - 2 different calculations
Submitted: 16 Mar 2004 9:45 Modified: 28 Feb 2005 19:29
Reporter: Juliana Chua Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server Severity:S4 (Feature request)
Version: OS:
Assigned to: Sergey Vojtovich CPU Architecture:Any
Triage: D5 (Feature request)

[16 Mar 2004 9:45] Juliana Chua
Description:
In GROUP BY functions, the STDDEV is being used for calculating standard deviation. Truth is there are 2 ways of calculating standard deviation. The one that is implemented by mysql is using the "divisor n" formula. The second type of calculation is using the "divisor n-1" formula. 

Technically both the formulas should be implemented. If you look into other statistical packages, both type of calculations are being considered for the formula with "divisor n" is mostly used in introductory courses wheresle "divisor n-1" is met more than often when studying statistical inference. 

How to repeat:
Try using the STDEV function declared in mysql and do a calculation. The answer that is fed back is using the "divisor n". 

Suggested fix:
I'm suggesting this bug to be fix else it will end up misleading users using the STDDEV function. (for example in excel, the way of calculation STDDEV is doing right now is known as STDEVP). Here's a proposal of the script i have found useful and hopefully will be implemented into mysql GROUP BY functions hence users in future can use STDDEV or STDEVP. 

function standard_deviation($std) 
{ 
	$total; 
        while(list($key,$val) = each($std)) 
        { 
		$total += $val; 
        } 
        reset($std); 
        $mean = $total/count($std); 
         
        while(list($key,$val) = each($std)) 
        { 
        	$sum += pow(($val-$mean),2); 
        } 
        $var = sqrt($sum/(count($std)-1)); 
        return $var; 
}
[16 Mar 2004 11:14] Sergei Golubchik
it's a duplicate of Bug#1270,
but as the issue is rather trivial we'll consider the adding the second variant of stddev
[15 Feb 2005 22:04] Vivienne Boogaard
I think it is very important to have both the sample and the population standard deviation calculations available to use.  I now have to RE your std() by squaring, multiplying by the population, dividing by the sample and then taking the SQRT().  This then matches other applications such as Excel.

Thanks,
Vivienne
[28 Feb 2005 14:02] Sergey Vojtovich
Fixed in 5.0.3
[28 Feb 2005 19:29] Paul Dubois
Noted in 5.0.3 changelog.

Function descriptions were added to:

http://dev.mysql.com/doc/mysql/en/group-by-functions.html