Bug #3190 Standard Deviation STDDEV - 2 different calculations
Submitted: 16 Mar 2004 10:45 Modified: 28 Feb 2005 20:29
Reporter: Juliana Chua
Status: Closed
Category:Server Severity:S4 (Feature request)
Version: OS:
Assigned to: Sergey Vojtovich Target Version:
Triage: D5 (Feature request)

[16 Mar 2004 10:45] Juliana Chua
Description:
In GROUP BY functions, the STDDEV is being used for calculating standard deviation. Truth
is there are 2 ways of calculating standard deviation. The one that is implemented by
mysql is using the "divisor n" formula. The second type of calculation is using the
"divisor n-1" formula. 

Technically both the formulas should be implemented. If you look into other statistical
packages, both type of calculations are being considered for the formula with "divisor n"
is mostly used in introductory courses wheresle "divisor n-1" is met more than often when
studying statistical inference. 

How to repeat:
Try using the STDEV function declared in mysql and do a calculation. The answer that is
fed back is using the "divisor n". 

Suggested fix:
I'm suggesting this bug to be fix else it will end up misleading users using the STDDEV
function. (for example in excel, the way of calculation STDDEV is doing right now is
known as STDEVP). Here's a proposal of the script i have found useful and hopefully will
be implemented into mysql GROUP BY functions hence users in future can use STDDEV or
STDEVP. 

function standard_deviation($std) 
{ 
	$total; 
        while(list($key,$val) = each($std)) 
        { 
		$total += $val; 
        } 
        reset($std); 
        $mean = $total/count($std); 
         
        while(list($key,$val) = each($std)) 
        { 
        	$sum += pow(($val-$mean),2); 
        } 
        $var = sqrt($sum/(count($std)-1)); 
        return $var; 
}
[16 Mar 2004 12:14] Sergei Golubchik
it's a duplicate of Bug#1270,
but as the issue is rather trivial we'll consider the adding the second variant of stddev
[15 Feb 2005 23:04] Vivienne Boogaard
I think it is very important to have both the sample and the population standard deviation
calculations available to use.  I now have to RE your std() by squaring, multiplying by
the population, dividing by the sample and then taking the SQRT().  This then matches
other applications such as Excel.

Thanks,
Vivienne
[28 Feb 2005 15:02] Sergey Vojtovich
Fixed in 5.0.3
[28 Feb 2005 20:29] Paul DuBois
Noted in 5.0.3 changelog.

Function descriptions were added to:

http://dev.mysql.com/doc/mysql/en/group-by-functions.html