Bug #46567 Aggregate function documentation needs clarification for non-GROUP BY behavior
Submitted: 5 Aug 2009 14:24 Modified: 7 Aug 2009 1:26
Reporter: Patrick Crews Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Documentation Severity:S3 (Non-critical)
Version:5.0, 5.1, 5.4/6.0 OS:Any
Assigned to: Paul DuBois CPU Architecture:Any
Tags: Aggregate Function, GROUP BY

[5 Aug 2009 14:24] Patrick Crews
Description:
The documentation for aggregate functions needs some clarification.

From
http://dev.mysql.com/doc/refman/5.1/en/group-by-functions.html
If you use a group function in a statement containing no GROUP BY clause, it is equivalent to grouping on all rows.

This does not clearly state what to expect results-wise.  In these cases, a single row is returned - the aggregate value and arbitrary values for the non-aggregates (which can change depending on things like optimizer switches, etc).

The 'equivalent to grouping on all rows' statement is problematic - one expects that aggregates operate on all rows and someone might interpret it as a mistake for 'grouping on all fields' (which would return different results - see below

How to repeat:
Read the document at here:
http://dev.mysql.com/doc/refman/5.1/en/group-by-functions.html

create table t1(a int);
insert into t1 values(1),(2),(3),(4),(5);
create table t2(b int);
insert into t2 values(4),(5),(6),(7);

select max(a),b from t1,t2;
+--------+------+
| max(a) | b    |
+--------+------+
|      5 |    4 | 
+--------+------+

select max(a),b from t1,t2 group by b
    -> ;
+--------+------+
| max(a) | b    |
+--------+------+
|      5 |    4 | 
|      5 |    5 | 
|      5 |    6 | 
|      5 |    7 | 
+--------+------+

Suggested fix:
Describe the proper server behavior more clearly.
[7 Aug 2009 1:26] Paul DuBois
Thank you for your bug report. This issue has been addressed in the documentation. The updated documentation will appear on our website shortly, and will be included in the next release of the relevant products.

"This does not clearly state what to expect results-wise.  In these cases, a single row is
returned - the aggregate value and arbitrary values for the non-aggregates (which can
change depending on things like optimizer switches, etc)."

This is explained in http://dev.mysql.com/doc/refman/5.1/en/group-by-hidden-columns.html, so I'll add a cross reference to that section.

"The 'equivalent to grouping on all rows' statement is problematic - one expects that
aggregates operate on all rows and someone might interpret it as a mistake for 'grouping
on all fields' (which would return different results - see below"

I don't think it's reasonable to suppose that a user will think that "row" means "field" (column), so I've made no change here. But if you have an alternative phrasing that you think is better, by all means please suggest it.