Bug #47280 | strange results from count(*) with order by multiple columns without where/group | ||
---|---|---|---|
Submitted: | 11 Sep 2009 18:32 | Modified: | 18 Dec 2009 13:21 |
Reporter: | Shane Bester (Platinum Quality Contributor) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S3 (Non-critical) |
Version: | 4.0, 5.0, 5.1 | OS: | Any |
Assigned to: | Jørgen Løland | CPU Architecture: | Any |
[11 Sep 2009 18:32]
Shane Bester
[11 Sep 2009 18:47]
MySQL Verification Team
Seems to only happen when 'Select tables optimized away' is seen to satisfy the query. Of course innodb cannot optimize count(*), so it returns the result by counting each row.
[12 Sep 2009 12:37]
Valeriy Kravchuk
Thank you for the bug report. Verified just as described. Note that one needs non-strict sql_mode for this: mysql> insert into t3 values(),(); Query OK, 2 rows affected, 1 warning (0.00 sec) Records: 2 Duplicates: 0 Warnings: 0 to insert rows without error message.
[14 Sep 2009 5:02]
Roel Van de Paar
add: set sql_mode='' to top of testcase
[14 Sep 2009 5:06]
Roel Van de Paar
Interesting to note is that in both case 2 rows are shown by explain: mysql> explain select count(*) as b from t3 order by b,a; #2 row +----+-------------+-------+------+---------------+------+---------+------+------+-----------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+------+---------+------+------+-----------------+ | 1 | SIMPLE | t3 | ALL | NULL | NULL | NULL | NULL | 2 | Using temporary | +----+-------------+-------+------+---------------+------+---------+------+------+-----------------+ 1 row in set (0.00 sec) mysql> alter table t3 engine=innodb; Query OK, 2 rows affected (0.04 sec) Records: 2 Duplicates: 0 Warnings: 0 mysql> explain select count(*) as b from t3 order by b,a; #1 row +----+-------------+-------+------+---------------+------+---------+------+------+-----------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+------+---------+------+------+-----------------+ | 1 | SIMPLE | t3 | ALL | NULL | NULL | NULL | NULL | 2 | Using temporary | +----+-------------+-------+------+---------------+------+---------+------+------+-----------------+ 1 row in set (0.00 sec)
[25 Sep 2009 6:49]
Jørgen Løland
Also repeatable in mysql-6.0-codebase
[25 Sep 2009 12:47]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/84648 3133 Jorgen Loland 2009-09-25 Bug#47280 - strange results from count(*) with order by multiple columns without where/group Bug#46518 - Removing ORDER BY from aggregate query changes result set (produces extra rows) Simple SELECT with implicit grouping returns many rows if the ORDER BY clause contains the aggregate function from the SELECT list. Queries with implicit grouping should only return a single record. Function opt_sum_query is called if a query performs grouping but has no fields in the group list. The function tries to substitute count(), min() and max() aggregate functions with constants. In the queries that failed, opt_sum_query found a non-aggregate field and gave up the optimization. However, non-aggregate fields can be substituted with a const in this case: When the select list contains non-aggregate fields not in the group list, the server is free to return any value from the group, so the value in such fields is indeterminate unless all values are the same. @ mysql-test/r/func_group.result Test for BUG#47280 @ mysql-test/t/func_group.test Test for BUG#47280 @ sql/opt_sum.cc Function opt_sum_func gave up const optimization of min/max/count if a non-aggregate field was encountered. This was incorrect since this optimization is only done if the group list is empty (hence, only zero or one row is returned), and the server is free to choose any of the field values in the group. @ sql/sql_select.cc Improved comment for why min/max/count can be optimize away
[29 Sep 2009 14:25]
Martin Hansson
Review sent by email
[30 Sep 2009 12:59]
Jørgen Løland
The above patch fixes the bug by not giving up const substitution if a non-aggregate field is found in opt_sum.cc#opt_sum_query(). Both reviewers asked something like: "OK, this particular query is fixed by not giving up optimization too early. But isn't there a separate problem we're hiding now?" The answer is interesting for future reference; it is 'no' because: With the patch, the only way for opt_sum_query() to return 0 ("no errors, but could not optimize all fields to consts") is if there is an aggregate function that could not be optimized away (example: COUNT(i) where i can have NULL values). In this case, the field that could not be optimized away is not tagged as const. During JOIN::exec(), the end_select (end_send or end_send_group) function pointer is decided on based on join->sort_and_group bool variable, which in turn is true if sum_func_count>0. sum_func_count is calculated in count_field_types() like this: 14653 param->field_count=param->sum_func_count=param->func_count= 14654 param->hidden_field_count=0; and then for every aggregate field: 14663 if (! field->const_item()) 14671 param->sum_func_count++; We know that 1) if there is a field that cannot be made const, it is an aggregate function 2) if there is an aggregate function that could not be made const, then sum_func_count>0, in which case end_send_group function is used to return the rows. This function returns the correct single row since there is no group_list. If end_send had been used to return the rows of const aggregate fields, all qualifying rows would have been in the result set. This is exactly what used to happen before the patch.
[1 Oct 2009 9:30]
Jørgen Løland
BUG#46518 tagged as duplicate of this bug.
[1 Oct 2009 10:18]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/85334 3146 Jorgen Loland 2009-10-01 Bug#47280 - strange results from count(*) with order by multiple columns without where/group Simple SELECT with implicit grouping returns many rows if the query orders by the aggregated column in the SELECT list. Queries with implicit grouping should only return a single record. Function opt_sum_query is called if a query performs grouping but has no fields in the group list. The function tries to substitute count(), min() and max() aggregate functions with constants. In the queries that failed, opt_sum_query found a non-aggregate field and gave up the optimization. However, non-aggregate fields can be substituted with a const in this case: When the select list contains non-aggregate fields not in the group list, the server is free to return any value from the group, so the value in such fields is indeterminate unless all values are the same. @ mysql-test/r/func_group.result Test for BUG#47280. Also modifies result from existing test. @ mysql-test/t/func_group.test Test for BUG#47280 @ sql/opt_sum.cc Function opt_sum_func gave up const optimization of min/max/count if a non-aggregate field was encountered. This was incorrect since this optimization is only done if the group list is empty (hence, only zero or one row is returned), and the server is free to choose any of the field values in the group. @ sql/sql_select.cc Improved comment for why min/max/count can be optimized away
[1 Oct 2009 14:42]
Martin Hansson
Review sent by email.
[5 Oct 2009 11:42]
Jørgen Løland
Making a field const has a slightly different meaning than assumed above. Non-aggregate fields can therefore not be handled as the previous patch does. Further analysis of the queries SELECT MAX(`pk`) field2 FROM D ORDER BY field2, `pk`; #Fails - produces 100 rows. SELECT MAX(`pk`) field2 FROM D; #Correct - produces 1 row Reveals that only the first query enters this block in JOIN::exec: 1778 /* Create a tmp table if distinct or if the sort is too complicated */ 1779 if (need_tmp) 1780 { (...) 2013 count_field_types(select_lex, &curr_join->tmp_table_param, 2014 *curr_all_fields, 0); 2015 } This count_field_types sets sum_func_count to 0 because the aggregate function in the query was made a const in opt_sum_query() (in JOIN::optimize()). Hence, at line 2021, sum_func_count=0 for the first query but 1 for the second query. 2021 if (curr_join->group || curr_join->tmp_table_param.sum_func_count || 2022 (procedure && (procedure->flags & PROC_GROUP))) 2023 { 2024 if (make_group_fields(this, curr_join)) so only the second query executes make_group_fields() on line 2024. This call is the key to get a result set with only one record as it should. The "if" on line 2021 is there to check if the query needs to perform grouping of the results. It fails because "sum_func_count==0" does not always imply "no aggregates functions".
[5 Oct 2009 12:34]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/85746 3155 Jorgen Loland 2009-10-05 Bug#47280 - strange results from count(*) with order by multiple columns without where/group Simple SELECT with implicit grouping returns many rows if the query orders by the aggregated column in the SELECT list. Queries with implicit grouping should only return a single record. The problem is that the "if" in JOIN:exec() that decides if execution needs to handle grouping assumes that tmp_table_param.sum_func_count==0 means that there are no aggregate functions in the query. This is incorrect if sum_func_count was recounted after optimizing the aggregate functions away. Consider two queries: 1) SELECT MAX(pk) max, pk FROM D ORDER BY max; 2) SELECT MAX(pk) max, pk FROM D; max(pk) is optimized away by opt_sum_query() in JOIN::optimize for both queries, but the sum_func_count is only recalculated for the former query in the "if(need_tmp)" block in JOIN:exec(). Hence, when evaluating whether grouping is needed in JOIN::exec(), sum_func_count is 0 for the first query and 1 for the second query. The fix for this bug is to remove the assumption that sum_func_count==0 means that there is no need for grouping. This is done by introducing variable "bool implicit_grouping" in the JOIN object. @ mysql-test/r/func_group.result Add test for BUG#47280 @ mysql-test/t/func_group.test Add test for BUG#47280 @ sql/sql_select.cc Introduce and use variable implicit_grouping instead of (!group_list && sum_func_count) in places that need to test if grouping is required. @ sql/sql_select.h Add variable implicit_grouping, which will be TRUE for queries that contain aggregate functions but no GROUP BY clause.
[6 Oct 2009 13:36]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/85902 3147 Jorgen Loland 2009-10-06 Bug#47280 - strange results from count(*) with order by multiple columns without where/group Simple SELECT with implicit grouping used to return many rows if the query was ordered by the aggregated column in the SELECT list. This was incorrect because queries with implicit grouping should only return a single record. The problem was that when JOIN:exec() decided if execution needed to handle grouping, it was assumed that sum_func_count==0 meant that there were no aggregate functions in the query. This assumption was not correct in JOIN::exec() because the aggregate functions might have been optimized away during JOIN::optimize(). The reason why queries without ordering behaved correctly was that sum_func_count is only recalculated if the optimizer chooses to use temporary tables (which it does in the ordered case). Hence, non-ordered queries were correctly treated as grouped. The fix for this bug was to remove the assumption that sum_func_count==0 means that there is no need for grouping. This was done by introducing variable "bool implicit_grouping" in the JOIN object. @ mysql-test/r/func_group.result Add test for BUG#47280 @ mysql-test/t/func_group.test Add test for BUG#47280 @ sql/opt_sum.cc Improve comment for opt_sum_query() @ sql/sql_class.h Add comment for variables in TMP_TABLE_PARAM @ sql/sql_select.cc Introduce and use variable implicit_grouping instead of (!group_list && sum_func_count) in places that need to test if grouping is required. Also added comments for: optimization of aggregate fields for implicitly grouped queries (JOIN::optimize) and choice of end_select method (JOIN::execute) @ sql/sql_select.h Add variable implicit_grouping, which will be TRUE for queries that contain aggregate functions but no GROUP BY clause. Also added comment to sort_and_group variable.
[7 Oct 2009 13:58]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/86018 3147 Jorgen Loland 2009-10-07 Bug#47280 - strange results from count(*) with order by multiple columns without where/group Simple SELECT with implicit grouping used to return many rows if the query was ordered by the aggregated column in the SELECT list. This was incorrect because queries with implicit grouping should only return a single record. The problem was that when JOIN:exec() decided if execution needed to handle grouping, it was assumed that sum_func_count==0 meant that there were no aggregate functions in the query. This assumption was not correct in JOIN::exec() because the aggregate functions might have been optimized away during JOIN::optimize(). The reason why queries without ordering behaved correctly was that sum_func_count is only recalculated if the optimizer chooses to use temporary tables (which it does in the ordered case). Hence, non-ordered queries were correctly treated as grouped. The fix for this bug was to remove the assumption that sum_func_count==0 means that there is no need for grouping. This was done by introducing variable "bool implicit_grouping" in the JOIN object. @ mysql-test/r/func_group.result Add test for BUG#47280 @ mysql-test/t/func_group.test Add test for BUG#47280 @ sql/opt_sum.cc Improve comment for opt_sum_query() @ sql/sql_class.h Add comment for variables in TMP_TABLE_PARAM @ sql/sql_select.cc Introduce and use variable implicit_grouping instead of (!group_list && sum_func_count) in places that need to test if grouping is required. Also added comments for: optimization of aggregate fields for implicitly grouped queries (JOIN::optimize) and choice of end_select method (JOIN::execute) @ sql/sql_select.h Add variable implicit_grouping, which will be TRUE for queries that contain aggregate functions but no GROUP BY clause. Also added comment to sort_and_group variable.
[14 Oct 2009 8:44]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/86762 3166 Jorgen Loland 2009-10-14 Bug#47280 - strange results from count(*) with order by multiple columns without where/group Simple SELECT with implicit grouping used to return many rows if the query was ordered by the aggregated column in the SELECT list. This was incorrect because queries with implicit grouping should only return a single record. The problem was that when JOIN:exec() decided if execution needed to handle grouping, it was assumed that sum_func_count==0 meant that there were no aggregate functions in the query. This assumption was not correct in JOIN::exec() because the aggregate functions might have been optimized away during JOIN::optimize(). The reason why queries without ordering behaved correctly was that sum_func_count is only recalculated if the optimizer chooses to use temporary tables (which it does in the ordered case). Hence, non-ordered queries were correctly treated as grouped. The fix for this bug was to remove the assumption that sum_func_count==0 means that there is no need for grouping. This was done by introducing variable "bool implicit_grouping" in the JOIN object. @ mysql-test/r/func_group.result Add test for BUG#47280 @ mysql-test/t/func_group.test Add test for BUG#47280 @ sql/opt_sum.cc Improve comment for opt_sum_query() @ sql/sql_class.h Add comment for variables in TMP_TABLE_PARAM @ sql/sql_select.cc Introduce and use variable implicit_grouping instead of (!group_list && sum_func_count) in places that need to test if grouping is required. Also added comments for: optimization of aggregate fields for implicitly grouped queries (JOIN::optimize) and choice of end_select method (JOIN::execute) @ sql/sql_select.h Add variable implicit_grouping, which will be TRUE for queries that contain aggregate functions but no GROUP BY clause. Also added comment to sort_and_group variable.
[14 Oct 2009 8:55]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/86765 3655 Jorgen Loland 2009-10-14 [merge] Merge BUG#47280 to mysql-pe from mysql-5.1-bugteam @ mysql-test/r/func_group.result Add test for BUG#47280 @ mysql-test/t/func_group.test Add test for BUG#47280 @ sql/opt_sum.cc Improve comment for opt_sum_query() @ sql/sql_class.h Add comment for variables in TMP_TABLE_PARAM @ sql/sql_select.cc Introduce and use variable implicit_grouping instead of (!group_list && sum_func_count) in places that need to test if grouping is required. Also added comments for: optimization of aggregate fields for implicitly grouped queries (JOIN::optimize) and choice of end_select method (JOIN::execute) @ sql/sql_select.h Add variable implicit_grouping, which will be TRUE for queries that contain aggregate functions but no GROUP BY clause. Also added comment to sort_and_group variable.
[14 Oct 2009 8:57]
Jørgen Løland
Pushed to mysql-5.1-bugteam and merged to mysql-pe
[14 Oct 2009 16:17]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/86853 3168 Jorgen Loland 2009-10-14 Followup patch for BUG#47280 Temporary tables may set join->group to 0 even though there is grouping. Also need to test if sum_func_count>0 when JOIN::exec() decides whether to present results in a grouped manner. @ sql/sql_select.cc Temporary tables may set join->group to 0 even though there is grouping. Also need to test if sum_func_count>0 when JOIN::exec() decides whether to present results in a grouped manner.
[15 Oct 2009 11:49]
Bugs System
Pushed into 5.1.41 (revid:joro@sun.com-20091015114812-i3gm8km6gfruny5x) (version source revid:jorgen.loland@sun.com-20091014162001-s9fs95q5qu3a5m3t) (merge vers: 5.1.40) (pib:13)
[15 Oct 2009 23:43]
Paul DuBois
Noted in 5.1.41 changelog. A simple SELECT with implicit grouping could return many rows rather than a single row if the query was ordered by the aggregated column in the select list. Setting report to NDI pending push into 5.5.x+.
[22 Oct 2009 6:33]
Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20091022063126-l0qzirh9xyhp0bpc) (version source revid:alik@sun.com-20091019135554-s1pvptt6i750lfhv) (merge vers: 6.0.14-alpha) (pib:13)
[22 Oct 2009 7:05]
Bugs System
Pushed into 5.5.0-beta (revid:alik@sun.com-20091022060553-znkmxm0g0gm6ckvw) (version source revid:alik@sun.com-20091019131708-bc6pv55x6287a0wc) (merge vers: 5.5.0-beta) (pib:13)
[22 Oct 2009 19:56]
Paul DuBois
Noted in 5.5.0, 6.0.14 changelogs.
[18 Dec 2009 10:29]
Bugs System
Pushed into 5.1.41-ndb-7.1.0 (revid:jonas@mysql.com-20091218102229-64tk47xonu3dv6r6) (version source revid:jonas@mysql.com-20091218095730-26gwjidfsdw45dto) (merge vers: 5.1.41-ndb-7.1.0) (pib:15)
[18 Dec 2009 10:45]
Bugs System
Pushed into 5.1.41-ndb-6.2.19 (revid:jonas@mysql.com-20091218100224-vtzr0fahhsuhjsmt) (version source revid:jonas@mysql.com-20091217101452-qwzyaig50w74xmye) (merge vers: 5.1.41-ndb-6.2.19) (pib:15)
[18 Dec 2009 11:00]
Bugs System
Pushed into 5.1.41-ndb-6.3.31 (revid:jonas@mysql.com-20091218100616-75d9tek96o6ob6k0) (version source revid:jonas@mysql.com-20091217154335-290no45qdins5bwo) (merge vers: 5.1.41-ndb-6.3.31) (pib:15)
[18 Dec 2009 11:14]
Bugs System
Pushed into 5.1.41-ndb-7.0.11 (revid:jonas@mysql.com-20091218101303-ga32mrnr15jsa606) (version source revid:jonas@mysql.com-20091218064304-ezreonykd9f4kelk) (merge vers: 5.1.41-ndb-7.0.11) (pib:15)
[18 Dec 2009 13:21]
MC Brown
Already noted in earlier changelogs.