Bug #47280 strange results from count(*) with order by multiple columns without where/group
Submitted: 11 Sep 2009 18:32 Modified: 18 Dec 2009 13:21
Reporter: Shane Bester (Platinum Quality Contributor) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:4.0, 5.0, 5.1 OS:Any
Assigned to: Jørgen Løland CPU Architecture:Any

[11 Sep 2009 18:32] Shane Bester
Description:
in 5.1.38, myisam, memory, archive vs innodb gives different results for:

mysql> select count(*) as b from t3 order by b,a;
+---+
| b |
+---+
| 2 |
| 2 |
+---+
2 rows in set (0.00 sec)

mysql> alter table t3 engine=innodb;
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> select count(*) as b from t3 order by b,a;
+---+
| b |
+---+
| 2 |
+---+
1 row in set (0.00 sec)

How to repeat:
drop table if exists t3;
create table t3(a int not null)engine=myisam;
insert into t3 values(),();
select count(*) as b from t3;
select count(*) as b from t3 order by b,a; #2 row
alter table t3 engine=innodb;
select count(*) as b from t3 order by b,a; #1 row
[11 Sep 2009 18:47] MySQL Verification Team
Seems to only happen when 'Select tables optimized away' is seen to satisfy the query.  Of course innodb cannot optimize count(*), so it returns the result by counting each row.
[12 Sep 2009 12:37] Valeriy Kravchuk
Thank you for the bug report. Verified just as described. 

Note that one needs non-strict sql_mode for this:

mysql> insert into t3 values(),();
Query OK, 2 rows affected, 1 warning (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

to insert rows without error message.
[14 Sep 2009 5:02] Roel Van de Paar
add:
set sql_mode=''
to top of testcase
[14 Sep 2009 5:06] Roel Van de Paar
Interesting to note is that in both case 2 rows are shown by explain:

mysql> explain select count(*) as b from t3 order by b,a; #2 row
+----+-------------+-------+------+---------------+------+---------+------+------+-----------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra           |
+----+-------------+-------+------+---------------+------+---------+------+------+-----------------+
|  1 | SIMPLE      | t3    | ALL  | NULL          | NULL | NULL    | NULL |    2 | Using temporary |
+----+-------------+-------+------+---------------+------+---------+------+------+-----------------+
1 row in set (0.00 sec)

mysql> alter table t3 engine=innodb;
Query OK, 2 rows affected (0.04 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> explain select count(*) as b from t3 order by b,a; #1 row
+----+-------------+-------+------+---------------+------+---------+------+------+-----------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra           |
+----+-------------+-------+------+---------------+------+---------+------+------+-----------------+
|  1 | SIMPLE      | t3    | ALL  | NULL          | NULL | NULL    | NULL |    2 | Using temporary |
+----+-------------+-------+------+---------------+------+---------+------+------+-----------------+
1 row in set (0.00 sec)
[25 Sep 2009 6:49] Jørgen Løland
Also repeatable in mysql-6.0-codebase
[25 Sep 2009 12:47] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/84648

3133 Jorgen Loland	2009-09-25
      Bug#47280 - strange results from count(*) with order by multiple 
                  columns without where/group
      Bug#46518 - Removing ORDER BY from aggregate query changes result 
                  set (produces extra rows)
      
      Simple SELECT with implicit grouping returns many rows if the 
      ORDER BY clause contains the aggregate function from the SELECT
      list. Queries with implicit grouping should only return a single
      record.
      
      Function opt_sum_query is called if a query performs grouping but
      has no fields in the group list. The function tries to substitute
      count(), min() and max() aggregate functions with constants. In
      the queries that failed, opt_sum_query found a non-aggregate
      field and gave up the optimization. 
      
      However, non-aggregate fields can be substituted with a const in
      this case: When the select list contains non-aggregate fields not
      in the group list, the server is free to return any value from
      the group, so the value in such fields is indeterminate unless
      all values are the same.
     @ mysql-test/r/func_group.result
        Test for BUG#47280
     @ mysql-test/t/func_group.test
        Test for BUG#47280
     @ sql/opt_sum.cc
        Function opt_sum_func gave up const optimization of min/max/count if a non-aggregate field was encountered. This was incorrect since this optimization is only done if the group list is empty (hence, only zero or one row is returned), and the server is free to choose any of the field values in the group.
     @ sql/sql_select.cc
        Improved comment for why min/max/count can be optimize away
[29 Sep 2009 14:25] Martin Hansson
Review sent by email
[30 Sep 2009 12:59] Jørgen Løland
The above patch fixes the bug by not giving up const substitution if a non-aggregate field is found in opt_sum.cc#opt_sum_query().

Both reviewers asked something like: "OK, this particular query is fixed by not giving up optimization too early. But isn't there a separate problem we're hiding now?" 

The answer is interesting for future reference; it is 'no' because:

With the patch, the only way for opt_sum_query() to return 0 ("no errors, but could not optimize all fields to consts") is if there is an aggregate function that could not be optimized away (example: COUNT(i) where i can have NULL values). In this case, the field that could not be optimized away is not tagged as const.

During JOIN::exec(), the end_select (end_send or end_send_group) function pointer is decided on based on join->sort_and_group bool variable, which in turn is true if sum_func_count>0. sum_func_count is calculated in count_field_types() like this:

14653    param->field_count=param->sum_func_count=param->func_count=
14654       param->hidden_field_count=0;

and then for every aggregate field:

14663          if (! field->const_item())
14671              param->sum_func_count++;

We know that
  1) if there is a field that cannot be made const, it is an aggregate function
  2) if there is an aggregate function that could not be made const, then
     sum_func_count>0, in which case end_send_group function is used to return 
     the rows. This function returns the correct single row since there is no
     group_list. 

If end_send had been used to return the rows of const aggregate fields, all qualifying rows would have been in the result set. This is exactly what used to happen before the patch.
[1 Oct 2009 9:30] Jørgen Løland
BUG#46518 tagged as duplicate of this bug.
[1 Oct 2009 10:18] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/85334

3146 Jorgen Loland	2009-10-01
      Bug#47280 - strange results from count(*) with order by multiple 
                  columns without where/group
      
      Simple SELECT with implicit grouping returns many rows if the query
      orders by the aggregated column in the SELECT list. Queries with
      implicit grouping should only return a single record.
            
      Function opt_sum_query is called if a query performs grouping but has
      no fields in the group list. The function tries to substitute count(),
      min() and max() aggregate functions with constants. In the queries
      that failed, opt_sum_query found a non-aggregate field and gave up the
      optimization.
            
      However, non-aggregate fields can be substituted with a const in this
      case: When the select list contains non-aggregate fields not in the
      group list, the server is free to return any value from the group, so
      the value in such fields is indeterminate unless all values are the
      same.
     @ mysql-test/r/func_group.result
        Test for BUG#47280. Also modifies result from existing test.
     @ mysql-test/t/func_group.test
        Test for BUG#47280
     @ sql/opt_sum.cc
        Function opt_sum_func gave up const optimization of min/max/count if a non-aggregate field was encountered. This was incorrect since this optimization is only done if the group list is empty (hence, only zero or one row is returned), and the server is free to choose any of the field values in the group.
     @ sql/sql_select.cc
        Improved comment for why min/max/count can be optimized away
[1 Oct 2009 14:42] Martin Hansson
Review sent by email.
[5 Oct 2009 11:42] Jørgen Løland
Making a field const has a slightly different meaning than assumed above. Non-aggregate fields can therefore not be handled as the previous patch does.

Further analysis of the queries

SELECT MAX(`pk`) field2  
FROM D  
ORDER BY field2, `pk`; 

#Fails - produces 100 rows.

SELECT MAX(`pk`) field2  
FROM D;

#Correct - produces 1 row

Reveals that only the first query enters this block in JOIN::exec:

1778  /* Create a tmp table if distinct or if the sort is too complicated */
1779  if (need_tmp)
1780  {
        (...)
2013    count_field_types(select_lex, &curr_join->tmp_table_param, 
2014                      *curr_all_fields, 0);
2015  }

This count_field_types sets sum_func_count to 0 because the aggregate function in the query was made a const in opt_sum_query() (in JOIN::optimize()).

Hence, at line 2021, sum_func_count=0 for the first query but 1 for the second query.

2021  if (curr_join->group || curr_join->tmp_table_param.sum_func_count ||
2022      (procedure && (procedure->flags & PROC_GROUP)))
2023  {
2024    if (make_group_fields(this, curr_join))

so only the second query executes make_group_fields() on line 2024. This call is the key to get a result set with only one record as it should.

The "if" on line 2021 is there to check if the query needs to perform grouping of the results. It fails because "sum_func_count==0" does not always imply "no aggregates functions".
[5 Oct 2009 12:34] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/85746

3155 Jorgen Loland	2009-10-05
      Bug#47280 - strange results from count(*) with order by multiple 
                  columns without where/group
         
      Simple SELECT with implicit grouping returns many rows if the query
      orders by the aggregated column in the SELECT list. Queries with
      implicit grouping should only return a single record.
                  
      The problem is that the "if" in JOIN:exec() that decides if execution 
      needs to handle grouping assumes that tmp_table_param.sum_func_count==0
      means that there are no aggregate functions in the query. This is 
      incorrect if sum_func_count was recounted after optimizing the aggregate 
      functions away. Consider two queries:
      
      1) SELECT MAX(pk) max, pk FROM D ORDER BY max;
      2) SELECT MAX(pk) max, pk FROM D;
      
      max(pk) is optimized away by opt_sum_query() in JOIN::optimize for both
      queries, but the sum_func_count is only recalculated for the 
      former query in the "if(need_tmp)" block in JOIN:exec().
      
      Hence, when evaluating whether grouping is needed in JOIN::exec(),
      sum_func_count is 0 for the first query and 1 for the second 
      query. 
      
      The fix for this bug is to remove the assumption that sum_func_count==0
      means that there is no need for grouping. This is done by introducing 
      variable "bool implicit_grouping" in the JOIN object. 
     @ mysql-test/r/func_group.result
        Add test for BUG#47280
     @ mysql-test/t/func_group.test
        Add test for BUG#47280
     @ sql/sql_select.cc
        Introduce and use variable implicit_grouping instead of (!group_list && sum_func_count) in places that need to test if grouping is required.
     @ sql/sql_select.h
        Add variable implicit_grouping, which will be TRUE for queries that contain aggregate functions but no GROUP BY clause.
[6 Oct 2009 13:36] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/85902

3147 Jorgen Loland	2009-10-06
      Bug#47280 - strange results from count(*) with order by multiple 
                  columns without where/group
               
      Simple SELECT with implicit grouping used to return many rows if
      the query was ordered by the aggregated column in the SELECT
      list. This was incorrect because queries with implicit grouping
      should only return a single record.
                        
      The problem was that when JOIN:exec() decided if execution needed
      to handle grouping, it was assumed that sum_func_count==0 meant
      that there were no aggregate functions in the query. This
      assumption was not correct in JOIN::exec() because the aggregate
      functions might have been optimized away during JOIN::optimize().
            
      The reason why queries without ordering behaved correctly was
      that sum_func_count is only recalculated if the optimizer chooses
      to use temporary tables (which it does in the ordered case). 
      Hence, non-ordered queries were correctly treated as grouped.
            
      The fix for this bug was to remove the assumption that
      sum_func_count==0 means that there is no need for grouping. This
      was done by introducing variable "bool implicit_grouping" in the
      JOIN object.
     @ mysql-test/r/func_group.result
        Add test for BUG#47280
     @ mysql-test/t/func_group.test
        Add test for BUG#47280
     @ sql/opt_sum.cc
        Improve comment for opt_sum_query()
     @ sql/sql_class.h
        Add comment for variables in TMP_TABLE_PARAM
     @ sql/sql_select.cc
        Introduce and use variable implicit_grouping instead of (!group_list && sum_func_count) in places that need to test if grouping is required. Also added comments for: optimization of aggregate fields for implicitly grouped queries (JOIN::optimize) and choice of end_select method (JOIN::execute)
     @ sql/sql_select.h
        Add variable implicit_grouping, which will be TRUE for queries that contain aggregate functions but no GROUP BY clause. Also added comment to sort_and_group variable.
[7 Oct 2009 13:58] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/86018

3147 Jorgen Loland	2009-10-07
      Bug#47280 - strange results from count(*) with order by multiple 
                  columns without where/group
                     
      Simple SELECT with implicit grouping used to return many rows if
      the query was ordered by the aggregated column in the SELECT
      list. This was incorrect because queries with implicit grouping
      should only return a single record.
                              
      The problem was that when JOIN:exec() decided if execution needed
      to handle grouping, it was assumed that sum_func_count==0 meant
      that there were no aggregate functions in the query. This
      assumption was not correct in JOIN::exec() because the aggregate
      functions might have been optimized away during JOIN::optimize().
                  
      The reason why queries without ordering behaved correctly was
      that sum_func_count is only recalculated if the optimizer chooses
      to use temporary tables (which it does in the ordered case). 
      Hence, non-ordered queries were correctly treated as grouped.
                  
      The fix for this bug was to remove the assumption that
      sum_func_count==0 means that there is no need for grouping. This
      was done by introducing variable "bool implicit_grouping" in the
      JOIN object.
     @ mysql-test/r/func_group.result
        Add test for BUG#47280
     @ mysql-test/t/func_group.test
        Add test for BUG#47280
     @ sql/opt_sum.cc
        Improve comment for opt_sum_query()
     @ sql/sql_class.h
        Add comment for variables in TMP_TABLE_PARAM
     @ sql/sql_select.cc
        Introduce and use variable implicit_grouping instead of (!group_list && sum_func_count) in places that need to test if grouping is required. Also added comments for: optimization of aggregate fields for implicitly grouped queries (JOIN::optimize) and choice of end_select method (JOIN::execute)
     @ sql/sql_select.h
        Add variable implicit_grouping, which will be TRUE for queries that contain aggregate functions but no GROUP BY clause. Also added comment to sort_and_group variable.
[14 Oct 2009 8:44] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/86762

3166 Jorgen Loland	2009-10-14
      Bug#47280 - strange results from count(*) with order by multiple 
                  columns without where/group
                           
      Simple SELECT with implicit grouping used to return many rows if
      the query was ordered by the aggregated column in the SELECT
      list. This was incorrect because queries with implicit grouping
      should only return a single record.
                                    
      The problem was that when JOIN:exec() decided if execution needed
      to handle grouping, it was assumed that sum_func_count==0 meant
      that there were no aggregate functions in the query. This
      assumption was not correct in JOIN::exec() because the aggregate
      functions might have been optimized away during JOIN::optimize().
                        
      The reason why queries without ordering behaved correctly was
      that sum_func_count is only recalculated if the optimizer chooses
      to use temporary tables (which it does in the ordered case).
      Hence, non-ordered queries were correctly treated as grouped.
                        
      The fix for this bug was to remove the assumption that
      sum_func_count==0 means that there is no need for grouping. This
      was done by introducing variable "bool implicit_grouping" in the
      JOIN object.
     @ mysql-test/r/func_group.result
        Add test for BUG#47280
     @ mysql-test/t/func_group.test
        Add test for BUG#47280
     @ sql/opt_sum.cc
        Improve comment for opt_sum_query()
     @ sql/sql_class.h
        Add comment for variables in TMP_TABLE_PARAM
     @ sql/sql_select.cc
        Introduce and use variable implicit_grouping instead of (!group_list && sum_func_count) in places that need to test if grouping is required. Also added comments for: optimization of aggregate fields for implicitly grouped queries  (JOIN::optimize) and choice of end_select method (JOIN::execute)
     @ sql/sql_select.h
        Add variable implicit_grouping, which will be TRUE for queries that contain aggregate functions but no GROUP BY clause. Also added comment to sort_and_group variable.
[14 Oct 2009 8:55] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/86765

3655 Jorgen Loland	2009-10-14 [merge]
      Merge BUG#47280 to mysql-pe from mysql-5.1-bugteam
     @ mysql-test/r/func_group.result
        Add test for BUG#47280
     @ mysql-test/t/func_group.test
        Add test for BUG#47280
     @ sql/opt_sum.cc
        Improve comment for opt_sum_query()
     @ sql/sql_class.h
        Add comment for variables in TMP_TABLE_PARAM
     @ sql/sql_select.cc
        Introduce and use variable implicit_grouping instead of (!group_list && sum_func_count) in places that need to test if grouping is required. Also added comments for: optimization of aggregate fields for implicitly grouped queries (JOIN::optimize) and choice of end_select method (JOIN::execute)
     @ sql/sql_select.h
        Add variable implicit_grouping, which will be TRUE for queries that contain aggregate functions but no GROUP BY clause. Also added comment to sort_and_group variable.
[14 Oct 2009 8:57] Jørgen Løland
Pushed to mysql-5.1-bugteam and merged to mysql-pe
[14 Oct 2009 16:17] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/86853

3168 Jorgen Loland	2009-10-14
      Followup patch for BUG#47280
      
      Temporary tables may set join->group to 0 even though there is 
      grouping. Also need to test if sum_func_count>0 when JOIN::exec() 
      decides whether to present results in a grouped manner.
     @ sql/sql_select.cc
        Temporary tables may set join->group to 0 even though there is 
        grouping. Also need to test if sum_func_count>0 when JOIN::exec() 
        decides whether to present results in a grouped manner.
[15 Oct 2009 11:49] Bugs System
Pushed into 5.1.41 (revid:joro@sun.com-20091015114812-i3gm8km6gfruny5x) (version source revid:jorgen.loland@sun.com-20091014162001-s9fs95q5qu3a5m3t) (merge vers: 5.1.40) (pib:13)
[15 Oct 2009 23:43] Paul DuBois
Noted in 5.1.41 changelog.

A simple SELECT with implicit grouping could return many rows rather
than a single row if the query was ordered by the aggregated column 
in the select list. 

Setting report to NDI pending push into 5.5.x+.
[22 Oct 2009 6:33] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20091022063126-l0qzirh9xyhp0bpc) (version source revid:alik@sun.com-20091019135554-s1pvptt6i750lfhv) (merge vers: 6.0.14-alpha) (pib:13)
[22 Oct 2009 7:05] Bugs System
Pushed into 5.5.0-beta (revid:alik@sun.com-20091022060553-znkmxm0g0gm6ckvw) (version source revid:alik@sun.com-20091019131708-bc6pv55x6287a0wc) (merge vers: 5.5.0-beta) (pib:13)
[22 Oct 2009 19:56] Paul DuBois
Noted in 5.5.0, 6.0.14 changelogs.
[18 Dec 2009 10:29] Bugs System
Pushed into 5.1.41-ndb-7.1.0 (revid:jonas@mysql.com-20091218102229-64tk47xonu3dv6r6) (version source revid:jonas@mysql.com-20091218095730-26gwjidfsdw45dto) (merge vers: 5.1.41-ndb-7.1.0) (pib:15)
[18 Dec 2009 10:45] Bugs System
Pushed into 5.1.41-ndb-6.2.19 (revid:jonas@mysql.com-20091218100224-vtzr0fahhsuhjsmt) (version source revid:jonas@mysql.com-20091217101452-qwzyaig50w74xmye) (merge vers: 5.1.41-ndb-6.2.19) (pib:15)
[18 Dec 2009 11:00] Bugs System
Pushed into 5.1.41-ndb-6.3.31 (revid:jonas@mysql.com-20091218100616-75d9tek96o6ob6k0) (version source revid:jonas@mysql.com-20091217154335-290no45qdins5bwo) (merge vers: 5.1.41-ndb-6.3.31) (pib:15)
[18 Dec 2009 11:14] Bugs System
Pushed into 5.1.41-ndb-7.0.11 (revid:jonas@mysql.com-20091218101303-ga32mrnr15jsa606) (version source revid:jonas@mysql.com-20091218064304-ezreonykd9f4kelk) (merge vers: 5.1.41-ndb-7.0.11) (pib:15)
[18 Dec 2009 13:21] MC Brown
Already noted in earlier changelogs.