Bug #46328 Use of aggregate function without GROUP BY clause returns many rows (vs. one )
Submitted: 21 Jul 2009 20:54 Modified: 22 Nov 2010 0:49
Reporter: Patrick Crews Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:6.0 OS:Any
Assigned to: Jørgen Løland CPU Architecture:Any
Tags: Aggregate Function, bad results, GROUP BY, regression

[21 Jul 2009 20:54] Patrick Crews
Description:
Using an aggregate function in a query without a GROUP BY clause is returning only a single row in 5.0 and 5.1.  (5.4 seems to be behaving correctly per the docs):

SELECT  MAX( X .`int_key`  )  , X .`int_key` field1  FROM CC X  STRAIGHT_JOIN BBB  ORDER  BY field1   ;
returns only a single row:
 9	0
despite the fact that there are multiple values for X.int_key in table CC.

Per the docs: http://dev.mysql.com/doc/refman/5.1/en/group-by-functions.html
If you use a group function in a statement containing no GROUP BY clause, it is equivalent to grouping on all rows.

In azalea, we receive 20 rows for the same query.

I was not able to see a difference in behavior by altering optimizer_switch / engine_condition_pushdown values.

How to repeat:
Test case (for 5.1), full rqg-generated test output follows:
SET SESSION optimizer_switch = 'index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on';

SET SESSION engine_condition_pushdown = 'ON';

--disable_warnings
DROP TABLE IF EXISTS CC, BBB;
--enable_warnings

CREATE TABLE `CC` (
  `int_key` int(11) NOT NULL,
  KEY `int_key` (`int_key`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO `CC` VALUES (0),(0),(0),(2),(2),(2),(2),(3),(4),(5),(5),(6),(6),(6),(8),(8),(8),(8),(9),(9);
CREATE TABLE `BBB` (
  `int_key` int(11) NOT NULL,
  KEY `int_key` (`int_key`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO `BBB` VALUES (3);

 
SELECT  MAX( X .`int_key`  )  , X .`int_key` field1  
FROM CC X  STRAIGHT_JOIN BBB  
ORDER  BY field1   ;

DROP TABLE CC, BBB;

# End of test case for query 0

# Begin test case for query 1

--disable_warnings
DROP TABLE IF EXISTS CC, BBB;
--enable_warnings

CREATE TABLE `CC` (
  `int_key` int(11) NOT NULL,
  `date_nokey` date NOT NULL,
  `varchar_nokey` varchar(1) NOT NULL,
  KEY `int_key` (`int_key`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO `CC` VALUES (8,'2007-02-14','q'),(8,'2002-10-03','m'),(3,'2006-12-02','j'),(2,'2007-05-02','z'),(2,'2001-11-18','a'),(6,'2006-09-09',''),(8,'0000-00-00','e'),(9,'2003-07-22','t'),(2,'2001-12-22','q'),(6,'0000-00-00','b'),(5,'2006-09-02','w'),(2,'0000-00-00','m'),(4,'0000-00-00','x'),(9,'2001-02-28',''),(6,'0000-00-00','w'),(5,'2007-05-19','x'),(0,'2005-02-15','e'),(0,'2000-10-19','e'),(8,'2005-07-07','p'),(0,'2008-10-18','x');
CREATE TABLE `BBB` (
  `int_key` int(11) NOT NULL,
  `date_nokey` date NOT NULL,
  `varchar_nokey` varchar(1) NOT NULL,
  KEY `int_key` (`int_key`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO `BBB` VALUES (3,'0000-00-00','');

 
SELECT  MAX( X . `int_key` ) , X . `int_key` AS field1 
FROM CC X STRAIGHT_JOIN BBB Y     
ORDER BY field1  , field1  , X . `date_nokey`  , X . `varchar_nokey`  , field1 DESC , field1 ASC ;

DROP TABLE CC, BBB;

# End of test case for query 1

######## FULL test case ##########

# Server0: version = 5.1.38-debug-log
# Server1: version = 5.4.4-alpha-debug-log

# The value of optimizer_switch is distinct between the two servers:
# Server 0 : SET SESSION optimizer_switch = 'index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on';
# Server 1 : SET SESSION optimizer_switch = 'firstmatch=off,index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,loosescan=off,materialization=off,semijoin=off';

# The value of optimizer_use_mrr is distinct between the two servers:
# Server 0 : SET SESSION optimizer_use_mrr = '';
# Server 1 : SET SESSION optimizer_use_mrr = 'disable';

# The value of engine_condition_pushdown is distinct between the two servers:
# Server 0 : SET SESSION engine_condition_pushdown = 'ON';
# Server 1 : SET SESSION engine_condition_pushdown = 0;

# The value of join_cache_level is distinct between the two servers:
# Server 0 : SET SESSION join_cache_level = '';
# Server 1 : SET SESSION join_cache_level = 1;

# Begin test case for query 0

--disable_warnings
DROP TABLE IF EXISTS CC, BBB;
--enable_warnings

CREATE TABLE `CC` (
  `int_key` int(11) NOT NULL,
  KEY `int_key` (`int_key`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO `CC` VALUES (0),(0),(0),(2),(2),(2),(2),(3),(4),(5),(5),(6),(6),(6),(8),(8),(8),(8),(9),(9);
CREATE TABLE `BBB` (
  `int_key` int(11) NOT NULL,
  KEY `int_key` (`int_key`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO `BBB` VALUES (3);

 
SELECT  MAX( X .`int_key`  )  , X .`int_key` field1  
FROM CC X  STRAIGHT_JOIN BBB  
ORDER  BY field1   ;

# Diff:

# --- /var/folders/Pt/PtJd7NDTGgyOk3+iDGXrQk+++TI/-Tmp-///randgen33574-1248195707-server0.dump	2009-07-21 13:01:47.000000000 -0400
# +++ /var/folders/Pt/PtJd7NDTGgyOk3+iDGXrQk+++TI/-Tmp-///randgen33574-1248195707-server1.dump	2009-07-21 13:01:47.000000000 -0400
# @@ -1 +1,20 @@
#  9	0
# +9	0
# +9	0
# +9	2
# +9	2
# +9	2
# +9	2
# +9	3
# +9	4
# +9	5
# +9	5
# +9	6
# +9	6
# +9	6
# +9	8
# +9	8
# +9	8
# +9	8
# +9	9
# +9	9

DROP TABLE CC, BBB;

# End of test case for query 0

# Begin test case for query 1

--disable_warnings
DROP TABLE IF EXISTS CC, BBB;
--enable_warnings

CREATE TABLE `CC` (
  `int_key` int(11) NOT NULL,
  `date_nokey` date NOT NULL,
  `varchar_nokey` varchar(1) NOT NULL,
  KEY `int_key` (`int_key`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO `CC` VALUES (8,'2007-02-14','q'),(8,'2002-10-03','m'),(3,'2006-12-02','j'),(2,'2007-05-02','z'),(2,'2001-11-18','a'),(6,'2006-09-09',''),(8,'0000-00-00','e'),(9,'2003-07-22','t'),(2,'2001-12-22','q'),(6,'0000-00-00','b'),(5,'2006-09-02','w'),(2,'0000-00-00','m'),(4,'0000-00-00','x'),(9,'2001-02-28',''),(6,'0000-00-00','w'),(5,'2007-05-19','x'),(0,'2005-02-15','e'),(0,'2000-10-19','e'),(8,'2005-07-07','p'),(0,'2008-10-18','x');
CREATE TABLE `BBB` (
  `int_key` int(11) NOT NULL,
  `date_nokey` date NOT NULL,
  `varchar_nokey` varchar(1) NOT NULL,
  KEY `int_key` (`int_key`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO `BBB` VALUES (3,'0000-00-00','');

 
SELECT  MAX( X . `int_key` ) , X . `int_key` AS field1 
FROM CC X STRAIGHT_JOIN BBB Y     
ORDER BY field1  , field1  , X . `date_nokey`  , X . `varchar_nokey`  , field1 DESC , field1 ASC ;

# Diff:

# --- /var/folders/Pt/PtJd7NDTGgyOk3+iDGXrQk+++TI/-Tmp-///randgen33574-1248195709-server0.dump	2009-07-21 13:01:49.000000000 -0400
# +++ /var/folders/Pt/PtJd7NDTGgyOk3+iDGXrQk+++TI/-Tmp-///randgen33574-1248195709-server1.dump	2009-07-21 13:01:49.000000000 -0400
# @@ -1 +1,20 @@
# +9	0
# +9	0
# +9	0
# +9	2
# +9	2
# +9	2
# +9	2
# +9	3
# +9	4
# +9	5
# +9	5
# +9	6
# +9	6
# +9	6
#  9	8
# +9	8
# +9	8
# +9	8
# +9	9
# +9	9

DROP TABLE CC, BBB;

# End of test case for query 1

Suggested fix:
Ensure aggregate functions work as documented.
[5 Aug 2009 14:24] Patrick Crews
After discussion with PeterG, 5.0 and 5.1 are producing the correct behavior - one should only expect a single row in this case.

From this, azalea's behavior is incorrect and needs to be fixed.

Also wrote:
Bug#46567	Aggregate function documentation needs clarification for non-GROUP BY behavior
to better clarify correct behavior.
[9 Sep 2009 19:48] Guilhem Bichot
GROUP BY => Optimizer.
[16 Sep 2009 14:39] Timour Katchaounov
Let's compare the query plans when we use STRAIGHT_JOIN (that forces the order
'X, BBB') and JOIN (that lets the optimizer figure a better order 'BBB, X'):

mysql> explain SELECT  MAX( X .`int_key`  )  , X .`int_key` field1   FROM CC X  STRAIGHT_JOIN BBB   ORDER  BY field1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: X
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 20
        Extra: Using temporary
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: BBB
         type: index
possible_keys: NULL
          key: int_key
      key_len: 4
          ref: NULL
         rows: 1
        Extra: Using index; Using join buffer

mysql> explain SELECT  MAX( X .`int_key`  )  , X .`int_key` field1   FROM CC X JOIN BBB   ORDER  BY field1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: BBB
         type: system
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1
        Extra:
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: X
         type: index
possible_keys: NULL
          key: int_key
      key_len: 4
          ref: NULL
         rows: 20
        Extra: Using index

As we can see, the problem exists with just the STRAIGHT_JOIN query, which
forces suboptimal join order (in this case cartesian product). Therefore
the problem *might* be somehow related to changes in the Block-nested loop
join algorithm. AFAIR the implementation of BKA has done some changes to
that code. This bug may or may not be related to BKA.

Another guess is that it is possible that the MAX field is not computed over
the correct buffer(s). Notice that in the problem query we use a temp table,
thus there may be some problem with the field over which we compute the MAX
function.
[17 Sep 2009 9:34] Jørgen Løland
This bug was introduced by the fix for Bug#42955 "Wrong results returned by join queries with group by/order by when BKA is used"
[23 Sep 2009 13:47] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/84384

2835 Jorgen Loland	2009-09-23
      Bug#46328 - Use of aggregate function without GROUP BY clause returns 
      many rows (vs. one )
      
      If a join buffer is used to join a table the ordering by an index for the
      first non-constant table cannot be employed unless ORDER BY/GROUP BY clauses
      are optimized away or there is implicit grouping (aggregate functions but no 
      GROUP BY clause). 
      
      The JOIN::get_sort_by_join_tab function is employed to choose whether to sort 
      result with filesort or to use index scan. It checked if GROUP BY was optimized 
      away, but not if the query contained implicit grouping. This caused the optimizer
      to choose wrong execution plan.
     @ mysql-test/r/join_cache.result
        Added test case for BUG#46328
     @ mysql-test/t/join_cache.test
        Added test case for BUG#46328
     @ sql/sql_select.h
        The JOIN::get_sort_by_join_tab function now checks if the query contains implicit grouping
[23 Sep 2009 14:22] Timour Katchaounov
A comment on the patch:
One needs to be very careful when introducing one-line predicate functions.
If we do that without care, our namespace will soon be polluted with a
gazillion of such functions with similar meaning.

In this case, you add a function that is not used anywhere else, although
it seems like a concept that should be needed. If no one needed such a
function in the last 5 years, it is unlikely it will be ever needed.

Please either find other uses for this function, or remove it.
[24 Sep 2009 7:42] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/84451

2838 Jorgen Loland	2009-09-24
      Bug#46328 - Use of aggregate function without GROUP BY clause 
      returns many rows (vs. one )
            
      If a join buffer is used to join a table the ordering by an index 
      for the first non-constant table cannot be employed unless 
      ORDER BY/GROUP BY clauses are optimized away or there is implicit 
      grouping (aggregate functions but no GROUP BY clause). 
     @ mysql-test/r/join_cache.result
        Added test case for BUG#46328
     @ mysql-test/t/join_cache.test
        Added test case for BUG#46328
     @ sql/sql_select.h
        The JOIN::get_sort_by_join_tab function now checks if the query contains implicit grouping
[25 Sep 2009 12:18] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/84645

2840 Jorgen Loland	2009-09-25
      Bug#46328 - Use of aggregate function without GROUP BY clause 
      returns many rows (vs. one )
                  
      If a join buffer is used to join a table the ordering by an index 
      for the first non-constant table cannot be employed unless 
      ORDER BY/GROUP BY clauses are optimized away or there is implicit 
      grouping (aggregate functions but no GROUP BY clause). 
     @ mysql-test/r/join_cache.result
        Added test case for BUG#46328
     @ mysql-test/t/join_cache.test
        Added test case for BUG#46328
     @ sql/sql_select.h
        The JOIN::get_sort_by_join_tab function now checks if the query contains implicit grouping
[25 Sep 2009 12:22] Jørgen Løland
Pushed to mysql-6.0-codebase-bugfixing
[30 Sep 2009 8:17] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20090929093622-1mooerbh12e97zux) (version source revid:alik@sun.com-20090927203924-087s36mrs0uxepwb) (merge vers: 6.0.14-alpha) (pib:11)
[9 Oct 2009 8:47] Bugs System
Pushed into 6.0.14-alpha (revid:alik@ibmvm-20091009083208-0o0f0i9w1sq3c1kn) (version source revid:jon.hauglid@sun.com-20090929073341-4ev88lvoxsil8t41) (merge vers: 6.0.14-alpha) (pib:12)
[12 Oct 2009 15:33] Paul DuBois
Noted in 6.0.14 changelog.

Use of an aggregate function without a GROUP BY clause could return
several rows rather than a single row. 

Setting report to NDI pending push into 5.5.x.
[26 Nov 2009 13:39] Paul DuBois
Problem is in 6.0.x only. Closing.
[6 May 2010 14:12] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/107670

3137 oystein.grovlen@sun.com	2010-05-06
      Bug#46328 - Use of aggregate function without GROUP BY clause 
      returns many rows (vs. one )
      
      (Backporting of revid:jorgen.loland@sun.com-20090925121901-kic7bs68mvcfbpzn)
                    
      If a join buffer is used to join a table the ordering by an index 
      for the first non-constant table cannot be employed unless 
      ORDER BY/GROUP BY clauses are optimized away or there is implicit 
      grouping (aggregate functions but no GROUP BY clause). 
     @ mysql-test/r/join_cache.result
        Added test case for BUG#46328
     @ mysql-test/t/join_cache.test
        Added test case for BUG#46328
     @ sql/sql_select.h
        The JOIN::get_sort_by_join_tab function now checks if the query contains implicit grouping
[16 Aug 2010 6:32] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100816062819-bluwgdq8q4xysmlg) (version source revid:alik@sun.com-20100816062612-enatdwnv809iw3s9) (pib:20)
[13 Nov 2010 16:22] Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (merge vers: 5.6.99-m4) (pib:21)