Bug #45267 GROUP BY does not group NULLs on certain JOINs
Submitted: 2 Jun 2009 13:01 Modified: 22 Nov 2010 0:47
Reporter: Philip Stoev Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.4 OS:Any
Assigned to: Evgeny Potemkin CPU Architecture:Any
Tags: regression

[2 Jun 2009 13:01] Philip Stoev
Description:
Executing queries such as:

SELECT table1 .`datetime_nokey` field1  FROM AA table1  RIGHT  JOIN (C table2  JOIN BB  ON table2 .`date_nokey` ) ON table2 .`date_key`  GROUP  BY field1;

returns a result set that contains 22 separate NULL values rather than the single NULL value expected. There are two possible explanations for this:

A) GROUP BY treats all NULLs as distinct and creates one group for each, which is against the SQL standard.

B) The GROUP BY does not get applied at all.

How to repeat:
A test case will be uploaded shortly.
[2 Jun 2009 13:10] Philip Stoev
Test case. Apologies for not simplifying it to the absolute possible minimum.

SET @saved_cs_client     = @@character_set_client;
SET character_set_client = utf8;
CREATE TABLE `AA` (
  `pk` int(11) NOT NULL AUTO_INCREMENT,
  `int_nokey` int(11) NOT NULL,
  `int_key` int(11) NOT NULL,
  `date_key` date NOT NULL,
  `date_nokey` date NOT NULL,
  `time_key` time NOT NULL,
  `time_nokey` time NOT NULL,
  `datetime_key` datetime NOT NULL,
  `datetime_nokey` datetime NOT NULL,
  `varchar_key` varchar(1) NOT NULL,
  `varchar_nokey` varchar(1) NOT NULL,
  PRIMARY KEY (`pk`),
  KEY `int_key` (`int_key`),
  KEY `date_key` (`date_key`),
  KEY `time_key` (`time_key`),
  KEY `datetime_key` (`datetime_key`),
  KEY `varchar_key` (`varchar_key`)
) ENGINE=MyISAM AUTO_INCREMENT=10 DEFAULT CHARSET=latin1;
SET character_set_client = @saved_cs_client;
SET @saved_cs_client     = @@character_set_client;
SET character_set_client = utf8;
CREATE TABLE `C` (
  `pk` int(11) NOT NULL AUTO_INCREMENT,
  `int_nokey` int(11) NOT NULL,
  `int_key` int(11) NOT NULL,
  `date_key` date NOT NULL,
  `date_nokey` date NOT NULL,
  `time_key` time NOT NULL,
  `time_nokey` time NOT NULL,
  `datetime_key` datetime NOT NULL,
  `datetime_nokey` datetime NOT NULL,
  `varchar_key` varchar(1) NOT NULL,
  `varchar_nokey` varchar(1) NOT NULL,
  PRIMARY KEY (`pk`),
  KEY `int_key` (`int_key`),
  KEY `date_key` (`date_key`),
  KEY `time_key` (`time_key`),
  KEY `datetime_key` (`datetime_key`),
  KEY `varchar_key` (`varchar_key`)
) ENGINE=MyISAM AUTO_INCREMENT=21 DEFAULT CHARSET=latin1;
SET character_set_client = @saved_cs_client;
INSERT INTO `C` VALUES (1,0,8,'0000-00-00','0000-00-00','00:16:10','00:16:10','2008-09-03 14:25:40','2008-09-03 14:25:40','h','h'),(2,7,1,'2008-05-01','2008-05-01','00:00:00','00:00:00','2001-01-13 00:00:00','2001-01-13 00:00:00','',''),(3,0,2,'2002-02-04','2002-02-04','00:00:00','00:00:00','0000-00-00 00:00:00','0000-00-00 00:00:00','x','x'),(4,2,2,'0000-00-00','0000-00-00','16:29:24','16:29:24','2000-10-16 01:39:08','2000-10-16 01:39:08','w','w'),(5,1,0,'0000-00-00','0000-00-00','09:23:32','09:23:32','0000-00-00 00:00:00','0000-00-00 00:00:00','p','p'),(6,3,7,'0000-00-00','0000-00-00','00:00:00','00:00:00','2007-12-02 00:00:00','2007-12-02 00:00:00','o','o'),(7,3,0,'0000-00-00','0000-00-00','00:00:00','00:00:00','2008-09-11 00:00:00','2008-09-11 00:00:00','',''),(8,0,6,'0000-00-00','0000-00-00','13:59:04','13:59:04','0000-00-00 00:00:00','0000-00-00 00:00:00','s','s'),(9,7,9,'2005-02-05','2005-02-05','09:01:06','09:01:06','0000-00-00 00:00:00','0000-00-00 00:00:00','d','d'),(10,5,2,'2005-12-12','2005-12-12','00:00:00','00:00:00','0000-00-00 00:00:00','0000-00-00 00:00:00','n','n'),(11,0,1,'2001-02-27','2001-02-27','21:06:46','21:06:46','0000-00-00 00:00:00','0000-00-00 00:00:00','o','o'),(12,2,2,'2001-09-05','2001-09-05','00:00:00','00:00:00','0000-00-00 00:00:00','0000-00-00 00:00:00','',''),(13,6,7,'0000-00-00','0000-00-00','14:45:34','14:45:34','2003-07-28 02:34:08','2003-07-28 02:34:08','w','w'),(14,1,4,'2000-10-11','2000-10-11','15:04:12','15:04:12','0000-00-00 00:00:00','0000-00-00 00:00:00','o','o'),(15,0,8,'2006-04-10','2006-04-10','00:00:00','00:00:00','0000-00-00 00:00:00','0000-00-00 00:00:00','x','x'),(16,0,9,'2005-03-26','2005-03-26','15:55:23','15:55:23','2004-03-17 00:32:27','2004-03-17 00:32:27','p','p'),(17,1,0,'2003-02-03','2003-02-03','16:30:00','16:30:00','2004-12-27 19:20:00','2004-12-27 19:20:00','d','d'),(18,0,0,'0000-00-00','0000-00-00','00:00:00','00:00:00','0000-00-00 00:00:00','0000-00-00 00:00:00','h','h'),(19,0,4,'0000-00-00','0000-00-00','14:13:26','14:13:26','2008-11-09 05:53:48','2008-11-09 05:53:48','o','o'),(20,0,0,'2004-01-20','2004-01-20','00:00:00','00:00:00','2009-10-11 06:58:04','2009-10-11 06:58:04','k','k');
SET @saved_cs_client     = @@character_set_client;
SET character_set_client = utf8;
CREATE TABLE `BB` (
  `pk` int(11) NOT NULL AUTO_INCREMENT,
  `int_nokey` int(11) NOT NULL,
  `int_key` int(11) NOT NULL,
  `date_key` date NOT NULL,
  `date_nokey` date NOT NULL,
  `time_key` time NOT NULL,
  `time_nokey` time NOT NULL,
  `datetime_key` datetime NOT NULL,
  `datetime_nokey` datetime NOT NULL,
  `varchar_key` varchar(1) NOT NULL,
  `varchar_nokey` varchar(1) NOT NULL,
  PRIMARY KEY (`pk`),
  KEY `int_key` (`int_key`),
  KEY `date_key` (`date_key`),
  KEY `time_key` (`time_key`),
  KEY `datetime_key` (`datetime_key`),
  KEY `varchar_key` (`varchar_key`)
) ENGINE=MyISAM AUTO_INCREMENT=12 DEFAULT CHARSET=latin1;
SET character_set_client = @saved_cs_client;
INSERT INTO `BB` VALUES (10,0,8,'2005-03-23','2005-03-23','19:39:13','19:39:13','0000-00-00 00:00:00','0000-00-00 00:00:00','g','g'),(11,8,5,'2000-11-14','2000-11-14','03:43:53','03:43:53','0000-00-00 00:00:00','0000-00-00 00:00:00','b','b');

SELECT table1 .`datetime_nokey` field1  FROM AA table1  RIGHT  JOIN (C table2  JOIN BB  ON table2 .`date_nokey` ) ON table2 .`date_key`  GROUP  BY field1;
[26 Jun 2009 8:53] Philip Stoev
A tigthter test case:

CREATE TABLE `AA` (
  `pk` int(11) NOT NULL AUTO_INCREMENT,
  `int_nokey` int(11) NOT NULL,
  `time_key` time NOT NULL,
  PRIMARY KEY (`pk`),
  KEY `time_key` (`time_key`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

CREATE TABLE `C` (
  `pk` int(11) NOT NULL AUTO_INCREMENT,
  `int_nokey` int(11) NOT NULL,
  `time_key` time NOT NULL,
  PRIMARY KEY (`pk`),
  KEY `time_key` (`time_key`)
) ENGINE=MyISAM AUTO_INCREMENT=21 DEFAULT CHARSET=latin1;

INSERT INTO `C` VALUES (1,5,'20:53:30'),(2,3,'00:00:00'),(3,0,'12:37:08'),(4,3,'                                                                              00:00:00'),(5,1,'15:59:14'),(6,0,'04:43:51'),(7,1,'00:00:00'),(8,7,'09:51:25'),(                                                                              9,1,'00:00:00'),(10,0,'00:00:00'),(11,0,'00:00:00'),(12,8,'18:50:55'),(13,4,'00:                                                                              00:00'),(14,9,'08:22:36'),(15,0,'00:00:00'),(16,2,'09:37:21'),(17,0,'05:15:03'),                                                                              (18,8,'20:11:43'),(19,5,'00:00:00'),(20,1,'00:00:00');

CREATE TABLE `BB` (
  `pk` int(11) NOT NULL AUTO_INCREMENT,
  `int_nokey` int(11) NOT NULL,
  `time_key` time NOT NULL,
  PRIMARY KEY (`pk`),
  KEY `time_key` (`time_key`)
) ENGINE=MyISAM AUTO_INCREMENT=12 DEFAULT CHARSET=latin1;

INSERT INTO `BB` VALUES (10,9,'22:36:46'),(11,0,'08:46:46');

/*

OPTIMIZER SETTINGS:

*/

SET SESSION optimizer_switch = 'firstmatch=on,index_merge=on,index_merge_union=o                                                                              n,index_merge_sort_union=on,index_merge_intersection=on,loosescan=on,materializa                                                                              tion=off,semijoin=off';
SET SESSION optimizer_use_mrr = 'force';
SET SESSION engine_condition_pushdown = '1';
SET SESSION join_cache_level = '1';
SET GLOBAL optimizer_switch = 'firstmatch=on,index_merge=on,index_merge_union=on                                                                              ,index_merge_sort_union=on,index_merge_intersection=on,loosescan=on,materializat                                                                              ion=off,semijoin=off';
SET GLOBAL optimizer_use_mrr = 'force';
SET GLOBAL engine_condition_pushdown = '1';
SET GLOBAL join_cache_level = '1';

/*

ORIGINAL QUERY:

 SELECT table1 . `pk` AS field1 FROM (AA AS table1 RIGHT OUTER JOIN (BB AS table                                                                              2 STRAIGHT_JOIN C AS table3 ON ( table3 . `datetime_nokey` = table2 . `int_nokey                                                                              ` )) ON ( table3 . `time_key` = table2 . `time_key` )) WHERE  table2 . `int_noke                                                                              y` <> 6  GROUP BY field1  ;

ORIGINAL DIFF:

--- /tmp//randgen18812-server0.dump     2009-06-26 11:51:46.000000000 +0300
+++ /tmp//randgen18812-server1.dump     2009-06-26 11:51:46.000000000 +0300
@@ -1 +1,7 @@
 NULL
+NULL
+NULL
+NULL
+NULL
+NULL
+NULL

SIMPLIFIED QUERY:

 SELECT table1 .`pk` field1  FROM AA table1  RIGHT  JOIN BB table2  STRAIGHT_JOI                                                                              N C  ON table2 .`int_nokey`  ON table2 .`time_key`  GROUP  BY field1   ;

SIMPLIFIED DIFF:

--- /tmp//randgen18812-server0.dump     2009-06-26 11:51:46.000000000 +0300
+++ /tmp//randgen18812-server1.dump     2009-06-26 11:51:46.000000000 +0300
@@ -1 +1,20 @@
 NULL
+NULL
+NULL
+NULL
+NULL
+NULL
+NULL
+NULL
+NULL
+NULL
+NULL
+NULL
+NULL
+NULL
+NULL
+NULL
+NULL
+NULL
+NULL
+NULL

*/
[8 Jul 2009 20:04] Philip Stoev
Here is another test case, showing that even after using SELECT DISTINCT , the NULLs are separate rows in the result set.

--disable_warnings
DROP TABLE IF EXISTS AA, C, BB;
--enable_warnings

CREATE TABLE `AA` (
  `pk` int(11) NOT NULL AUTO_INCREMENT,
  `int_nokey` int(11) NOT NULL,
  `time_key` time NOT NULL,
  PRIMARY KEY (`pk`),
  KEY `time_key` (`time_key`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `C` (
  `pk` int(11) NOT NULL AUTO_INCREMENT,
  `int_nokey` int(11) NOT NULL,
  `time_key` time NOT NULL,
  PRIMARY KEY (`pk`),
  KEY `time_key` (`time_key`)
) ENGINE=MyISAM AUTO_INCREMENT=21 DEFAULT CHARSET=latin1;
INSERT INTO `C` VALUES (1,5,'20:53:30'),(2,3,'00:00:00'),(3,0,'12:37:08'),(4,3,'00:00:00'),(5,1,'15:59:14'),(6,0,'04:43:51'),(7,1,'00:00:00'),(8,7,'09:51:25'),(9,1,'00:00:00'),(10,0,'00:00:00'),(11,0,'00:00:00'),(12,8,'18:50:55'),(13,4,'00:00:00'),(14,9,'08:22:36'),(15,0,'00:00:00'),(16,2,'09:37:21'),(17,0,'05:15:03'),(18,8,'20:11:43'),(19,5,'00:00:00'),(20,1,'00:00:00');
CREATE TABLE `BB` (
  `pk` int(11) NOT NULL AUTO_INCREMENT,
  `int_nokey` int(11) NOT NULL,
  `time_key` time NOT NULL,
  PRIMARY KEY (`pk`),
  KEY `time_key` (`time_key`)
) ENGINE=MyISAM AUTO_INCREMENT=12 DEFAULT CHARSET=latin1;
INSERT INTO `BB` VALUES (10,9,'22:36:46'),(11,0,'08:46:46');

SELECT DISTINCT table1 .`pk` field1
FROM AA table1  RIGHT  JOIN BB table2  STRAIGHT_JOIN C  ON table2 .`int_nokey`  ON table2 .`time_key`
GROUP BY field1;
[13 Jul 2009 15:27] Philip Stoev
This regression was caused by the following changeset:

revno: 2727
revision-id: igor@mysql.com-20090310013411-xxe9x3d9kqxuo6ar
parent: sergefp@mysql.com-20090301182607-xglayyk67lwson93
committer: Igor Babaev <igor@mysql.com>
branch nick: mysql-6.0-opt-bug42955
timestamp: Mon 2009-03-09 18:34:11 -0700
message:
  Fixed bug #42955.
  If a join buffer is employed to join a table through BNL or
  BKA algorithm for a query with a ORDER BY / GROUP BY clause
  then the result set has always to be sorted unless the clause
  can be optimized away.

The regression is present even if all possible 5.4 optimizations have been turned off.
[21 Jul 2009 11:20] Evgeny Potemkin
Slightly changed query reports wrong error:

mysql> SELECT DISTINCT table1 .`pk` field1 FROM BB table2  LEFT JOIN AA table1  STRAIGHT_JOIN C  ON table2 .`int_nokey`  ON table2 .`time_key` GROUP BY field1;
ERROR 1054 (42S22): Unknown error 1054
[21 Jul 2009 12:09] Philip Stoev
The error for code 1054 is as follows:

Error: 1054 SQLSTATE: 42S22 (ER_BAD_FIELD_ERROR) 
Message: Unknown column '%s' in '%s'
[23 Jul 2009 12:16] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/79167

2825 Evgeny Potemkin	2009-07-23
      Bug#45267: Incomplete check caused wrong result.
      
      If a join buffer is used to join a table the ordering by an index for the
      first non-constant table cannot be employed unless ORDER BY/GROUP BY clauses
      are optimized away. The JOIN::get_sort_by_join_tab function is employed to
      choose whether to sort result with filesort or to use index scan.
      It wasn't checking whether the GROUP BY clause was optimized away and thus
      returned wrong result in this case. All this caused the optimizer to choose
      wrong execution plan and return wrong query result.
      
      The JOIN::get_sort_by_join_tab function now checks whether the GROUP BY
      clause was optimized away.
     @ mysql-test/r/join_cache.result
        Added a test case for the bug#45267.
     @ mysql-test/t/join_cache.test
        Added a test case for the bug#45267.
     @ sql/sql_select.h
        Bug#45267: Incomplete check caused wrong result.
        The JOIN::get_sort_by_join_tab function now checks whether the GROUP BY
        clause was optimized away.
[23 Jul 2009 12:22] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/79173

2825 Evgeny Potemkin	2009-07-22
      Bug#45267: Incomplete check caused wrong result.
      
      If a join buffer is used to join a table the ordering by an index for the
      first non-constant table cannot be employed unless ORDER BY/GROUP BY clauses
      are optimized away. The JOIN::get_sort_by_join_tab function is employed to
      choose whether to sort result with filesort or to use index scan.
      It wasn't checking whether the GROUP BY clause was optimized away and thus
      returned wrong result in this case. All this caused the optimizer to choose
      wrong execution plan and return wrong query result.
      
      The JOIN::get_sort_by_join_tab function now checks whether the GROUP BY
      clause was optimized away.
     @ mysql-test/r/join_cache.result
        Added a test case for the bug#45267.
     @ mysql-test/t/join_cache.test
        Added a test case for the bug#45267.
     @ sql/sql_select.h
        Bug#45267: Incomplete check caused wrong result.
        The JOIN::get_sort_by_join_tab function now checks whether the GROUP BY
        clause was optimized away.
[4 Sep 2009 7:00] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/82408

2825 Evgeny Potemkin	2009-09-04
      Bug#45267: Incomplete check caused wrong result.
      
      If a join buffer is used to join a table the ordering by an index for the
      first non-constant table cannot be employed unless ORDER BY/GROUP BY clauses
      are optimized away. The JOIN::get_sort_by_join_tab function is employed to
      choose whether to sort result with filesort or to use index scan.
      It wasn't checking whether the GROUP BY clause was optimized away and thus
      returned wrong result in this case. All this caused the optimizer to choose
      wrong execution plan and return wrong query result.
      
      The JOIN::get_sort_by_join_tab function now checks whether the GROUP BY
      clause was optimized away.
     @ mysql-test/r/join_cache.result
        Added a test case for the bug#45267.
     @ mysql-test/t/join_cache.test
        Added a test case for the bug#45267.
     @ sql/sql_select.h
        Bug#45267: Incomplete check caused wrong result.
        The JOIN::get_sort_by_join_tab function now checks whether the GROUP BY
        clause was optimized away.
[8 Sep 2009 8:01] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/82634

2811 Evgeny Potemkin	2009-09-08 [merge]
      Auto-merged fix for te bug#45267.
[15 Sep 2009 13:52] Bugs System
Pushed into 5.4.4-alpha (revid:alik@sun.com-20090915134838-5nj3ycjfsqc2vr2f) (version source revid:epotemkin@mysql.com-20090908080055-t3xlolxjdj4xsn4s) (merge vers: 5.4.4-alpha) (pib:11)
[16 Sep 2009 1:24] Paul DuBois
Noted in 5.4.4 changelog.

NULL values were not grouped properly for some joins containing GROUP
BY.
[12 May 2010 15:36] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/108169

3167 Evgeny Potemkin	2010-05-12
      Bug#45267: Incomplete check caused wrong result.
      
      If a join buffer is used to join a table the ordering by an index for the
      first non-constant table cannot be employed unless ORDER BY/GROUP BY clauses
      are optimized away. The JOIN::get_sort_by_join_tab function is employed to
      choose whether to sort result with filesort or to use index scan.
      It wasn't checking whether the GROUP BY clause was optimized away and thus
      returned wrong result in this case. All this caused the optimizer to choose
      wrong execution plan and return wrong query result.
      
      The JOIN::get_sort_by_join_tab function now checks whether the GROUP BY
      clause was optimized away.
      
      Original revid:epotemkin@mysql.com-20090904065824-s97w8gwsls2sqhcy
      Only the test case is added as the code is already present.
     @ mysql-test/r/join_cache.result
        A test case added for the bug#45267.
     @ mysql-test/t/join_cache.test
        A test case added for the bug#45267.
[16 Aug 2010 6:40] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100816062819-bluwgdq8q4xysmlg) (version source revid:alik@sun.com-20100816062612-enatdwnv809iw3s9) (pib:20)
[13 Nov 2010 16:20] Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (merge vers: 5.6.99-m4) (pib:21)
[22 Nov 2010 0:47] Paul DuBois
Noted in 5.6.1 changelog.