Bug #48920 COUNT DISTINCT returns 1 for NULL values when in a subquery in the select list
Submitted: 19 Nov 2009 19:43 Modified: 7 Mar 2010 1:32
Reporter: Patrick Crews Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:6.0 OS:Any
Assigned to: Jørgen Løland CPU Architecture:Any
Tags: regression, SELECT subquery, subquery

[19 Nov 2009 19:43] Patrick Crews
Description:
6.0 is differing from 5.0 and 5.1 on certain queries that contain a SELECT clause subquery + a correlated subquery.

5.0 and 5.1 are in agreement results-wise, while 6.0 differs substantially.  This is occurring for a number of queries that follow this pattern.

At present, I am not entirely certain which version is correct (will try to run a 3-way comparison with other DBMS's when time permits and will update the bug).

However:
SELECT (  
SELECT  COUNT(  DISTINCT SUBQUERY4_t1 .`int_key`  )  
FROM CC SUBQUERY4_t1  JOIN CC SUBQUERY4_t2  ON SUBQUERY4_t2 .`varchar_nokey`  = SUBQUERY4_t1 .`varchar_key`  
WHERE SUBQUERY4_t2 .`varchar_nokey`  = table1 .`varchar_nokey`  )  
FROM (  
SELECT * 
FROM D  ) table1   ;

Produces the diff seen below in the test case.

How to repeat:
MTR test case - set up for running against 6.0, please comment / uncomment SET statements as needed for other versions of the server:

#/* Server0: MySQL 6.0.14-alpha-gcov-debug-log */
#/* Server1: MySQL 5.1.41-gcov-debug-log */

#/* The value of optimizer_switch is distinct between the two servers: */
SET SESSION optimizer_switch = 'firstmatch=off,index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,loosescan=off,materialization=off,semijoin=off' ;
#/* Server 1 : SET SESSION optimizer_switch = 'index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on' */;

#/* The value of optimizer_use_mrr is distinct between the two servers: */
SET SESSION optimizer_use_mrr = 'force' ;
# Server 1 : SET SESSION optimizer_use_mrr = '' ;

#/* The value of engine_condition_pushdown is common between the two servers: */
/*!50400 SET SESSION engine_condition_pushdown = ON */;

#/* The value of join_cache_level is distinct between the two servers: */
SET SESSION join_cache_level = 1; 
#/* Server 1 : SET SESSION join_cache_level = '' */;

#/* The value of debug is distinct between the two servers: */
SET SESSION debug = 'd,optimizer_no_icp' ;
#/* Server 1 : SET SESSION debug = '' */;

#/* Begin test case for query 0 */

--disable_warnings
DROP TABLE /*! IF EXISTS */ CC;
DROP TABLE /*! IF EXISTS */ D;
--enable_warnings

CREATE TABLE `CC` (
  `int_key` int(11) DEFAULT NULL,
  `varchar_key` varchar(1) DEFAULT NULL,
  `varchar_nokey` varchar(1) DEFAULT NULL,
  KEY `int_key` (`int_key`),
  KEY `varchar_key` (`varchar_key`,`int_key`)
) ENGINE=MEMORY DEFAULT CHARSET=latin1;
INSERT INTO `CC` VALUES (8,'v','v');
INSERT INTO `CC` VALUES (9,'r','r');
INSERT INTO `CC` VALUES (9,'a','a');
INSERT INTO `CC` VALUES (186,'m','m');
INSERT INTO `CC` VALUES (NULL,'y','y');
INSERT INTO `CC` VALUES (2,'j','j');
INSERT INTO `CC` VALUES (3,'d','d');
INSERT INTO `CC` VALUES (0,'z','z');
INSERT INTO `CC` VALUES (133,'e','e');
INSERT INTO `CC` VALUES (1,'h','h');
INSERT INTO `CC` VALUES (8,'b','b');
INSERT INTO `CC` VALUES (5,'s','s');
INSERT INTO `CC` VALUES (5,'e','e');
INSERT INTO `CC` VALUES (8,'j','j');
INSERT INTO `CC` VALUES (6,'e','e');
INSERT INTO `CC` VALUES (51,'f','f');
INSERT INTO `CC` VALUES (4,'v','v');
INSERT INTO `CC` VALUES (7,'x','x');
INSERT INTO `CC` VALUES (6,'m','m');
INSERT INTO `CC` VALUES (4,'c','c');
CREATE TABLE `D` (
  `int_key` int(11) DEFAULT NULL,
  `varchar_key` varchar(1) DEFAULT NULL,
  `varchar_nokey` varchar(1) DEFAULT NULL,
  KEY `int_key` (`int_key`),
  KEY `varchar_key` (`varchar_key`,`int_key`)
) ENGINE=MEMORY DEFAULT CHARSET=latin1;
INSERT INTO `D` VALUES (NULL,'r','r');
INSERT INTO `D` VALUES (0,'c','c');
INSERT INTO `D` VALUES (0,'o','o');
INSERT INTO `D` VALUES (7,'c','c');
INSERT INTO `D` VALUES (8,'d','d');
INSERT INTO `D` VALUES (4,'v','v');
INSERT INTO `D` VALUES (6,'m','m');
INSERT INTO `D` VALUES (5,'j','j');
INSERT INTO `D` VALUES (NULL,'f','f');
INSERT INTO `D` VALUES (NULL,'n','n');
INSERT INTO `D` VALUES (8,'z','z');
INSERT INTO `D` VALUES (8,'h','h');
INSERT INTO `D` VALUES (8,'q','q');
INSERT INTO `D` VALUES (1,'w','w');
INSERT INTO `D` VALUES (1,'z','z');
INSERT INTO `D` VALUES (5,'j','j');
INSERT INTO `D` VALUES (2,'a','a');
INSERT INTO `D` VALUES (7,'m','m');
INSERT INTO `D` VALUES (6,'n','n');
INSERT INTO `D` VALUES (4,'e','e');
INSERT INTO `D` VALUES (7,'u','u');
INSERT INTO `D` VALUES (0,'s','s');
INSERT INTO `D` VALUES (9,'u','u');
INSERT INTO `D` VALUES (3,'r','r');
INSERT INTO `D` VALUES (5,'g','g');
INSERT INTO `D` VALUES (1,'o','o');
INSERT INTO `D` VALUES (1,'w','w');
INSERT INTO `D` VALUES (5,'b','b');
INSERT INTO `D` VALUES (9,NULL,NULL);
INSERT INTO `D` VALUES (2,'y','y');
INSERT INTO `D` VALUES (5,'y','y');
INSERT INTO `D` VALUES (248,'u','u');
INSERT INTO `D` VALUES (0,'p','p');
INSERT INTO `D` VALUES (8,'s','s');
INSERT INTO `D` VALUES (1,'e','e');
INSERT INTO `D` VALUES (255,'d','d');
INSERT INTO `D` VALUES (9,'d','d');
INSERT INTO `D` VALUES (9,'c','c');
INSERT INTO `D` VALUES (3,'b','b');
INSERT INTO `D` VALUES (9,'t','t');
INSERT INTO `D` VALUES (6,NULL,NULL);
INSERT INTO `D` VALUES (4,'y','y');
INSERT INTO `D` VALUES (60,'c','c');
INSERT INTO `D` VALUES (7,'d','d');
INSERT INTO `D` VALUES (1,'x','x');
INSERT INTO `D` VALUES (6,'p','p');
INSERT INTO `D` VALUES (4,'e','e');
INSERT INTO `D` VALUES (NULL,'g','g');
INSERT INTO `D` VALUES (8,'x','x');
INSERT INTO `D` VALUES (0,'s','s');
INSERT INTO `D` VALUES (8,'e','e');
INSERT INTO `D` VALUES (151,'l','l');
INSERT INTO `D` VALUES (7,'p','p');
INSERT INTO `D` VALUES (6,'h','h');
INSERT INTO `D` VALUES (NULL,'m','m');
INSERT INTO `D` VALUES (23,'n','n');
INSERT INTO `D` VALUES (2,'v','v');
INSERT INTO `D` VALUES (4,'b','b');
INSERT INTO `D` VALUES (NULL,'x','x');
INSERT INTO `D` VALUES (NULL,'r','r');
INSERT INTO `D` VALUES (77,'t','t');
INSERT INTO `D` VALUES (NULL,'w','w');
INSERT INTO `D` VALUES (NULL,'w','w');
INSERT INTO `D` VALUES (7,'k','k');
INSERT INTO `D` VALUES (1,'a','a');
INSERT INTO `D` VALUES (9,'t','t');
INSERT INTO `D` VALUES (6,'z','z');
INSERT INTO `D` VALUES (2,'e','e');
INSERT INTO `D` VALUES (3,'q','q');
INSERT INTO `D` VALUES (0,'e','e');
INSERT INTO `D` VALUES (NULL,'v','v');
INSERT INTO `D` VALUES (6,'d','d');
INSERT INTO `D` VALUES (3,'u','u');
INSERT INTO `D` VALUES (195,'o','o');
INSERT INTO `D` VALUES (5,'b','b');
INSERT INTO `D` VALUES (2,'c','c');
INSERT INTO `D` VALUES (7,'q','q');
INSERT INTO `D` VALUES (25,NULL,NULL);
INSERT INTO `D` VALUES (NULL,'h','h');
INSERT INTO `D` VALUES (0,'d','d');
INSERT INTO `D` VALUES (98,'w','w');
INSERT INTO `D` VALUES (6,'m','m');
INSERT INTO `D` VALUES (5,'i','i');
INSERT INTO `D` VALUES (0,'w','w');
INSERT INTO `D` VALUES (3,'f','f');
INSERT INTO `D` VALUES (1,'k','k');
INSERT INTO `D` VALUES (1,'v','v');
INSERT INTO `D` VALUES (147,'c','c');
INSERT INTO `D` VALUES (3,'y','y');
INSERT INTO `D` VALUES (3,'h','h');
INSERT INTO `D` VALUES (NULL,NULL,NULL);
INSERT INTO `D` VALUES (2,'t','t');
INSERT INTO `D` VALUES (1,'l','l');
INSERT INTO `D` VALUES (8,'a','a');
INSERT INTO `D` VALUES (8,'r','r');
INSERT INTO `D` VALUES (8,'s','s');
INSERT INTO `D` VALUES (0,'z','z');
INSERT INTO `D` VALUES (1,'j','j');
INSERT INTO `D` VALUES (8,'c','c');
INSERT INTO `D` VALUES (5,'f','f');

 
SELECT (  
SELECT  COUNT(  DISTINCT SUBQUERY4_t1 .`int_key`  )  
FROM CC SUBQUERY4_t1  JOIN CC SUBQUERY4_t2  ON SUBQUERY4_t2 .`varchar_nokey`  = SUBQUERY4_t1 .`varchar_key`  
WHERE SUBQUERY4_t2 .`varchar_nokey`  = table1 .`varchar_nokey`  )  
FROM (  
SELECT * 
FROM D  ) table1   ;

#/* Diff: */

#/* --- /tmp//randgen16286-1258569305-server0.dump	2009-11-18 13:35:05.000000000 -0500
# +++ /tmp//randgen16286-1258569305-server1.dump	2009-11-18 13:35:05.000000000 -0500
# @@ -4,6 +4,41 @@
#  0
#  0
#  0
# +0
# +0
# +0
# +0
# +0
# +0
# +0
# +0
# +0
# +0
# +0
# +0
# +0
# +0
# +0
# +0
# +0
# +0
# +0
# +0
# +0
# +0
# +0
# +0
# +0
# +0
# +0
# +0
# +0
# +0
# +0
# +0
# +0
# +0
# +0
#  1
#  1
#  1
# @@ -46,33 +81,6 @@
#  1
#  1
#  1
# -1
# -1
# -1
# -1
# -1
# -1
# -1
# -1
# -1
# -1
# -1
# -1
# -1
# -1
# -1
# -1
# -1
# -1
# -1
# -1
# -1
# -1
# -1
# -1
# -1
# -1
# -2
#  2
#  2
#  2
# @@ -84,14 +92,6 @@
#  2
#  2
#  2
# -2
# -2
# -2
# -3
# -3
# -3
# -3
# -3
#  3
#  3
#  3 */

DROP TABLE CC;
DROP TABLE D;
#/* End of test case for query 0 */
[20 Nov 2009 16:10] Patrick Crews
UPDATE:  MySQL 6.0 (based on the codebase-bugfixing tree) is not in agreement with Postgres or JavaDB on this query, thus indicating that 6.0 is incorrect (it is off from 5.0, 5.1, javadb, and postgres)
[8 Dec 2009 14:43] Øystein Grøvlen
6.0 result does not depend on optimizer_switch settings.
Query plan is same for both 5.1 and 6.0:

id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	PRIMARY	<derived3>	ALL	NULL	NULL	NULL	NULL	100	100.00	
3	DERIVED	D	ALL	NULL	NULL	NULL	NULL	100	100.00	
2	DEPENDENT SUBQUERY	SUBQUERY4_t1	ALL	varchar_key	NULL	NULL	NULL	20	75.00	Using where
2	DEPENDENT SUBQUERY	SUBQUERY4_t2	ALL	NULL	NULL	NULL	NULL	20	100.00	Using where; Using join buffer
Warnings:
Note	1276	Field or reference 'table1.varchar_nokey' of SELECT #2 was resolved in SELECT #1
Note	1003	select (select count(distinct `test`.`SUBQUERY4_t1`.`int_key`) AS `COUNT(  DISTINCT SUBQUERY4_t1 .``int_key``  )` from `test`.`CC` `SUBQUERY4_t1` join `test`.`CC` `SUBQUERY4_t2` where ((`test`.`SUBQUERY4_t2`.`varchar_nokey` = `test`.`SUBQUERY4_t1`.`varchar_key`) and (`test`.`SUBQUERY4_t1`.`varchar_key` = `table1`.`varchar_nokey`))) AS `(  
SELECT  COUNT(  DISTINCT SUBQUERY4_t1 .``int_key``  )  
FROM CC SUBQUERY4_t1  JOIN CC SUBQUERY4_t2  ON SUBQUERY4_t2 .``varchar_nokey``  =
SUBQUERY4_t1 .``varchar_key``  
WHERE SUBQUERY4_t2 .``varchar_nokey``  = table1 .``varchar_nokey``  )` from (select `test`.`D`.`int_key` AS `int_key`,`test`.`D`.`varchar_key` AS `varchar_key`,`test`.`D`.`varchar_nokey` AS `varchar_nokey` from `test`.`D`) `table1`
[21 Dec 2009 10:13] Jørgen Løland
Reduced test case:
------------------
SET SESSION debug = 'd,optimizer_no_icp' ;
CREATE TABLE t1 (
i int(11) DEFAULT NULL,
v varchar(1) DEFAULT NULL,
KEY i_key (i)
);
INSERT INTO t1 VALUES (8,'v');
INSERT INTO t1 VALUES (9,'r');
INSERT INTO t1 VALUES (NULL,'y');
CREATE TABLE t2 (
i int(11) DEFAULT NULL,
v varchar(1) DEFAULT NULL,
KEY i_key (i)
);
INSERT INTO t2 VALUES (NULL,'r');
INSERT INTO t2 VALUES (0,'c');
INSERT INTO t2 VALUES (0,'o');
INSERT INTO t2 VALUES (7,'c');

--echo Failing query without count:
SELECT i, v, (SELECT DISTINCT i FROM t1 WHERE v  = t2.v) as subsel FROM t2;

--echo Failing query:
SELECT i, v, (SELECT COUNT(DISTINCT i) FROM t1 WHERE v  = t2.v) as subsel FROM t2;

Output:
-------
Failing query without count:
SELECT i, v, (SELECT DISTINCT i FROM t1 WHERE v  = t2.v) as subsel FROM t2;
i	v	subsel
NULL	r	9
0	c	NULL
0	o	NULL
7	c	NULL

Failing query:
SELECT i, v, (SELECT COUNT(DISTINCT i) FROM t1 WHERE v  = t2.v) as subsel FROM t2;
i	v	subsel
NULL	r	1
0	c	1
0	o	1
7	c	1

The problem:
------------
The count() function returns 1 for count(NULL), but correct output should be 0 like in 5.1: 

SELECT i, v, (SELECT COUNT(DISTINCT i) FROM t1 WHERE v  = t2.v) as subsel FROM t2;
i	v	subsel
0	c	0
0	o	0
7	c	0
 
This problem does not appear for other count(NULL)s:

SELECT COUNT(NULL);
COUNT(NULL)
0
[21 Dec 2009 10:15] Jørgen Løland
Oops... cut'n paste error. Output from 5.1 also contains the "NULL record":

SELECT i, v, (SELECT COUNT(DISTINCT i) FROM t1 WHERE v  = t2.v) as subsel FROM t2;
i	v	subsel
NULL	r	1
0	c	0
0	o	0
7	c	0
[22 Dec 2009 14:01] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/95377

3792 Jorgen Loland	2009-12-22
      Bug#48920: COUNT DISTINCT returns 1 for NULL values when in a 
                 subquery in the select list
      
      When a dependent subquery with count(distinct <col>) was 
      evaluated multiple times, the Distinct_Aggregator was reused. 
      However, the Aggregator was not reset, so when the subquery was
      evaluated for the next record in the outer select, old dependent
      info was used.
      
      The fix is to clear() the existing aggregator in 
      Item_sum::set_aggregator(). This ensures that the aggregator is
      reevaluated with the new dependent information.
      if the agg
     @ mysql-test/r/subselect3.result
        Added test case for BUG#48920
     @ mysql-test/r/subselect3_jcl6.result
        Added test case for BUG#48920
     @ mysql-test/t/subselect3.test
        Added test case for BUG#48920
     @ sql/item_sum.cc
        If an aggregator exists when Item_sum::set_aggregator() is
        called (i.e., set_aggregator is called in a dependent
        subquery), the aggregator is reset so that the aggregator is
        reevaluated with the dependent information from the outer
        record being evaluated.
[23 Dec 2009 8:36] Tor Didriksen
You should commit it to next-mr-bugfixing.
[4 Jan 2010 9:40] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/95868

2955 Jorgen Loland	2010-01-04
      Bug#48920: COUNT DISTINCT returns 1 for NULL values when in a 
                 subquery in the select list
            
      When a dependent subquery with count(distinct <col>) was 
      evaluated multiple times, the Distinct_Aggregator was reused. 
      However, the Aggregator was not reset, so when the subquery was
      evaluated for the next record in the outer select, old dependent
      info was used.
            
      The fix is to clear() the existing aggregator in 
      Item_sum::set_aggregator(). This ensures that the aggregator is
      reevaluated with the new dependent information.
     @ mysql-test/r/subselect3.result
        Added test case for BUG#48920
     @ mysql-test/t/subselect3.test
        Added test case for BUG#48920
     @ sql/item_sum.cc
        If an aggregator exists when Item_sum::set_aggregator() is
        called (i.e., set_aggregator is called in a dependent
        subquery), the aggregator is reset so that the aggregator is
        reevaluated with the dependent information from the outer
        record being evaluated.
[4 Jan 2010 9:49] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/95871

3805 Jorgen Loland	2010-01-04 [merge]
      Merge BUG#48920 from next-mr-bugfixing to 6.0-codebase-bugfixing
[4 Jan 2010 9:51] Jørgen Løland
Pushed to mysql-next-mr-bugfixing and merged to mysql-6.0-codebase-bugfixing
[12 Jan 2010 16:25] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100112162328-2sblcul1kl08bbib) (version source revid:guilhem@mysql.com-20100108092756-k0zzf4kvx9b7bh38) (merge vers: 6.0.14-alpha) (pib:15)
[12 Jan 2010 16:25] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100112162142-nrjoxn6h2nrnh2o0) (version source revid:guilhem@mysql.com-20100108092756-k0zzf4kvx9b7bh38) (pib:15)
[23 Jan 2010 0:28] Paul DuBois
Noted in 6.0.14 changelog.

A dependent subquery containing COUNT(DISTINCT col_name)) could be
evaluated incorrectly. 

Setting report to NDI pending push to Celosia.
[6 Mar 2010 10:57] Bugs System
Pushed into 5.5.3-m3 (revid:alik@sun.com-20100306103849-hha31z2enhh7jwt3) (version source revid:guilhem@mysql.com-20100108092756-k0zzf4kvx9b7bh38) (merge vers: 6.0.14-alpha) (pib:16)
[7 Mar 2010 1:32] Paul DuBois
Noted in 5.5.3 changelog.