Bug #69841 SELECT COUNT(DISTINCT a,b) incorrectly counts rows containing NULL
Submitted: 25 Jul 2013 23:41 Modified: 1 Oct 2013 18:37
Reporter: Arthur O'Dwyer Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.5/5.6 OS:Any
Assigned to: CPU Architecture:Any
Tags: regression
Triage: Needs Triage: D2 (Serious)

[25 Jul 2013 23:41] Arthur O'Dwyer
Description:
Contrary to MySQL's documentation,
SELECT COUNT(DISTINCT x,y,...) incorrectly counts rows where y,... are NULL --- even though it is documented not to count "rows containing NULL" --- if y,... are part of a unique key. Adding and removing this key causes the output of COUNT DISTINCT to change.

(Documentation reference -- http://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_count-distinct says: "In MySQL, you can obtain the number of distinct expression combinations that do not contain NULL by giving a list of expressions.")

How to repeat:
CREATE TABLE t (a INT, b INT);
INSERT INTO t VALUES (1,2), (1,NULL), (NULL,2), (NULL,NULL);
SELECT DISTINCT a,b FROM t;
SELECT COUNT(*) FROM (SELECT DISTINCT a,b FROM t) AS tt;
SELECT COUNT(DISTINCT a,b) FROM t;

-- This query correctly produces
+---------------------+
| COUNT(DISTINCT a,b) |
+---------------------+
|                   1 |
+---------------------+

-- Now add a unique key and try again...

ALTER TABLE t ADD KEY k(a,b);
SELECT DISTINCT a,b FROM t;
SELECT COUNT(*) FROM (SELECT DISTINCT a,b FROM t) AS tt;
SELECT COUNT(DISTINCT a,b) FROM t;

-- This last query incorrectly produces
+---------------------+
| COUNT(DISTINCT a,b) |
+---------------------+
|                   2 |
+---------------------+

The row being counted incorrectly is (1,NULL).
[26 Jul 2013 1:09] Miguel Solorzano
Thank you for the bug report.

mysql 5.0 > ALTER TABLE t ADD KEY k(a,b);
Query OK, 4 rows affected (0.10 sec)
Records: 4  Duplicates: 0  Warnings: 0

<cut>

mysql 5.0 > SELECT COUNT(DISTINCT a,b) FROM t;
+---------------------+
| COUNT(DISTINCT a,b) |
+---------------------+
|                   1 |
+---------------------+
1 row in set (0.00 sec)

mysql 5.0 > SHOW VARIABLES LIKE "%version%";
+-------------------------+---------------------+
| Variable_name           | Value               |
+-------------------------+---------------------+
| protocol_version        | 10                  |
| version                 | 5.0.97-Win X64      |
| version_comment         | Source distribution |
| version_compile_machine | unknown             |
| version_compile_os      | Win64               |
+-------------------------+---------------------+
5 rows in set (0.00 sec)

------------------------------------------------------

mysql 5.1 > ALTER TABLE t ADD KEY k(a,b);
Query OK, 4 rows affected (0.11 sec)
Records: 4  Duplicates: 0  Warnings: 0

<cut>

mysql 5.1 > SELECT COUNT(DISTINCT a,b) FROM t;
+---------------------+
| COUNT(DISTINCT a,b) |
+---------------------+
|                   1 |
+---------------------+
1 row in set (0.00 sec)

mysql 5.1 > SHOW VARIABLES LIKE "%version%";
+-------------------------+---------------------+
| Variable_name           | Value               |
+-------------------------+---------------------+
| protocol_version        | 10                  |
| version                 | 5.1.72-Win X64      |
| version_comment         | Source distribution |
| version_compile_machine | unknown             |
| version_compile_os      | Win64               |
+-------------------------+---------------------+
5 rows in set (0.00 sec)

mysql 5.1 >
------------------------------------------------------

mysql 5.5 > ALTER TABLE t ADD KEY k(a,b);
Query OK, 0 rows affected (0.17 sec)
Records: 0  Duplicates: 0  Warnings: 0

<CUT>

mysql 5.5 > SELECT COUNT(DISTINCT a,b) FROM t;
+---------------------+
| COUNT(DISTINCT a,b) |
+---------------------+
|                   2 |
+---------------------+
1 row in set (0.00 sec)

mysql 5.5 > SHOW VARIABLES LIKE "%version%";
+-------------------------+---------------------+
| Variable_name           | Value               |
+-------------------------+---------------------+
| innodb_version          | 5.5.34              |
| protocol_version        | 10                  |
| slave_type_conversions  |                     |
| version                 | 5.5.34              |
| version_comment         | Source distribution |
| version_compile_machine | AMD64               |
| version_compile_os      | Win64               |
+-------------------------+---------------------+
7 rows in set (0.03 sec)

mysql 5.5 >

------------------------------------------------------

mysql 5.6 > ALTER TABLE t ADD KEY k(a,b);
Query OK, 0 rows affected (0.50 sec)
Records: 0  Duplicates: 0  Warnings: 0

<cut>

mysql 5.6 > SELECT COUNT(DISTINCT a,b) FROM t;
+---------------------+
| COUNT(DISTINCT a,b) |
+---------------------+
|                   2 |
+---------------------+
1 row in set (0.00 sec)

mysql 5.6 > SHOW VARIABLES LIKE "%version%";
+-------------------------+---------------------+
| Variable_name           | Value               |
+-------------------------+---------------------+
| innodb_version          | 5.6.14              |
| protocol_version        | 10                  |
| slave_type_conversions  |                     |
| version                 | 5.6.14              |
| version_comment         | Source distribution |
| version_compile_machine | x86_64              |
| version_compile_os      | Win64               |
+-------------------------+---------------------+
7 rows in set (0.00 sec)

mysql 5.6 >
[1 Oct 2013 18:37] Paul Dubois
Noted in 5.5.35, 5.6.15, 5.7.3 changelogs.

COUNT(DISTINCT) should not count NULL values, but they were counted
when the optimizer used Loose Index Scan.
[4 Dec 2013 9:05] Laurynas Biveinis
5.5$ bzr log -r 4465
------------------------------------------------------------
revno: 4465
committer: Neeraj Bisht <neeraj.x.bisht@oracle.com>
branch nick: 5.5
timestamp: Wed 2013-09-04 10:45:55 +0530
message:
  Bug#17222452 - SELECT COUNT(DISTINCT A,B) INCORRECTLY COUNTS ROWS 
  	       CONTAINING NULL
  
  Problem:-
  In MySQL, We can obtain the number of distinct expression
  combinations that do not contain NULL by giving a list of 
  expressions in COUNT(DISTINCT).
  However rows with NULL values are
  incorrectly included in the count when loose index scan is 
  used.
  
  Analysis:-
  In case of loose index scan, we check whether the field is null or 
  not and increase the count in Item_sum_count::add().
  But there we are checking for the first field in COUNT(DISTINCT), 
  not for every field. This is causing an incorrect result.
  
  Solution:-
  Check all field in Item_sum_count::add(), whether there values 
  are null or not. Then only increment the count.
  ******
  Bug#17222452 - SELECT COUNT(DISTINCT A,B) INCORRECTLY COUNTS ROWS 
  	       CONTAINING NULL
  
  Problem:-
  In MySQL, We can obtain the number of distinct expression
  combinations that do not contain NULL by giving a list of 
  expressions in COUNT(DISTINCT).
  However rows with NULL values are
  incorrectly included in the count when loose index scan is 
  used.
  
  Analysis:-
  In case of loose index scan, we check whether the field is null or 
  not and increase the count in Item_sum_count::add().
  But there we are checking for the first field in COUNT(DISTINCT), 
  not for every field. This is causing an incorrect result.
  
  Solution:-
  Check all field in Item_sum_count::add(), whether there values 
  are null or not. Then only increment the count.