Bug #68749 Basic SELECT count(distinct id) is broken.
Submitted: 22 Mar 2013 18:47 Modified: 4 Dec 2013 22:23
Reporter: Igor Babaev Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:5.5, 5.6 OS:Any
Assigned to: CPU Architecture:Any

[22 Mar 2013 18:47] Igor Babaev
Description:
SELECT COUNT(DISTINCT id) FROM t1 can return an incorrect result in any release of 5.5/5.6 if there is a row in t1 with id=NULL:

mysql> SELECT SQL_NO_CACHE count(*) sm FROM t2;
+---------+
| sm      |
+---------+
| 2097153 |
+---------+
1 row in set (0.00 sec)

mysql> SELECT SQL_NO_CACHE DISTINCT id FROM t2 LIMIT 1;
+------+
| id   |
+------+
|   10 |
+------+
1 row in set (0.01 sec)

mysql> SET @@tmp_table_size=200000;
Query OK, 0 rows affected (0.00 sec)

mysql> SET @@max_heap_table_size=200000;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> SELECT SQL_NO_CACHE count(DISTINCT id) sm FROM t2;
+----+
| sm |
+----+
|  0 |
+----+
1 row in set (3.96 sec)

How to repeat:
The following test case will allow you to reproduce the problem:

CREATE TABLE t1 (id INTEGER) ENGINE=InnoDB;
CREATE TABLE t2 (id INTEGER) ENGINE=InnoDB;

INSERT INTO t1 (id) VALUES (1), (1), (1),(1);
INSERT INTO t1 (id) SELECT id FROM t1;
INSERT INTO t1 (id) SELECT id FROM t1;
INSERT INTO t1 (id) SELECT id FROM t1;
INSERT INTO t1 (id) SELECT id FROM t1;
INSERT INTO t1 (id) SELECT id FROM t1;
INSERT INTO t1 SELECT id+1 FROM t1;
INSERT INTO t1 SELECT id+2 FROM t1;
INSERT INTO t1 SELECT id+4 FROM t1;
INSERT INTO t1 SELECT id+8 FROM t1;
INSERT INTO t1 SELECT id+16 FROM t1;
INSERT INTO t1 SELECT id+32 FROM t1;
INSERT INTO t1 SELECT id+64 FROM t1;
INSERT INTO t1 SELECT id+128 FROM t1;
INSERT INTO t1 SELECT id+256 FROM t1;
INSERT INTO t1 SELECT id+512 FROM t1;
INSERT INTO t1 SELECT id+1024 FROM t1;
INSERT INTO t1 SELECT id+2048 FROM t1;
INSERT INTO t1 SELECT id+4096 FROM t1;
INSERT INTO t1 SELECT id+8192 FROM t1;

INSERT INTO t2 SELECT id FROM t1 ORDER BY id*rand();
INSERT INTO t2 VALUE(NULL);

SET @tmp_table_size_saved = @@tmp_table_size;
SET @max_heap_table_size_saved = @@max_heap_table_size;

--echo # With default tmp_table_size / max_heap_table_size
SELECT SQL_NO_CACHE count(DISTINCT id) sm FROM t2;

SET @@tmp_table_size=1024*256;

--echo # With reduced tmp_table_size
SELECT SQL_NO_CACHE count(DISTINCT id) sm FROM t2;

SET @@tmp_table_size=@tmp_table_size_saved;
SET @@max_heap_table_size=1024*256;

--echo # With reduced max_heap_table_size
SELECT SQL_NO_CACHE count(DISTINCT id) sm FROM t2;

SET @@max_heap_table_size=@max_heap_table_size_saved;

--echo # Back to default tmp_table_size / max_heap_table_size
SELECT SQL_NO_CACHE count(DISTINCT id) sm FROM t2;

DROP TABLE t1,t2;
[23 Mar 2013 19:36] MySQL Verification Team
Hello Igor,

Thank you for the report.
Verified as described.

Thanks,
Umesh
[4 Dec 2013 22:23] Paul DuBois
Noted in 5.5.36, 5.6.16, 5.7.4 changelogs.

COUNT(DISTINCT) produces an incorrect result when it uses a Unique
Tree and its last inserted row has a NULL value.
[5 Dec 2013 15:03] Paul DuBois
Revised changelog entry:

COUNT(DISTINCT) sometimes produced an incorrect result when the last
read row contained a NULL value.
[2 Feb 2014 17:37] Laurynas Biveinis
5.5$ bzr log -r 4557
------------------------------------------------------------
revno: 4557
committer: Guilhem Bichot <guilhem.bichot@oracle.com>
branch nick: 5.5
timestamp: Wed 2013-12-04 12:32:42 +0100
message:
  Bug#16539979 - BASIC SELECT COUNT(DISTINCT ID) IS BROKEN
  Bug#17867117 - ERROR RESULT WHEN "COUNT + DISTINCT + CASE WHEN" NEED MERGE_WALK 
  
  Problem:
  COUNT DISTINCT gives incorrect result when it uses a Unique
  Tree and its last inserted record has null value.
  
  Here is how COUNT DISTINCT is processed, given that this query is not
  using loose index scan.
  
  When a row is produced as a result of joining tables (there is only
  one table here), we store the SELECTed value in a Unique tree. This
  allows elimination of any duplicates, and thus implements DISTINCT.
  
  When we have processed all rows like this, we walk the Unique tree,
  counting its elements, in Aggregator_distinct::endup() (tree->walk());
  for each element we call Item_sum_count::add(). Such function wants to
  ignore any NULL value, for that it checks item_sum -> args[0] ->
  null_value. It is a mistake: when walking the Unique tree, the value
  to be aggregated is not item_sum ->args[0] but rather table ->
  field[0].
  
  Solution:
  instead of item_sum -> args[0] -> null_value, use arg_is_null(), which
  knows where to look (like in fix for bug 57932).
  
  As a consequence of this solution, we have to make arg_is_null() a
  little more general:
  1) Because it was so far only used for AVG() (which always has a
  single argument), this function was looking at a single argument; now
  that it has to work with COUNT(DISTINCT expression1,expression2), it
  must look at all arguments.
  2) Because we start using arg_is_null () for COUNT(DISTINCT), i.e. in
  Item_sum_count::add (), it implies that we are also using it for
  COUNT(no DISTINCT) (same add ()). For COUNT(no DISTINCT), the
  nullness to check is that of item_sum -> args[0]. But the null_value
  of such item is reliable only if val_*() has been called on it. So far
  arg_is_null() was always used after a call to arg_val*(), so could
  rely on null_value; but for COUNT, there is no call to arg_val*(), so
  arg_is_null() has to call is_null() instead.
  
  Testcase for 16539979 by Neeraj. Testcase for 17867117 contributed by
  Xiaobin Lin from Taobao.
[4 Mar 2014 21:22] Barb Trout
We are using MySQL Percona Server 5.6.15-56.  I believe we are experiencing this bug.  Has this bug been fixed?  If so, what version(s) was it fixed in?  If not, is it planned to be in a future release?
[5 Mar 2014 8:16] Guilhem Bichot
Hi If you scroll above in the report:

   [4 Dec 2013 22:23] Paul Dubois
   Noted in 5.5.36, 5.6.16, 5.7.4 changelogs.

So: fixed in MySQL 5.6.16 which is available on dev.mysql.com. For Percona, I don't know.