Bug #71028 error result when "count + distinct + case when" need merge_walk
Submitted: 27 Nov 2013 21:26 Modified: 14 Jan 2014 18:21
Reporter: xiaobin lin (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: DML Severity:S3 (Non-critical)
Version:5.5+ OS:Any
Assigned to: Guilhem Bichot CPU Architecture:Any
Tags: case when, count, distinct, regression, tmp_table_size

[27 Nov 2013 21:26] xiaobin lin
Description:
Consider a sql like:
select count(distinct case when xxx then f end) from tb;

When tmp_table_size is too small to fill all the rows, merge_walk needed.

But when merge_walk, it check the condition again, using THE LAST row in the match result set.
This leads to bug when the "case when" condition return null in the last row, it will lead to the result as 0, error result.

How to repeat:
drop table if exists tb;
set tmp_table_size=1024;
create table tb(id int auto_increment primary key, v varchar(32))engine=myisam charset=gbk;
insert into tb(v) values("aaa");
insert into tb(v) (select v from tb);
insert into tb(v) (select v from tb);
insert into tb(v) (select v from tb);
insert into tb(v) (select v from tb);
insert into tb(v) (select v from tb);
insert into tb(v) (select v from tb);

update tb set v=concat(v, id);
select count(distinct case when id<=64 then id end) from tb;
+----------------------------------------------+
| count(distinct case when id<=64 then id end) |
+----------------------------------------------+
|                                           64 |
+----------------------------------------------+
select count(distinct case when id<=63 then id end) from tb;
+----------------------------------------------+
| count(distinct case when id<=63 then id end) |
+----------------------------------------------+
|                                            0 |
+----------------------------------------------+

Suggested fix:
A geek method shown in the patch file. Looking forward to other methods :)
[27 Nov 2013 21:28] xiaobin lin
base on 5.6

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: merge_waik_bug2.5612.diff (application/octet-stream, text), 406 bytes.

[27 Nov 2013 22:07] MySQL Verification Team
[20:05:04][Pochita:]~ miguel$ mysql -uroot -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.6.14 MySQL Community Server (GPL)

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> CREATE DATABASE xy;
Query OK, 1 row affected (0.07 sec)

mysql> USE xy
Database changed
mysql> set tmp_table_size=1024;
Query OK, 0 rows affected (0.00 sec)

mysql> create table tb(id int auto_increment primary key, v varchar(32))engine=myisam charset=gbk;
Query OK, 0 rows affected (0.05 sec)

mysql> insert into tb(v) values("aaa");
Query OK, 1 row affected (0.00 sec)

mysql> insert into tb(v) (select v from tb);
Query OK, 1 row affected (0.01 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> insert into tb(v) (select v from tb);
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> insert into tb(v) (select v from tb);
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> insert into tb(v) (select v from tb);
Query OK, 8 rows affected (0.00 sec)
Records: 8  Duplicates: 0  Warnings: 0

mysql> insert into tb(v) (select v from tb);
Query OK, 16 rows affected (0.00 sec)
Records: 16  Duplicates: 0  Warnings: 0

mysql> insert into tb(v) (select v from tb);
Query OK, 32 rows affected (0.00 sec)
Records: 32  Duplicates: 0  Warnings: 0

mysql>
mysql> update tb set v=concat(v, id);
Query OK, 64 rows affected (0.00 sec)
Rows matched: 64  Changed: 64  Warnings: 0

mysql> select count(distinct case when id<=64 then id end) from tb;
+----------------------------------------------+
| count(distinct case when id<=64 then id end) |
+----------------------------------------------+
|                                           64 |
+----------------------------------------------+
1 row in set (0.00 sec)

mysql> select count(distinct case when id<=63 then id end) from tb;
+----------------------------------------------+
| count(distinct case when id<=63 then id end) |
+----------------------------------------------+
|                                            0 |
+----------------------------------------------+
1 row in set (0.00 sec)
[27 Nov 2013 22:12] MySQL Verification Team
c:\dbs>c:\dbs\5.1\bin\mysql -uroot --port=3510 --prompt="mysql 5.1 > "
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.1.74-Win X64 Source distribution

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql 5.1 > CREATE DATABASE xy;
Query OK, 1 row affected (0.03 sec)

mysql 5.1 > USE xy;
Database changed
mysql 5.1 > set tmp_table_size=1024;
Query OK, 0 rows affected (0.00 sec)

mysql 5.1 > create table tb(id int auto_increment primary key, v varchar(32))engine=myisam charset=gbk;
Query OK, 0 rows affected (0.08 sec)

mysql 5.1 > insert into tb(v) values("aaa");
Query OK, 1 row affected (0.01 sec)

mysql 5.1 > insert into tb(v) (select v from tb);
Query OK, 1 row affected (0.00 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql 5.1 > insert into tb(v) (select v from tb);
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql 5.1 > insert into tb(v) (select v from tb);
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql 5.1 > insert into tb(v) (select v from tb);
Query OK, 8 rows affected (0.00 sec)
Records: 8  Duplicates: 0  Warnings: 0

mysql 5.1 > insert into tb(v) (select v from tb);
Query OK, 16 rows affected (0.00 sec)
Records: 16  Duplicates: 0  Warnings: 0

mysql 5.1 > insert into tb(v) (select v from tb);
Query OK, 32 rows affected (0.08 sec)
Records: 32  Duplicates: 0  Warnings: 0

mysql 5.1 >
mysql 5.1 > update tb set v=concat(v, id);
Query OK, 64 rows affected (0.00 sec)
Rows matched: 64  Changed: 64  Warnings: 0

mysql 5.1 > select count(distinct case when id<=64 then id end) from tb;
+----------------------------------------------+
| count(distinct case when id<=64 then id end) |
+----------------------------------------------+
|                                           64 |
+----------------------------------------------+
1 row in set (0.00 sec)

mysql 5.1 > select count(distinct case when id<=63 then id end) from tb;
+----------------------------------------------+
| count(distinct case when id<=63 then id end) |
+----------------------------------------------+
|                                           63 |
+----------------------------------------------+
1 row in set (0.00 sec)

mysql 5.1 >
[27 Nov 2013 22:19] MySQL Verification Team
Thank you for the bug report and contribution.
[27 Nov 2013 22:29] MySQL Verification Team
Forgot to paste:

c:\dbs>c:\dbs\5.5\bin\mysql -uroot --port=3550 --prompt="mysql 5.5 > "
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.5.36 Source distribution

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql 5.5 > CREATE DATABASE xy;
Query OK, 1 row affected (0.02 sec)

mysql 5.5 > USE xy;
Database changed
mysql 5.5 > set tmp_table_size=1024;
Query OK, 0 rows affected (0.00 sec)

mysql 5.5 > create table tb(id int auto_increment primary key, v varchar(32))engine=myisam charset=gbk;
Query OK, 0 rows affected (0.13 sec)

mysql 5.5 > insert into tb(v) values("aaa");
Query OK, 1 row affected (0.00 sec)

mysql 5.5 > insert into tb(v) (select v from tb);
Query OK, 1 row affected (0.02 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql 5.5 > insert into tb(v) (select v from tb);
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql 5.5 > insert into tb(v) (select v from tb);
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql 5.5 > insert into tb(v) (select v from tb);
Query OK, 8 rows affected (0.00 sec)
Records: 8  Duplicates: 0  Warnings: 0

mysql 5.5 > insert into tb(v) (select v from tb);
Query OK, 16 rows affected (0.00 sec)
Records: 16  Duplicates: 0  Warnings: 0

mysql 5.5 > insert into tb(v) (select v from tb);
Query OK, 32 rows affected (0.09 sec)
Records: 32  Duplicates: 0  Warnings: 0

mysql 5.5 >
mysql 5.5 > update tb set v=concat(v, id);
Query OK, 64 rows affected (0.00 sec)
Rows matched: 64  Changed: 64  Warnings: 0

mysql 5.5 > select count(distinct case when id<=64 then id end) from tb;
+----------------------------------------------+
| count(distinct case when id<=64 then id end) |
+----------------------------------------------+
|                                           64 |
+----------------------------------------------+
1 row in set (0.00 sec)

mysql 5.5 > select count(distinct case when id<=63 then id end) from tb;
+----------------------------------------------+
| count(distinct case when id<=63 then id end) |
+----------------------------------------------+
|                                            0 |
+----------------------------------------------+
1 row in set (0.00 sec)

mysql 5.5 >
[28 Nov 2013 13:29] Guilhem Bichot
Hello. Coincidence: I have been working on this issue two weeks ago (http://bugs.mysql.com/bug.php?id=68749 ); the patch I made is under review currently. If it is accepted, I'll add your testcase to it.
I looked at the "geek method" - thanks for contributing code! It shines by its compactness - one line! However, I prefer to not change Item::maybe_null because this boolean is supposed to be set during query resolution (it tells if an expression may have a NULL value), and to not change during execution.
Imagine we change it during execution, then execute the statement a second time (which is possible, if it is part of a stored procedure, or a prepared statement). Then the second execution will start with maybe_null==false, which is incorrect (the aggregated item may be NULL, after all; it's only after going through the Unique tree, that values in the tree cannot be NULL). With the current code base, we re-resolve on second execution, so maybe_null will be set back to "true" in fix_length_and_dec(), so it will work. But in the future, it would be more efficient to not re-resolve (why recompute some information which is not changing between executions, like NULLability, number of decimals of DECIMAL results, number of selected columns...). And if we don't re-resolve anymore, maybe_null won't be corrected.
Nevertheless, reviewers will look at your patch too.
[2 Dec 2013 12:50] Guilhem Bichot
Hello Xiaobin. I'm attaching the patch which has just been approved by two reviewers.
[2 Dec 2013 12:51] Guilhem Bichot
approved patch

Attachment: patch.txt (text/plain), 13.86 KiB.

[3 Dec 2013 15:02] xiaobin lin
hi, Guilhem 
 It looks good to me,  Thank you.
[14 Jan 2014 18:21] Paul DuBois
Noted in 5.5.36, 5.6.16, 5.7.4 changelogs.

COUNT(DISTINCT) sometimes produced an incorrect result when the last
read row contained a NULL value.
[2 Feb 2014 17:38] Laurynas Biveinis
5.5$ bzr log -r 4557
------------------------------------------------------------
revno: 4557
committer: Guilhem Bichot <guilhem.bichot@oracle.com>
branch nick: 5.5
timestamp: Wed 2013-12-04 12:32:42 +0100
message:
  Bug#16539979 - BASIC SELECT COUNT(DISTINCT ID) IS BROKEN
  Bug#17867117 - ERROR RESULT WHEN "COUNT + DISTINCT + CASE WHEN" NEED MERGE_WALK 
  
  Problem:
  COUNT DISTINCT gives incorrect result when it uses a Unique
  Tree and its last inserted record has null value.
  
  Here is how COUNT DISTINCT is processed, given that this query is not
  using loose index scan.
  
  When a row is produced as a result of joining tables (there is only
  one table here), we store the SELECTed value in a Unique tree. This
  allows elimination of any duplicates, and thus implements DISTINCT.
  
  When we have processed all rows like this, we walk the Unique tree,
  counting its elements, in Aggregator_distinct::endup() (tree->walk());
  for each element we call Item_sum_count::add(). Such function wants to
  ignore any NULL value, for that it checks item_sum -> args[0] ->
  null_value. It is a mistake: when walking the Unique tree, the value
  to be aggregated is not item_sum ->args[0] but rather table ->
  field[0].
  
  Solution:
  instead of item_sum -> args[0] -> null_value, use arg_is_null(), which
  knows where to look (like in fix for bug 57932).
  
  As a consequence of this solution, we have to make arg_is_null() a
  little more general:
  1) Because it was so far only used for AVG() (which always has a
  single argument), this function was looking at a single argument; now
  that it has to work with COUNT(DISTINCT expression1,expression2), it
  must look at all arguments.
  2) Because we start using arg_is_null () for COUNT(DISTINCT), i.e. in
  Item_sum_count::add (), it implies that we are also using it for
  COUNT(no DISTINCT) (same add ()). For COUNT(no DISTINCT), the
  nullness to check is that of item_sum -> args[0]. But the null_value
  of such item is reliable only if val_*() has been called on it. So far
  arg_is_null() was always used after a call to arg_val*(), so could
  rely on null_value; but for COUNT, there is no call to arg_val*(), so
  arg_is_null() has to call is_null() instead.
  
  Testcase for 16539979 by Neeraj. Testcase for 17867117 contributed by
  Xiaobin Lin from Taobao.