| Bug #71028 | error result when "count + distinct + case when" need merge_walk | ||
|---|---|---|---|
| Submitted: | 27 Nov 2013 21:26 | Modified: | 14 Jan 2014 18:21 | 
| Reporter: | xiaobin lin (OCA) | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: DML | Severity: | S3 (Non-critical) | 
| Version: | 5.5+ | OS: | Any | 
| Assigned to: | Guilhem Bichot | CPU Architecture: | Any | 
| Tags: | case when, count, distinct, regression, tmp_table_size | ||
   [27 Nov 2013 21:28]
   xiaobin lin        
  base on 5.6 (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: merge_waik_bug2.5612.diff (application/octet-stream, text), 406 bytes.
   [27 Nov 2013 22:07]
   MySQL Verification Team        
  [20:05:04][Pochita:]~ miguel$ mysql -uroot -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.6.14 MySQL Community Server (GPL)
Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> CREATE DATABASE xy;
Query OK, 1 row affected (0.07 sec)
mysql> USE xy
Database changed
mysql> set tmp_table_size=1024;
Query OK, 0 rows affected (0.00 sec)
mysql> create table tb(id int auto_increment primary key, v varchar(32))engine=myisam charset=gbk;
Query OK, 0 rows affected (0.05 sec)
mysql> insert into tb(v) values("aaa");
Query OK, 1 row affected (0.00 sec)
mysql> insert into tb(v) (select v from tb);
Query OK, 1 row affected (0.01 sec)
Records: 1  Duplicates: 0  Warnings: 0
mysql> insert into tb(v) (select v from tb);
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0
mysql> insert into tb(v) (select v from tb);
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0
mysql> insert into tb(v) (select v from tb);
Query OK, 8 rows affected (0.00 sec)
Records: 8  Duplicates: 0  Warnings: 0
mysql> insert into tb(v) (select v from tb);
Query OK, 16 rows affected (0.00 sec)
Records: 16  Duplicates: 0  Warnings: 0
mysql> insert into tb(v) (select v from tb);
Query OK, 32 rows affected (0.00 sec)
Records: 32  Duplicates: 0  Warnings: 0
mysql>
mysql> update tb set v=concat(v, id);
Query OK, 64 rows affected (0.00 sec)
Rows matched: 64  Changed: 64  Warnings: 0
mysql> select count(distinct case when id<=64 then id end) from tb;
+----------------------------------------------+
| count(distinct case when id<=64 then id end) |
+----------------------------------------------+
|                                           64 |
+----------------------------------------------+
1 row in set (0.00 sec)
mysql> select count(distinct case when id<=63 then id end) from tb;
+----------------------------------------------+
| count(distinct case when id<=63 then id end) |
+----------------------------------------------+
|                                            0 |
+----------------------------------------------+
1 row in set (0.00 sec)
 
   [27 Nov 2013 22:12]
   MySQL Verification Team        
  c:\dbs>c:\dbs\5.1\bin\mysql -uroot --port=3510 --prompt="mysql 5.1 > "
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.1.74-Win X64 Source distribution
Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql 5.1 > CREATE DATABASE xy;
Query OK, 1 row affected (0.03 sec)
mysql 5.1 > USE xy;
Database changed
mysql 5.1 > set tmp_table_size=1024;
Query OK, 0 rows affected (0.00 sec)
mysql 5.1 > create table tb(id int auto_increment primary key, v varchar(32))engine=myisam charset=gbk;
Query OK, 0 rows affected (0.08 sec)
mysql 5.1 > insert into tb(v) values("aaa");
Query OK, 1 row affected (0.01 sec)
mysql 5.1 > insert into tb(v) (select v from tb);
Query OK, 1 row affected (0.00 sec)
Records: 1  Duplicates: 0  Warnings: 0
mysql 5.1 > insert into tb(v) (select v from tb);
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0
mysql 5.1 > insert into tb(v) (select v from tb);
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0
mysql 5.1 > insert into tb(v) (select v from tb);
Query OK, 8 rows affected (0.00 sec)
Records: 8  Duplicates: 0  Warnings: 0
mysql 5.1 > insert into tb(v) (select v from tb);
Query OK, 16 rows affected (0.00 sec)
Records: 16  Duplicates: 0  Warnings: 0
mysql 5.1 > insert into tb(v) (select v from tb);
Query OK, 32 rows affected (0.08 sec)
Records: 32  Duplicates: 0  Warnings: 0
mysql 5.1 >
mysql 5.1 > update tb set v=concat(v, id);
Query OK, 64 rows affected (0.00 sec)
Rows matched: 64  Changed: 64  Warnings: 0
mysql 5.1 > select count(distinct case when id<=64 then id end) from tb;
+----------------------------------------------+
| count(distinct case when id<=64 then id end) |
+----------------------------------------------+
|                                           64 |
+----------------------------------------------+
1 row in set (0.00 sec)
mysql 5.1 > select count(distinct case when id<=63 then id end) from tb;
+----------------------------------------------+
| count(distinct case when id<=63 then id end) |
+----------------------------------------------+
|                                           63 |
+----------------------------------------------+
1 row in set (0.00 sec)
mysql 5.1 >
 
   [27 Nov 2013 22:19]
   MySQL Verification Team        
  Thank you for the bug report and contribution.
   [27 Nov 2013 22:29]
   MySQL Verification Team        
  Forgot to paste:
c:\dbs>c:\dbs\5.5\bin\mysql -uroot --port=3550 --prompt="mysql 5.5 > "
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.5.36 Source distribution
Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql 5.5 > CREATE DATABASE xy;
Query OK, 1 row affected (0.02 sec)
mysql 5.5 > USE xy;
Database changed
mysql 5.5 > set tmp_table_size=1024;
Query OK, 0 rows affected (0.00 sec)
mysql 5.5 > create table tb(id int auto_increment primary key, v varchar(32))engine=myisam charset=gbk;
Query OK, 0 rows affected (0.13 sec)
mysql 5.5 > insert into tb(v) values("aaa");
Query OK, 1 row affected (0.00 sec)
mysql 5.5 > insert into tb(v) (select v from tb);
Query OK, 1 row affected (0.02 sec)
Records: 1  Duplicates: 0  Warnings: 0
mysql 5.5 > insert into tb(v) (select v from tb);
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0
mysql 5.5 > insert into tb(v) (select v from tb);
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0
mysql 5.5 > insert into tb(v) (select v from tb);
Query OK, 8 rows affected (0.00 sec)
Records: 8  Duplicates: 0  Warnings: 0
mysql 5.5 > insert into tb(v) (select v from tb);
Query OK, 16 rows affected (0.00 sec)
Records: 16  Duplicates: 0  Warnings: 0
mysql 5.5 > insert into tb(v) (select v from tb);
Query OK, 32 rows affected (0.09 sec)
Records: 32  Duplicates: 0  Warnings: 0
mysql 5.5 >
mysql 5.5 > update tb set v=concat(v, id);
Query OK, 64 rows affected (0.00 sec)
Rows matched: 64  Changed: 64  Warnings: 0
mysql 5.5 > select count(distinct case when id<=64 then id end) from tb;
+----------------------------------------------+
| count(distinct case when id<=64 then id end) |
+----------------------------------------------+
|                                           64 |
+----------------------------------------------+
1 row in set (0.00 sec)
mysql 5.5 > select count(distinct case when id<=63 then id end) from tb;
+----------------------------------------------+
| count(distinct case when id<=63 then id end) |
+----------------------------------------------+
|                                            0 |
+----------------------------------------------+
1 row in set (0.00 sec)
mysql 5.5 >
 
   [28 Nov 2013 13:29]
   Guilhem Bichot        
  Hello. Coincidence: I have been working on this issue two weeks ago (http://bugs.mysql.com/bug.php?id=68749 ); the patch I made is under review currently. If it is accepted, I'll add your testcase to it. I looked at the "geek method" - thanks for contributing code! It shines by its compactness - one line! However, I prefer to not change Item::maybe_null because this boolean is supposed to be set during query resolution (it tells if an expression may have a NULL value), and to not change during execution. Imagine we change it during execution, then execute the statement a second time (which is possible, if it is part of a stored procedure, or a prepared statement). Then the second execution will start with maybe_null==false, which is incorrect (the aggregated item may be NULL, after all; it's only after going through the Unique tree, that values in the tree cannot be NULL). With the current code base, we re-resolve on second execution, so maybe_null will be set back to "true" in fix_length_and_dec(), so it will work. But in the future, it would be more efficient to not re-resolve (why recompute some information which is not changing between executions, like NULLability, number of decimals of DECIMAL results, number of selected columns...). And if we don't re-resolve anymore, maybe_null won't be corrected. Nevertheless, reviewers will look at your patch too.
   [2 Dec 2013 12:50]
   Guilhem Bichot        
  Hello Xiaobin. I'm attaching the patch which has just been approved by two reviewers.
   [2 Dec 2013 12:51]
   Guilhem Bichot        
  approved patch
Attachment: patch.txt (text/plain), 13.86 KiB.
   [3 Dec 2013 15:02]
   xiaobin lin        
  hi, Guilhem It looks good to me, Thank you.
   [14 Jan 2014 18:21]
   Paul DuBois        
  Noted in 5.5.36, 5.6.16, 5.7.4 changelogs. COUNT(DISTINCT) sometimes produced an incorrect result when the last read row contained a NULL value.
   [2 Feb 2014 17:38]
   Laurynas Biveinis        
  5.5$ bzr log -r 4557 ------------------------------------------------------------ revno: 4557 committer: Guilhem Bichot <guilhem.bichot@oracle.com> branch nick: 5.5 timestamp: Wed 2013-12-04 12:32:42 +0100 message: Bug#16539979 - BASIC SELECT COUNT(DISTINCT ID) IS BROKEN Bug#17867117 - ERROR RESULT WHEN "COUNT + DISTINCT + CASE WHEN" NEED MERGE_WALK Problem: COUNT DISTINCT gives incorrect result when it uses a Unique Tree and its last inserted record has null value. Here is how COUNT DISTINCT is processed, given that this query is not using loose index scan. When a row is produced as a result of joining tables (there is only one table here), we store the SELECTed value in a Unique tree. This allows elimination of any duplicates, and thus implements DISTINCT. When we have processed all rows like this, we walk the Unique tree, counting its elements, in Aggregator_distinct::endup() (tree->walk()); for each element we call Item_sum_count::add(). Such function wants to ignore any NULL value, for that it checks item_sum -> args[0] -> null_value. It is a mistake: when walking the Unique tree, the value to be aggregated is not item_sum ->args[0] but rather table -> field[0]. Solution: instead of item_sum -> args[0] -> null_value, use arg_is_null(), which knows where to look (like in fix for bug 57932). As a consequence of this solution, we have to make arg_is_null() a little more general: 1) Because it was so far only used for AVG() (which always has a single argument), this function was looking at a single argument; now that it has to work with COUNT(DISTINCT expression1,expression2), it must look at all arguments. 2) Because we start using arg_is_null () for COUNT(DISTINCT), i.e. in Item_sum_count::add (), it implies that we are also using it for COUNT(no DISTINCT) (same add ()). For COUNT(no DISTINCT), the nullness to check is that of item_sum -> args[0]. But the null_value of such item is reliable only if val_*() has been called on it. So far arg_is_null() was always used after a call to arg_val*(), so could rely on null_value; but for COUNT, there is no call to arg_val*(), so arg_is_null() has to call is_null() instead. Testcase for 16539979 by Neeraj. Testcase for 17867117 contributed by Xiaobin Lin from Taobao.


Description: Consider a sql like: select count(distinct case when xxx then f end) from tb; When tmp_table_size is too small to fill all the rows, merge_walk needed. But when merge_walk, it check the condition again, using THE LAST row in the match result set. This leads to bug when the "case when" condition return null in the last row, it will lead to the result as 0, error result. How to repeat: drop table if exists tb; set tmp_table_size=1024; create table tb(id int auto_increment primary key, v varchar(32))engine=myisam charset=gbk; insert into tb(v) values("aaa"); insert into tb(v) (select v from tb); insert into tb(v) (select v from tb); insert into tb(v) (select v from tb); insert into tb(v) (select v from tb); insert into tb(v) (select v from tb); insert into tb(v) (select v from tb); update tb set v=concat(v, id); select count(distinct case when id<=64 then id end) from tb; +----------------------------------------------+ | count(distinct case when id<=64 then id end) | +----------------------------------------------+ | 64 | +----------------------------------------------+ select count(distinct case when id<=63 then id end) from tb; +----------------------------------------------+ | count(distinct case when id<=63 then id end) | +----------------------------------------------+ | 0 | +----------------------------------------------+ Suggested fix: A geek method shown in the patch file. Looking forward to other methods :)