Bug #82044 COUNT(DISTINCT) query with WHERE clause returns an Incorrect number of rows
Submitted: 29 Jun 2016 7:59 Modified: 12 Jul 2016 15:19
Reporter: 광복 이 Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.7.9 OS:Ubuntu (12.04.4 LTS)
Assigned to: CPU Architecture:Any
Tags: 5.7.13, COUNT(DISTINCT) query, MySQL, ncorrect number of rows

[29 Jun 2016 7:59] 광복 이
Description:
The following  2 queries does the same things.
And actually return the SAME result not using NOT IN clause.

Query 1. SELECT COUNT(DISTINCT a.id) AS totalUser
         from test.test_user a;
Query 2. select  count(*) totalUser
         from (SELECT 1  AS totalUser
              from test.test_user a                 
              group by a.id)  a;

Now I add WHERE  and NOT IN clause in the above queries.
But the results from two queries are different.
The correct one is from Query 2 using group by statement.
(Of course,the target table is frozen,in other words there were no DMLs on the table.)

I can't find any similar bugs from bug lists.
My query is very simple and the target is not so big.

I do the same test in 5.6.13 Commercial edition.
In 5.6.13 Commercial Edition, the results from 2 queries are  same.

Because of this problem, I decided to upgrade to 5.7.13 from 5.7.9.
Now I want to know whether the above query 1 could return a correct result in MySQL 5.7.13.

How to repeat:

<< test_user table Information>>
Table size : 1.2GB
# of rows  : 3,486,886
desc  test_user;
+--------------+---------------------+------+-----+---------------------+-----------------------------+
| Field        | Type                | Null | Key | Default             | Extra                       |
+--------------+---------------------+------+-----+---------------------+-----------------------------+
| seq_no       | bigint(20) unsigned | NO   | PRI | NULL                | auto_increment              |
| ID           | varchar(36)         | NO   | MUL | NULL                |                             |
| A_ID         | varchar(10)         | NO   |     | NULL                |                             |
| B_ID         | varchar(10)         | NO   |     | NULL                |                             |
| C_ID         | varchar(10)         | NO   |     | NULL                |                             |
| X            | varchar(32)         | NO   |     | NULL                |                             |
| Y            | varchar(32)         | YES  |     | NULL                |                             |
| Z            | varchar(300)        | NO   |     | NULL                |                             |
| W            | varchar(128)        | YES  |     | NULL                |                             |
| USG          | bigint(20)          | NO   |     | NULL                |                             |
+--------------+---------------------+------+-----+---------------------+-----------------------------+

<< test >> 

Query 1.
test> SELECT COUNT(DISTINCT a.id) AS totalUser
    -> from test.test_user a
    -> WHERE  a.USG > 0
    -> AND a.id NOT IN
    ->    ( 'sfdlu3lkff'
    ->    , 'wwj3n7ovnf'
    ->    , '2lblkzbo5u'
    ->    , 'qanyhuilni'
    ->    , 'ax0jzgzkbc'
    ->    , 'sf7ihinlkk'
    ->    );
+-----------------+
| totalUser       |
+-----------------+
|         3452684 |
+-----------------+
1 row in set (34.61 sec)

Here is the plan.
+----+-------------+-------+------------+------+------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type | key  | key_len | ref  | rows    | filtered | Extra       |
+----+-------------+-------+------------+------+------+---------+------+---------+----------+-------------+
|  1 | SIMPLE      | a     | NULL       | ALL  | NULL | NULL    | NULL | 3301001 |    23.10 | Using where |
+----+-------------+-------+------------+------+------+---------+------+---------+----------+-------------+
1 row in set, 1 warning (0.01 sec)

Query 2. 
 test>select  count(*) totalUser
    ->      from (SELECT 1  AS totalUser
    ->            from   test.test_user a
    ->            WHERE  a.USG > 0
    ->            AND a.id NOT IN
    ->               ( 'sfdlu3lkff'
    ->               , 'wwj3n7ovnf'
    ->               , '2lblkzbo5u'
    ->               , 'qanyhuilni'
    ->               , 'ax0jzgzkbc'
    ->               ,'sf7ihinlkk'
    ->               )   
    ->             group by a.id)  a;

+-----------------+
| totalUser       |
+-----------------+
|         3452174 |
+-----------------+
1 row in set (8.85 sec)
[4 Jul 2016 15:13] MySQL Verification Team
Thank you for your bug report.

Can you let us know whether you have tested 5.7.13 and whether this problems is solved now ????
[5 Jul 2016 0:30] 광복 이
This test  was done in 5.7.9.

I want to update to 5.7.13 soon.
But I'm hesitatingfor the this bug.
This count(distinct) pattern queries are one of main service queries of my business.

In fact, I don't know whether the bug is solved or not in  Mysql5.7.13.
I'm waiting for your REPLY.
Please let me know the bug is resolved in 5.7.13 ~
[5 Jul 2016 13:49] MySQL Verification Team
I have checked ChangeLog from 5.7.9 to 5.7.14 and I did not find any reference similar to the behavior that you describe.

Hence, what we need is to repeat this bug. So, please can you upload your table to this bug report using the "Files" tab. We need your data in order to repeat the problem. Only if we repeat the problem would we be able to verify this report as an unfixed bug.
[12 Jul 2016 15:19] MySQL Verification Team
I have just tested this with 5.7.13 and there are no problems. Please, do note that it was tested with and without an index on the aggregated column ...

Here is copy / paste from my session:

 ./client/mysql sinbas -e " show create table t1"
+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                                                                                                                                                                                                                       |
+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| t1    | CREATE TABLE `t1` (
  `id1` bigint(20) NOT NULL,
  `id2` bigint(20) NOT NULL,
  `id3` bigint(20) NOT NULL,
  `c1` bigint(20) DEFAULT NULL,
  `c2` bigint(20) NOT NULL,
  `seqid` bigint(20) NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`id1`,`id2`),
  UNIQUE KEY `ix_seq` (`seqid`),
  KEY `sinisa` (`id1`)
) ENGINE=InnoDB AUTO_INCREMENT=1782 DEFAULT CHARSET=utf8 |
+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[smilivoj@local mysql-5.7.13]$ ./client/mysql sinbas -e " select count(distinct id1) from t1"
+---------------------+
| count(distinct id1) |
+---------------------+
|                 876 |
+---------------------+
[smilivoj@local mysql-5.7.13]$ ./client/mysql sinbas -e " select count(*) total from (select 1 from t1 group by t1.id1) a"
+-------+
| total |
+-------+
|   876 |
+-------+
[smilivoj@local mysql-5.7.13]$ ./client/mysql sinbas -e "drop index sinisa on t1"
[smilivoj@local mysql-5.7.13]$ ./client/mysql sinbas -e " show create table t1"+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                                                                                                                                                                                               |
+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| t1    | CREATE TABLE `t1` (
  `id1` bigint(20) NOT NULL,
  `id2` bigint(20) NOT NULL,
  `id3` bigint(20) NOT NULL,
  `c1` bigint(20) DEFAULT NULL,
  `c2` bigint(20) NOT NULL,
  `seqid` bigint(20) NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`id1`,`id2`),
  UNIQUE KEY `ix_seq` (`seqid`)
) ENGINE=InnoDB AUTO_INCREMENT=1782 DEFAULT CHARSET=utf8 |
+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[smilivoj@local mysql-5.7.13]$ ./client/mysql sinbas -e " select count(distinct id1) from t1"+---------------------+
| count(distinct id1) |
+---------------------+
|                 876 |
+---------------------+
[smilivoj@local mysql-5.7.13]$ ./client/mysql sinbas -e " select count(*) total from (select 1 from t1 group by t1.id1) a"
+-------+
| total |
+-------+
|   876 |
+-------+

This effort definitely closes this bug.