Bug #74806 COUNT(DISTINCT ... ) wrong with "Using index"
Submitted: 12 Nov 2014 4:23 Modified: 12 Nov 2014 15:34
Reporter: Rick James Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:5.6.12 OS:Any
Assigned to: CPU Architecture:Any
Tags: COUNT DISTINCT

[12 Nov 2014 4:23] Rick James
Description:
SELECT
    COUNT(DISTINCT a)   AS expect_1,
    COUNT(DISTINCT a,b) AS expect_2
    FROM count_distinct;
may return wrong value for shorter list.

With INDEX(a,b), the select returns 2,2.
Without the index, it returns 1,2.

Output from the "how to repeat":

mysql> DROP TABLE IF EXISTS count_distinct;
Query OK, 0 rows affected (0.01 sec)

mysql> CREATE TABLE count_distinct (
    ->     a INT NOT NULL,
    ->     b INT NOT NULL
    -> ) ENGINE=InnoDB;
Query OK, 0 rows affected (0.05 sec)

mysql> INSERT INTO count_distinct (a,b) VALUES
    ->     (11,22),
    ->     (11,33),
    ->     (11,22);
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> SELECT
    ->     COUNT(DISTINCT a)   AS expect_1,
    ->     COUNT(DISTINCT a,b) AS expect_2
    ->     FROM count_distinct;
+----------+----------+
| expect_1 | expect_2 |
+----------+----------+
|        1 |        2 |
+----------+----------+
1 row in set (0.00 sec)

mysql> ALTER TABLE count_distinct ADD index(a,b);
Query OK, 0 rows affected (0.06 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> SELECT
    ->     COUNT(DISTINCT a)   AS expect_1,
    ->     COUNT(DISTINCT a,b) AS expect_2
    ->     FROM count_distinct;
+----------+----------+
| expect_1 | expect_2 |
+----------+----------+
|        2 |        2 |
+----------+----------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT
    ->     COUNT(DISTINCT a)   AS expect_1,
    ->     COUNT(DISTINCT a,b) AS expect_2
    ->     FROM count_distinct;
| id | select_type | table          | type  | possible_keys | key  | key_len | ref  | rows
 | Extra
|  1 | SIMPLE      | count_distinct | range | a             | a    | 8       | NULL |    4
 | Using index for group-by (scanning) |
1 row in set (0.00 sec)

mysql> SELECT * FROM count_distinct;
+----+----+
| a  | b  |
+----+----+
| 11 | 22 |
| 11 | 22 |
| 11 | 33 |
+----+----+
3 rows in set (0.00 sec)

mysql> SHOW VARIABLES LIKE '%big%';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| big_tables      | OFF   |
| sql_big_selects | OFF   |
+-----------------+-------+
2 rows in set (0.00 sec)

mysql> SHOW VARIABLES LIKE 'query_cache%';
+------------------------------+---------+
| Variable_name                | Value   |
+------------------------------+---------+
| query_cache_limit            | 1048576 |
| query_cache_min_res_unit     | 4096    |
| query_cache_size             | 0       |
| query_cache_type             | OFF     |
| query_cache_wlock_invalidate | OFF     |
+------------------------------+---------+
5 rows in set (0.00 sec)

mysql> SHOW VARIABLES LIKE 'version%';
+-------------------------+------------------------------+
| Variable_name           | Value                        |
+-------------------------+------------------------------+
| version                 | 5.6.12-log                   |
| version_comment         | MySQL Community Server (GPL) |
| version_compile_machine | x86_64                       |
| version_compile_os      | Win64                        |
+-------------------------+------------------------------+
4 rows in set (0.00 sec)

How to repeat:
DROP TABLE IF EXISTS count_distinct;
CREATE TABLE count_distinct (
    a INT NOT NULL,
    b INT NOT NULL
) ENGINE=InnoDB;
INSERT INTO count_distinct (a,b) VALUES
    (11,22),
    (11,33),
    (11,22);
SELECT
    COUNT(DISTINCT a)   AS expect_1,
    COUNT(DISTINCT a,b) AS expect_2
    FROM count_distinct;
ALTER TABLE count_distinct ADD index(a,b);
SELECT
    COUNT(DISTINCT a)   AS expect_1,
    COUNT(DISTINCT a,b) AS expect_2
    FROM count_distinct;
EXPLAIN SELECT
    COUNT(DISTINCT a)   AS expect_1,
    COUNT(DISTINCT a,b) AS expect_2
    FROM count_distinct;
SELECT * FROM count_distinct;
SHOW VARIABLES LIKE '%big%';
SHOW VARIABLES LIKE 'query_cache%';
SHOW VARIABLES LIKE 'version%';

Suggested fix:
unknown
[12 Nov 2014 10:02] MySQL Verification Team
Hello Rick James,

Thank you for the bug report and test case.
I'm able repeat this issue with 5.6.12 using provided test case but after checking with GA(5.6.21) and latest builds(5.6.23, 5.7.6 and 5.5.42) I can confirm that this issue is no longer repeatable. Could you please confirm with latest GA 5.6.21?

Thanks,
Umesh
[12 Nov 2014 10:05] MySQL Verification Team
Test case results with GA and latest builds

Attachment: 74806_results.txt (text/plain), 17.15 KiB.

[12 Nov 2014 15:34] Rick James
Probable answer:

----- 2013-09-21 5.7.2 Milestone 12 & 2013-07-31 5.6.13 & 2013-07-31 5.5.33 -- Bugs Fixed -- -----

Incorrect results could be returned from queries that used several aggr_func(DISTINCT) functions (where aggr_func() is an aggregate function such as COUNT()) when these referred to different columns of the same composite key. (Bug #12328597)