Bug #70038 Wrong select count distinct with a field included in two-column unique key
Submitted: 14 Aug 2013 20:47 Modified: 19 Nov 2013 17:20
Reporter: Michal Sulik Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S2 (Serious)
Version:5.5/5.6 OS:Any
Assigned to: CPU Architecture:Any
Tags: COUNT DISTINCT, innodb, SELECT, unique key

[14 Aug 2013 20:47] Michal Sulik
Description:
Selecting count distinct with where from InnoDB table returns wrong number if both fields used in count and where are included in two-column unique key.

I have checked it at MySQL 5.5.32 and Percona 5.5.32 on Ubuntu (32-bit):

mysql> SHOW VARIABLES LIKE "version%";
+-------------------------+-------------------------+
| Variable_name           | Value                   |
+-------------------------+-------------------------+
| version                 | 5.5.32-0ubuntu0.12.04.1 |
| version_comment         | (Ubuntu)                |
| version_compile_machine | i686                    |
| version_compile_os      | debian-linux-gnu        |
+-------------------------+-------------------------+

mysql> SHOW VARIABLES LIKE "version%";
+-------------------------+------------------------------------+
| Variable_name           | Value                              |
+-------------------------+------------------------------------+
| version                 | 5.5.32-31.0-log                    |
| version_comment         | Percona Server (GPL), Release 31.0 |
| version_compile_machine | x86_64                             |
| version_compile_os      | Linux                              |
+-------------------------+------------------------------------+

How to repeat:
CREATE TABLE tmp (
  id int NOT NULL AUTO_INCREMENT,
  a int NOT NULL,
  b int NOT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY ba (b, a)
) ENGINE=InnoDB;

-- Insert 10 rows for a = 1.
BEGIN;
INSERT INTO tmp (a, b) VALUES(1,101),(1,102),(1,103),(1,104),(1,105),(1,106),(1,107),(1,108),(1,109),(1,110);
COMMIT;

-- These selects return correct values.
SELECT SQL_NO_CACHE COUNT(DISTINCT b) FROM tmp; -- Returns 10.
SELECT SQL_NO_CACHE DISTINCT b FROM tmp WHERE a = 1; -- Returns 10 rows.
SELECT SQL_NO_CACHE COUNT(DISTINCT b), GROUP_CONCAT(DISTINCT b) FROM tmp WHERE a = 1; -- Returns correct values.
SELECT SQL_NO_CACHE COUNT(DISTINCT b), COUNT(*) FROM tmp WHERE a = 1; -- Returns 10, 10

-- Returns 5, should return 10. Compare with previous queries.
SELECT SQL_NO_CACHE COUNT(DISTINCT b) FROM tmp WHERE a = 1;

-- Let us add some rows with a = 2;
BEGIN;
INSERT INTO tmp (a, b) VALUES(2,101),(2,102),(2,103);
COMMIT;

SELECT SQL_NO_CACHE COUNT(DISTINCT b) FROM tmp WHERE a = 1; -- Returns 7, should be 10.

-- After flush it is correct.
FLUSH TABLE tmp;
SELECT SQL_NO_CACHE COUNT(DISTINCT b) FROM tmp WHERE a = 1;

DROP TABLE tmp;

# # #

What is interesting, when I change the UNIQUE KEY from (b, a) to (a, b) it works correctly:

DROP TABLE IF EXISTS tmp;
CREATE TABLE tmp (
  id int NOT NULL AUTO_INCREMENT,
  a int NOT NULL,
  b int NOT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY ab (a, b)
) ENGINE=InnoDB;

-- Insert 10 rows for a = 1.
BEGIN;
INSERT INTO tmp (a, b) VALUES(1,101),(1,102),(1,103),(1,104),(1,105),(1,106),(1,107),(1,108),(1,109),(1,110);
COMMIT;

-- Works correctly:
SELECT SQL_NO_CACHE COUNT(DISTINCT b) FROM tmp WHERE a = 1;

DROP TABLE tmp;

# # #

What is more interesting, when I insert 200 rows to my first table (with UNIQUE KEY (b, a)), it works correctly:

DROP TABLE IF EXISTS tmp;
CREATE TABLE tmp (
	id int NOT NULL AUTO_INCREMENT,
	a int NOT NULL,
	b int NOT NULL,
	PRIMARY KEY (id),
	UNIQUE KEY ba (b, a)
) ENGINE=InnoDB;

-- Insert 200 rows for a = 1.
BEGIN;
INSERT INTO tmp (a, b) VALUES
(1,101),(1,102),(1,103),(1,104),(1,105),(1,106),(1,107),(1,108),(1,109),(1,110),
(1,111),(1,112),(1,113),(1,114),(1,115),(1,116),(1,117),(1,118),(1,119),(1,120),
(1,121),(1,122),(1,123),(1,124),(1,125),(1,126),(1,127),(1,128),(1,129),(1,130),
(1,131),(1,132),(1,133),(1,134),(1,135),(1,136),(1,137),(1,138),(1,139),(1,140),
(1,141),(1,142),(1,143),(1,144),(1,145),(1,146),(1,147),(1,148),(1,149),(1,150),
(1,151),(1,152),(1,153),(1,154),(1,155),(1,156),(1,157),(1,158),(1,159),(1,160),
(1,161),(1,162),(1,163),(1,164),(1,165),(1,166),(1,167),(1,168),(1,169),(1,170),
(1,171),(1,172),(1,173),(1,174),(1,175),(1,176),(1,177),(1,178),(1,179),(1,180),
(1,181),(1,182),(1,183),(1,184),(1,185),(1,186),(1,187),(1,188),(1,189),(1,190),
(1,191),(1,192),(1,193),(1,194),(1,195),(1,196),(1,197),(1,198),(1,199),(1,200),
(1,201),(1,202),(1,203),(1,204),(1,205),(1,206),(1,207),(1,208),(1,209),(1,210),
(1,211),(1,212),(1,213),(1,214),(1,215),(1,216),(1,217),(1,218),(1,219),(1,220),
(1,221),(1,222),(1,223),(1,224),(1,225),(1,226),(1,227),(1,228),(1,229),(1,230),
(1,231),(1,232),(1,233),(1,234),(1,235),(1,236),(1,237),(1,238),(1,239),(1,240),
(1,241),(1,242),(1,243),(1,244),(1,245),(1,246),(1,247),(1,248),(1,249),(1,250),
(1,251),(1,252),(1,253),(1,254),(1,255),(1,256),(1,257),(1,258),(1,259),(1,260),
(1,261),(1,262),(1,263),(1,264),(1,265),(1,266),(1,267),(1,268),(1,269),(1,270),
(1,271),(1,272),(1,273),(1,274),(1,275),(1,276),(1,277),(1,278),(1,279),(1,280),
(1,281),(1,282),(1,283),(1,284),(1,285),(1,286),(1,287),(1,288),(1,289),(1,290),
(1,291),(1,292),(1,293),(1,294),(1,295),(1,296),(1,297),(1,298),(1,299),(1,300);
COMMIT;

-- Returns correct values.
SELECT SQL_NO_CACHE COUNT(DISTINCT b), COUNT(*) FROM tmp WHERE a = 1;
SELECT SQL_NO_CACHE COUNT(DISTINCT b) FROM tmp WHERE a = 1;

DROP TABLE tmp;
[14 Aug 2013 21:27] MySQL Verification Team
Thank you for the bug report.

C:\dbs>c:\dbs\5.5\bin\mysql -uroot --port=3550 --prompt="mysql 5.5 > "
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 5.5.34 Source distribution

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql 5.5 > use test
Database changed
mysql 5.5 > CREATE TABLE tmp (
    ->   id int NOT NULL AUTO_INCREMENT,
    ->   a int NOT NULL,
    ->   b int NOT NULL,
    ->   PRIMARY KEY (id),
    ->   UNIQUE KEY ba (b, a)
    -> ) ENGINE=InnoDB;
Query OK, 0 rows affected (0.11 sec)

<CUT>

mysql 5.5 >
mysql 5.5 > -- Returns 5, should return 10. Compare with previous queries.
mysql 5.5 > SELECT SQL_NO_CACHE COUNT(DISTINCT b) FROM tmp WHERE a = 1;
+-------------------+
| COUNT(DISTINCT b) |
+-------------------+
|                 5 |
+-------------------+
1 row in set (0.00 sec)

mysql 5.5 > alter table tmp drop index ba;
Query OK, 0 rows affected (0.11 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql 5.5 > SELECT SQL_NO_CACHE COUNT(DISTINCT b) FROM tmp WHERE a = 1;
+-------------------+
| COUNT(DISTINCT b) |
+-------------------+
|                10 |
+-------------------+

-------------------------------------------------------------------------------
1 roC:\dbs>c:\dbs\5.6\bin\mysql -uroot --port=3560 --prompt="mysql 5.6 > "
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.6.14 Source distribution

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql 5.6 > use test
Database changed
mysql 5.6 > CREATE TABLE tmp (
    ->   id int NOT NULL AUTO_INCREMENT,
    ->   a int NOT NULL,
    ->   b int NOT NULL,
    ->   PRIMARY KEY (id),
    ->   UNIQUE KEY ba (b, a)
    -> ) ENGINE=InnoDB;
Query OK, 0 rows affected (0.38 sec)

<CUT>

mysql 5.6 >
mysql 5.6 > -- Returns 5, should return 10. Compare with previous queries.
mysql 5.6 > SELECT SQL_NO_CACHE COUNT(DISTINCT b) FROM tmp WHERE a = 1;
+-------------------+
| COUNT(DISTINCT b) |
+-------------------+
|                 5 |
+-------------------+
1 row in set (0.00 sec)

mysql 5.6 > alter table tmp drop index ba;
Query OK, 0 rows affected (0.13 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql 5.6 > SELECT SQL_NO_CACHE COUNT(DISTINCT b) FROM tmp WHERE a = 1;
+-------------------+
| COUNT(DISTINCT b) |
+-------------------+
|                10 |
+-------------------+
1 row in set (0.00 sec)w in set (0.00 sec)
---------------------------------------------------------------------------------
C:\dbs>c:\dbs\5.0\bin\mysql -uroot --port=3500 --prompt="mysql 5.0 > "
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.0.97-Win X64 Source distribution

Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql 5.0 > use test
Database changed
mysql 5.0 > CREATE TABLE tmp (
    ->   id int NOT NULL AUTO_INCREMENT,
    ->   a int NOT NULL,
    ->   b int NOT NULL,
    ->   PRIMARY KEY (id),
    ->   UNIQUE KEY ba (b, a)
    -> ) ENGINE=InnoDB;
Query OK, 0 rows affected (0.08 sec)

<CUT>

mysql 5.0 >
mysql 5.0 > -- Returns 5, should return 10. Compare with previous queries.
mysql 5.0 > SELECT SQL_NO_CACHE COUNT(DISTINCT b) FROM tmp WHERE a = 1;
+-------------------+
| COUNT(DISTINCT b) |
+-------------------+
|                10 |
+-------------------+
1 row in set (0.00 sec)
[19 Aug 2013 13:54] Patryk Pomykalski
Looks like a bug in QUICK_GROUP_MIN_MAX_SELECT when is_index_scan is true (can skip half of the rows).

I think this can fix it:

=== modified file 'sql/opt_range.cc'
--- sql/opt_range.cc    2013-04-30 17:16:37 +0000
+++ sql/opt_range.cc    2013-08-19 13:43:51 +0000
@@ -11039,7 +11039,7 @@
       are equality predicates for the key parts after the group, find the
       first sub-group with the extended prefix.
     */
-    if (!have_min && !have_max && key_infix_len > 0)
+    if (!have_min && !have_max && key_infix_len > 0 && !is_index_scan)
       result= file->index_read_map(record, group_prefix,
                                    make_prev_keypart_map(real_key_parts),
                                    HA_READ_KEY_EXACT);
[19 Aug 2013 19:39] Patryk Pomykalski
Myisam works, so maybe the above fix is incorrect.
[2 Sep 2013 3:04] Guangpu Feng
Verified, this bug also affects Percona server 5.5.18.
[17 Sep 2013 6:59] Guangpu Feng
Any progress?
[17 Sep 2013 12:33] Mattias Jonsson
The patch is currently in review.
[14 Nov 2013 13:00] Hartmut Holzgraefe
5.6 (tested with 5.6.14) is even worse, returns 100 on the last query instead of 200:

 SELECT SQL_NO_CACHE COUNT(DISTINCT b) FROM t1 WHERE a = 1;
 COUNT(DISTINCT b)
-200
+100
[19 Nov 2013 17:20] Daniel Price
This bug was addressed in a bug fix for a related bug (Oracle Bug#14621190). The fix is noted in 5.5.35, 5.6.15, and 5.7.3 changelogs. The changelog text for the duplicate bug, as noted in the 5.5.35, 5.6.15, and 5.7.3 changelogs, is:

When "index_read_map" is called for an exact search and fails to return a
record due to non-matching search criteria, the cursor would be positioned
on the next record after the searched key. A subsequent call to
"index_next " would return the next record instead of returning the
previous non-matching row, thereby skipping a record. 

Thank you for the bug report.