MySQL Bugs: #61631: slow query plans for queries with low cardinality columns in a key

Bug #61631	slow query plans for queries with low cardinality columns in a key
Submitted:	24 Jun 2011 14:49	Modified:	3 Jan 2014 8:40
Reporter:	richard prohaska	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S5 (Performance)
Version:	5.5.13, 5.5.22	OS:	Linux (centos 5.5)
Assigned to:		CPU Architecture:	Any
Tags:	innodb query plan

Description:
we found that the query planner uses a full table scan on a covering key when a range scan was observed to be 100 times faster.  the first column of the covering key has only one distinct value in this test case. we need to supply an index hint to get the query planner to do a range scan.

mysql> show create table t;
| t     | CREATE TABLE `t` (
  `a` varchar(255) DEFAULT NULL,
  `b` bigint(20) NOT NULL DEFAULT '0',
  `c` bigint(20) NOT NULL DEFAULT '0',
  `d` bigint(20) DEFAULT NULL,
  `e` char(255) DEFAULT NULL,
  PRIMARY KEY (`b`,`c`),
  KEY `a` (`a`,`b`,`d`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 

we loaded 10M rows into this table using the data generated referenced in the "how to repeat" section.  column a has only one value.  this 

mysql> explain select sql_no_cache count(d) from t where a = 'this is a test' and b between 8000000 and 8100000;
+----+-------------+-------+------+---------------+------+---------+-------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref   | rows   | Extra                    |
+----+-------------+-------+------+---------------+------+---------+-------+--------+--------------------------+
|  1 | SIMPLE      | t     | ref  | PRIMARY,a     | a    | 258     | const | 201438 | Using where; Using index | 
+----+-------------+-------+------+---------------+------+---------+-------+--------+--------------------------+
1 row in set (0.00 sec)

the default query plan takes 3.5 seconds to execute since it scans all fo the rows in the table.   

mysql> select sql_no_cache count(d) from t where a = 'this is a test' and b between 8000000 and 8100000;
+----------+
| count(d) |
+----------+
|   100001 | 
+----------+
1 row in set (3.53 sec)

when we supply the index hint, the query planner uses a range scan, which only touches rows in the range.  for this test case, this is 100x faster.

mysql> explain select sql_no_cache count(d) from t use index(a) where a = 'this is a test' and b between 8000000 and 8100000;
+----+-------------+-------+-------+---------------+------+---------+------+--------+--------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows   | Extra                    |
+----+-------------+-------+-------+---------------+------+---------+------+--------+--------------------------+
|  1 | SIMPLE      | t     | range | a             | a    | 266     | NULL | 201438 | Using where; Using index | 
+----+-------------+-------+-------+---------------+------+---------+------+--------+--------------------------+
1 row in set (0.00 sec)

mysql> select sql_no_cache count(d) from t use index(a) where a = 'this is a test' and b between 8000000 and 8100000;
+----------+
| count(d) |
+----------+
|   100001 | 
+----------+
1 row in set (0.04 sec)

How to repeat:
1. create table t (a varchar(255), b bigint, c bigint, d bigint, e char(255), primary key(b,c), key(a,b,d));

2.  generate the table using this script:
https://s3.amazonaws.com/tokutek-pub/mysql.query.plan.gendata.py

3. load the data into the table using "LOAD DATA INFILE"

4. run the explains from the description above.

May be related to http://bugs.mysql.com/bug.php?id=28554 and other numerous reports showing that optimizer prefers ref. vs range in too many cases...

This is probably a duplicate of bug #8749, a well know problem that we planned to fix years ago. But it is still repeatable with current 5.5.22:

mysql> CREATE TABLE `t` (
    ->   `a` varchar(255) DEFAULT NULL,
    ->   `b` bigint(20) NOT NULL DEFAULT '0',
    ->   `c` bigint(20) NOT NULL DEFAULT '0',
    ->   `d` bigint(20) DEFAULT NULL,
    ->   `e` char(255) DEFAULT NULL,
    ->   PRIMARY KEY (`b`,`c`),
    ->   KEY `a` (`a`,`b`,`d`)
    -> ) ENGINE=InnoDB DEFAULT CHARSET=latin1 ;
Query OK, 0 rows affected (0.18 sec)

mysql> insert into t (a,b) values ('a', 1);
Query OK, 1 row affected (0.05 sec)

mysql> insert into t (a,b) values ('b', 2);
Query OK, 1 row affected (0.00 sec)

mysql> insert into t (a,b) values ('b', 3);
Query OK, 1 row affected (0.00 sec)

mysql> insert into t (a,b) values ('b', 4);
Query OK, 1 row affected (0.00 sec)

mysql> insert into t (a,b) values ('b', 5);
Query OK, 1 row affected (0.04 sec)

mysql> insert into t (a,b) values ('b', 6);
Query OK, 1 row affected (0.00 sec)

mysql> insert into t (a,b) values ('b', 7);
Query OK, 1 row affected (0.00 sec)

mysql> insert into t (a,b) values ('b', 8);
Query OK, 1 row affected (0.00 sec)

mysql> analyze table t;
+--------+---------+----------+----------+
| Table  | Op      | Msg_type | Msg_text |
+--------+---------+----------+----------+
| test.t | analyze | status   | OK       |
+--------+---------+----------+----------+
1 row in set (0.01 sec)

mysql> explain select count(d) from t where a = 'a' and b between 2 and 5;
+----+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref   | rows | Extra                    |
+----+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+
|  1 | SIMPLE      | t     | ref  | PRIMARY,a     | a    | 258     | const |    2 | Using where; Using index |
+----+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+
1 row in set (0.06 sec)

mysql> explain select count(d) from t use index(a) where a = 'a' and b between 2 and 5;
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                    |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
|  1 | SIMPLE      | t     | range | a             | a    | 266     | NULL |    1 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
1 row in set (0.01 sec)

mysql> select version();
+------------+
| version()  |
+------------+
| 5.5.22-log |
+------------+
1 row in set (0.01 sec)

So, I'd like to give development a second chance to make up specific plans and fix this part of cost model.

Fixed in MySQL 5.7.0 by fix for Bug#14095506: DUBIOUS CHOICE OF INDEXES TO MERGE IN INDEX_MERGE_MYISAM.TEST