Bug #61631 slow query plans for queries with low cardinality columns in a key
Submitted: 24 Jun 2011 14:49 Modified: 3 Jan 2014 8:40
Reporter: richard prohaska Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:5.5.13, 5.5.22 OS:Linux (centos 5.5)
Assigned to: CPU Architecture:Any
Tags: innodb query plan
Triage: Needs Triage: D3 (Medium)

[24 Jun 2011 14:49] richard prohaska
Description:
we found that the query planner uses a full table scan on a covering key when a range scan was observed to be 100 times faster.  the first column of the covering key has only one distinct value in this test case. we need to supply an index hint to get the query planner to do a range scan.

mysql> show create table t;
| t     | CREATE TABLE `t` (
  `a` varchar(255) DEFAULT NULL,
  `b` bigint(20) NOT NULL DEFAULT '0',
  `c` bigint(20) NOT NULL DEFAULT '0',
  `d` bigint(20) DEFAULT NULL,
  `e` char(255) DEFAULT NULL,
  PRIMARY KEY (`b`,`c`),
  KEY `a` (`a`,`b`,`d`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 

we loaded 10M rows into this table using the data generated referenced in the "how to repeat" section.  column a has only one value.  this 

mysql> explain select sql_no_cache count(d) from t where a = 'this is a test' and b between 8000000 and 8100000;
+----+-------------+-------+------+---------------+------+---------+-------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref   | rows   | Extra                    |
+----+-------------+-------+------+---------------+------+---------+-------+--------+--------------------------+
|  1 | SIMPLE      | t     | ref  | PRIMARY,a     | a    | 258     | const | 201438 | Using where; Using index | 
+----+-------------+-------+------+---------------+------+---------+-------+--------+--------------------------+
1 row in set (0.00 sec)

the default query plan takes 3.5 seconds to execute since it scans all fo the rows in the table.   

mysql> select sql_no_cache count(d) from t where a = 'this is a test' and b between 8000000 and 8100000;
+----------+
| count(d) |
+----------+
|   100001 | 
+----------+
1 row in set (3.53 sec)

when we supply the index hint, the query planner uses a range scan, which only touches rows in the range.  for this test case, this is 100x faster.

mysql> explain select sql_no_cache count(d) from t use index(a) where a = 'this is a test' and b between 8000000 and 8100000;
+----+-------------+-------+-------+---------------+------+---------+------+--------+--------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows   | Extra                    |
+----+-------------+-------+-------+---------------+------+---------+------+--------+--------------------------+
|  1 | SIMPLE      | t     | range | a             | a    | 266     | NULL | 201438 | Using where; Using index | 
+----+-------------+-------+-------+---------------+------+---------+------+--------+--------------------------+
1 row in set (0.00 sec)

mysql> select sql_no_cache count(d) from t use index(a) where a = 'this is a test' and b between 8000000 and 8100000;
+----------+
| count(d) |
+----------+
|   100001 | 
+----------+
1 row in set (0.04 sec)

How to repeat:
1. create table t (a varchar(255), b bigint, c bigint, d bigint, e char(255), primary key(b,c), key(a,b,d));

2.  generate the table using this script:
https://s3.amazonaws.com/tokutek-pub/mysql.query.plan.gendata.py

3. load the data into the table using "LOAD DATA INFILE"

4. run the explains from the description above.
[24 Jun 2011 15:03] Valeriy Kravchuk
May be related to http://bugs.mysql.com/bug.php?id=28554 and other numerous reports showing that optimizer prefers ref. vs range in too many cases...
[25 Mar 2012 15:59] Valeriy Kravchuk
This is probably a duplicate of bug #8749, a well know problem that we planned to fix years ago. But it is still repeatable with current 5.5.22:

mysql> CREATE TABLE `t` (
    ->   `a` varchar(255) DEFAULT NULL,
    ->   `b` bigint(20) NOT NULL DEFAULT '0',
    ->   `c` bigint(20) NOT NULL DEFAULT '0',
    ->   `d` bigint(20) DEFAULT NULL,
    ->   `e` char(255) DEFAULT NULL,
    ->   PRIMARY KEY (`b`,`c`),
    ->   KEY `a` (`a`,`b`,`d`)
    -> ) ENGINE=InnoDB DEFAULT CHARSET=latin1 ;
Query OK, 0 rows affected (0.18 sec)

mysql> insert into t (a,b) values ('a', 1);
Query OK, 1 row affected (0.05 sec)

mysql> insert into t (a,b) values ('b', 2);
Query OK, 1 row affected (0.00 sec)

mysql> insert into t (a,b) values ('b', 3);
Query OK, 1 row affected (0.00 sec)

mysql> insert into t (a,b) values ('b', 4);
Query OK, 1 row affected (0.00 sec)

mysql> insert into t (a,b) values ('b', 5);
Query OK, 1 row affected (0.04 sec)

mysql> insert into t (a,b) values ('b', 6);
Query OK, 1 row affected (0.00 sec)

mysql> insert into t (a,b) values ('b', 7);
Query OK, 1 row affected (0.00 sec)

mysql> insert into t (a,b) values ('b', 8);
Query OK, 1 row affected (0.00 sec)

mysql> analyze table t;
+--------+---------+----------+----------+
| Table  | Op      | Msg_type | Msg_text |
+--------+---------+----------+----------+
| test.t | analyze | status   | OK       |
+--------+---------+----------+----------+
1 row in set (0.01 sec)

mysql> explain select count(d) from t where a = 'a' and b between 2 and 5;
+----+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref   | rows | Extra                    |
+----+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+
|  1 | SIMPLE      | t     | ref  | PRIMARY,a     | a    | 258     | const |    2 | Using where; Using index |
+----+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+
1 row in set (0.06 sec)

mysql> explain select count(d) from t use index(a) where a = 'a' and b between 2 and 5;
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                    |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
|  1 | SIMPLE      | t     | range | a             | a    | 266     | NULL |    1 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
1 row in set (0.01 sec)

mysql> select version();
+------------+
| version()  |
+------------+
| 5.5.22-log |
+------------+
1 row in set (0.01 sec)

So, I'd like to give development a second chance to make up specific plans and fix this part of cost model.
[3 Jan 2014 8:40] Øystein Grøvlen
Fixed in MySQL 5.7.0 by fix for Bug#14095506: DUBIOUS CHOICE OF INDEXES TO MERGE IN INDEX_MERGE_MYISAM.TEST