Description:
When trying to avoid a filesort, the optimizer seems to ignore the huge number of index entries that must be scanned before reaching the rows the query actually needs.
How to repeat:
I have a very large test case that can be used to reproduce this issue.
CREATE TABLE t1 (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`c1` int(11) DEFAULT NULL,
`c2` decimal(6,1) DEFAULT NULL,
`c3` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `k1` (`c2`),
KEY `k2` (`c1`,`c3`)
) ENGINE=InnoDB AUTO_INCREMENT=2404046480 DEFAULT CHARSET=latin1
> explain SELECT * FROM t1 WHERE c1 in (37924) ORDER BY c2 limit 1\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
type: index
possible_keys: k2
key: k1
key_len: 5
ref: NULL
rows: 793
Extra: Using where
1 row in set (0.00 sec)
After executing the SELECT, SHOW STATUS reports this:
| Handler_read_next | 36115744 |
The EXPLAIN output is quite different if the LIMIT is omitted.
mysql 5.1.23-rc (root) "test"> explain SELECT * FROM t1 WHERE c1 in (37924) ORDER BY c2 \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
type: ref
possible_keys: k2
key: k2
key_len: 5
ref: const
rows: 212280
Extra: Using where; Using filesort
1 row in set (0.00 sec)
FORCE INDEX *with* the LIMIT produces EXPLAIN output quite similar to the above (unsurprisingly), but much better Handler_read_next results:
mysql 5.1.23-rc (root) "test"> explain SELECT * FROM t1 force index (k2) WHERE c1 in (37924) ORDER BY c2 limit 1\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
type: ref
possible_keys: k2
key: k2
key_len: 5
ref: const
rows: 212280
Extra: Using where; Using filesort
1 row in set (0.00 sec)
mysql 5.1.25-rc (root) "test"> SELECT * FROM t1 force index (k2) WHERE c1 in (37924) ORDER BY c2 limit 1;
...
| Handler_read_next | 168778 |
It may be possible to reproduce with a smaller test case, too. It's easy enough to reproduce the optimizer choosing the "wrong" index, but more difficult to get the distribution right to cause the enormous disconnect between "rows" in EXPLAIN output and Handler_read_next after the query is executed. I'm not sure what feature of the distribution in the large test (above) causes that particular behavior.
CREATE TABLE `i1` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`a` tinyint(4) NOT NULL,
`b` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `a` (`a`),
KEY `b` (`b`)
) ENGINE=InnoDB AUTO_INCREMENT=1000001 DEFAULT CHARSET=latin1
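The data I loaded into i1 is roughly as follows (a hypothetical generator, not my exact data; the file name and exact distribution are assumptions). It produces ~1,000,000 rows with a=100 on only about 100 of them, which matches the "rows: 99" estimate below. Clustering the a=100 rows at high `b` values should widen the disconnect between the estimate and Handler_read_next; scattering them, as this script does, keeps it small.

```python
# Hypothetical generator for the i1 test case: ~1M rows, a=100 rare.
import random

random.seed(1)  # reproducible
with open("i1.csv", "w") as f:
    for i in range(1, 1_000_001):
        # Roughly 1 in 10,000 rows gets the rare value a=100;
        # the rest get a uniform value in 1..99.
        a = 100 if random.random() < 0.0001 else random.randint(1, 99)
        # b = i, so b is simply correlated with the insert order.
        f.write(f"{i}\t{a}\t{i}\n")
```

The resulting tab-separated file can then be loaded with LOAD DATA INFILE into i1 (id, a, b).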
explain select * from i1 where a=100 order by b limit 1\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: i1
type: index
possible_keys: a
key: b
key_len: 5
ref: NULL
rows: 99
Extra: Using where
| Handler_read_next | 233 |
Suggested fix:
In this example, the optimizer should account for the roughly 36 million rows (per Handler_read_next) it has to read to carry out the chosen execution plan.
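The gap can be sketched with back-of-envelope arithmetic (this is not the server's actual cost model; the ~36M total row count is inferred from the Handler_read_next value above). Even under an optimistic uniformity assumption the scan looks cheap, but with nothing known about where the matching rows fall in the ORDER BY index, the true cost can approach a full index scan, which is what happened here.

```python
# Rough sketch of why the LIMIT-driven ordered-index-scan estimate
# can be wildly optimistic. Not the server's cost model.
def ordered_scan_estimate(total_rows, matching_rows, limit):
    # Expected index entries read while scanning in ORDER BY order
    # until `limit` matching rows are found, assuming the matches are
    # spread uniformly through the index.
    return min(total_rows, limit * total_rows / matching_rows)

def ordered_scan_worst_case(total_rows):
    # If every matching row sorts to the end of the index, the scan
    # reads the whole index before finding the first match.
    return total_rows

# Round numbers from the report: ~36M rows in t1, 212,280 matching
# c1 = 37924, LIMIT 1.
estimate = ordered_scan_estimate(36_000_000, 212_280, 1)  # ~170 reads
worst = ordered_scan_worst_case(36_000_000)               # 36M reads
print(int(estimate), worst)
```

The ref-on-k2-plus-filesort plan reads a predictable 212,280 rows; the k1 scan is anywhere between ~170 and 36 million, and the optimizer currently bets on the cheap end.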