Description:
When trying to avoid a filesort, the optimizer seems to ignore the huge number of index entries that must be scanned before reaching the rows the query actually needs.
How to repeat:
I have a very large test case that can be used to reproduce this issue.
CREATE TABLE t1 (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`c1` int(11) DEFAULT NULL,
`c2` decimal(6,1) DEFAULT NULL,
`c3` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `k1` (`c2`),
KEY `k2` (`c1`,`c3`)
) ENGINE=InnoDB AUTO_INCREMENT=2404046480 DEFAULT CHARSET=latin1
> explain SELECT * FROM t1 WHERE c1 in (37924) ORDER BY c2 limit 1\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
type: index
possible_keys: k2
key: k1
key_len: 5
ref: NULL
rows: 793
Extra: Using where
1 row in set (0.00 sec)
After executing the SELECT, SHOW STATUS reports this:
| Handler_read_next | 36115744 |
The EXPLAIN output is quite different if the LIMIT is omitted.
mysql 5.1.23-rc (root) "test"> explain SELECT * FROM t1 WHERE c1 in (37924) ORDER BY c2 \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
type: ref
possible_keys: k2
key: k2
key_len: 5
ref: const
rows: 212280
Extra: Using where; Using filesort
1 row in set (0.00 sec)
FORCE INDEX *with* the LIMIT produces EXPLAIN output quite similar to the above (unsurprisingly), but much better Handler_read_next results:
mysql 5.1.23-rc (root) "test"> explain SELECT * FROM t1 force index (k2) WHERE c1 in (37924) ORDER BY c2 limit 1\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
type: ref
possible_keys: k2
key: k2
key_len: 5
ref: const
rows: 212280
Extra: Using where; Using filesort
1 row in set (0.00 sec)
mysql 5.1.25-rc (root) "test"> SELECT * FROM t1 force index (k2) WHERE c1 in (37924) ORDER BY c2 limit 1;
...
| Handler_read_next | 168778 |
It may be possible to reproduce with a smaller test case, too. It's easy enough to reproduce the optimizer choosing the "wrong" index, but more difficult to get the distribution right to cause the enormous disconnect between "rows" in EXPLAIN output and Handler_read_next after the query is executed. I'm not sure what feature of the distribution in the large test (above) causes that particular behavior.
CREATE TABLE `i1` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`a` tinyint(4) NOT NULL,
`b` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `a` (`a`),
KEY `b` (`b`)
) ENGINE=InnoDB AUTO_INCREMENT=1000001 DEFAULT CHARSET=latin1
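The data I loaded into i1 is roughly as follows (a hypothetical generator, not my exact data; the file name and exact distribution are assumptions). It produces ~1,000,000 rows with a=100 on only about 100 of them, which matches the "rows: 99" estimate below. Clustering the a=100 rows at high `b` values should widen the disconnect between the estimate and Handler_read_next; scattering them, as this script does, keeps it small.

```python
# Hypothetical generator for the i1 test case: ~1M rows, a=100 rare.
import random

random.seed(1)  # reproducible
with open("i1.csv", "w") as f:
    for i in range(1, 1_000_001):
        # Roughly 1 in 10,000 rows gets the rare value a=100;
        # the rest get a uniform value in 1..99.
        a = 100 if random.random() < 0.0001 else random.randint(1, 99)
        # b = i, so b is simply correlated with the insert order.
        f.write(f"{i}\t{a}\t{i}\n")
```

The resulting tab-separated file can then be loaded with LOAD DATA INFILE into i1 (id, a, b).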
explain select * from i1 where a=100 order by b limit 1\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: i1
type: index
possible_keys: a
key: b
key_len: 5
ref: NULL
rows: 99
Extra: Using where
| Handler_read_next | 233 |
Suggested fix:
In this example, the optimizer should account for the roughly 36 million rows (per Handler_read_next) it has to read to carry out the chosen execution plan.
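The gap can be sketched with back-of-envelope arithmetic (this is not the server's actual cost model; the ~36M total row count is inferred from the Handler_read_next value above). Even under an optimistic uniformity assumption the scan looks cheap, but with nothing known about where the matching rows fall in the ORDER BY index, the true cost can approach a full index scan, which is what happened here.

```python
# Rough sketch of why the LIMIT-driven ordered-index-scan estimate
# can be wildly optimistic. Not the server's cost model.
def ordered_scan_estimate(total_rows, matching_rows, limit):
    # Expected index entries read while scanning in ORDER BY order
    # until `limit` matching rows are found, assuming the matches are
    # spread uniformly through the index.
    return min(total_rows, limit * total_rows / matching_rows)

def ordered_scan_worst_case(total_rows):
    # If every matching row sorts to the end of the index, the scan
    # reads the whole index before finding the first match.
    return total_rows

# Round numbers from the report: ~36M rows in t1, 212,280 matching
# c1 = 37924, LIMIT 1.
estimate = ordered_scan_estimate(36_000_000, 212_280, 1)  # ~170 reads
worst = ordered_scan_worst_case(36_000_000)               # 36M reads
print(int(estimate), worst)
```

The ref-on-k2-plus-filesort plan reads a predictable 212,280 rows; the k1 scan is anywhere between ~170 and 36 million, and the optimizer currently bets on the cheap end.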