MySQL Bugs: #68919: Performance regression when DS-MRR is used for query with small limit

Bug #68919	Performance regression when DS-MRR is used for query with small limit
Submitted:	10 Apr 2013 14:25	Modified:	10 Apr 2013 14:27
Reporter:	Olav Sandstå	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S5 (Performance)
Version:	5.6.10	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
This issue was originally reported in MySQL Forums (see
http://forums.mysql.com/read.php?24,581694,582599).

For queries that has a LIMIT clause there can be performance
regressions when upgrading from 5.5 to 5.6.10 when DS-MRR is used for
the range scan.

The following query (which is a simplified version of the query
reported in the forum) (for table definitions, see the how to repeat
section):

   SELECT * FROM t1 WHERE i1>=42 AND i2<=42 LIMIT 1;

takes milliseconds in 5.5 while it can take seconds in 5.6
when the table is sufficiently large (10 million records in this
case).

The table has an index on (i1,i2) that can be used for doing a range
scan that will be able to complete after having found the first
matching record (due to the LIMIT 1). In the case of 5.5 this
completes quickly. In 5.6 the same range scan will be executed using
DS-MRR, see "Using MRR" in the explain output:

explain SELECT *  FROM t1  WHERE i1>=42 AND i2<=42 LIMIT 1;
+----+-------------+-------+-------+---------------+------+---------+------+---------+----------------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows    | Extra                            |
+----+-------------+-------+-------+---------------+------+---------+------+---------+----------------------------------+
|  1 | SIMPLE      | t1    | range | idx           | idx  | 4       | NULL | 9999958 | Using index condition; Using MRR |
+----+-------------+-------+-------+---------------+------+---------+------+---------+----------------------------------+

The reason for this query to now take much longer time is that the
DS-MRR implementation will first try to fill the DS-MRR sort buffer
with row-ids for matching index entries. Since this table only has
one matching record it will read the rest of the index before doing
the sorting of the row-ids. Only then the first qualifying record will
be returned.

How to repeat:
MTR test case for reproducing the performance regression:

CREATE TABLE t0 (
  i1 INTEGER NOT NULL
) ENGINE=MyISAM;

INSERT INTO t0 VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

CREATE TABLE t1 (
  pk INTEGER NOT NULL,
  i1 INTEGER NOT NULL,
  i2 INTEGER NOT NULL,
  i3 INTEGER NOT NULL,
  PRIMARY KEY (pk),
  KEY idx (i1,i2)
) ENGINE=MyISAM;

INSERT INTO t1
SELECT a.i1 + 10*b.i1 + 100*c.i1 + 1000*d.i1 + 10000*e.i1 + 100000*f.i1 + 1000000*g.i1,
       a.i1 + 10*b.i1 + 100*c.i1 + 1000*d.i1 + 10000*e.i1 + 100000*f.i1 + 1000000*g.i1,
       a.i1 + 10*b.i1 + 100*c.i1 + 1000*d.i1 + 10000*e.i1 + 100000*f.i1 + 1000000*g.i1,
       a.i1 + 10*b.i1 + 100*c.i1 + 1000*d.i1 + 10000*e.i1 + 100000*f.i1 + 1000000*g.i1
FROM t0 AS a, t0 AS b, t0 AS c, t0 AS d, t0 AS e, t0 AS f, t0 AS g;

let query1=
SELECT * 
FROM t1 
WHERE i1>=42 AND i2<=42 LIMIT 1;

eval EXPLAIN $query1;
eval $query1;

DROP TABLE t0,t1;

Workaround for for getting back to the 5.5 behavior and performance is to disable use of DS-MRR. This can be done by setting:

  optimizer_switch="mrr=off";

The reason DS-MRR is used for this query is:

1. When the optimizer evaluates alternative access methods and make
   cost estimate for each, the "LIMIT 1" is not taken into
   account. One of the access methods that is considered is to do
   range access on the seonday index. A range scan can either be done
   as a "standard range scan" (using the default MRR implementation)
   or as a "DS-MRR scan". In this case the MRR cost model is given an
   estimate of almost 10 million records to be read. Based on this,
   the MRR cost model chooses to use DS-MRR for this range access.

2. After the join optimizer has been run, the optimizer consideres the
   "LIMIT 1" clause to see if it can use a particular index that
   likely will require reading only a few records to satisfy the LIMIT
   1. And in this case it sees that it already has selected a good
   index for this. At fails to notice that the selected range access
   method for this indes is DS-MRR which is not a good access method
   for satisfying the "LIMIT 1" (since DS-MRR will try to read a lot
   of index entries into its sort buffer before returning the first
   result record).

Two alternative ideas for how to fix this:

a) Take the limit clause into account when estimating the number of
records to be read. Use this as input to the MRR cost model.

b) When reconsidering which access method to be used for "LIMIT 1",
disable use of DS-MRR for the range scan.