Bug #4114 order by - limit optimization is broken for "range" access type.
Submitted: 12 Jun 2004 2:27 Modified: 24 Aug 2004 6:49
Reporter: Peter Zaitsev (Basic Quality Contributor) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server Severity:S2 (Serious)
Version:4.1.2 OS:Any (all)
Assigned to: Sergey Petrunya CPU Architecture:Any

[12 Jun 2004 2:27] Peter Zaitsev
Description:
Hi,

This issue is smaller test case for issue reported by the customer, which results in serous performance issues for him.

Issue also exists in MySQL 4.0 

It is actually two bugs in once but "in" replacement with "=" for single value lists is already reported in other bug.

mysql> explain select * from s1,s2 where s2.n=7 order by s1.i limit 5;
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref   | rows | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------------+
|  1 | SIMPLE      | s1    | index | NULL          | PRIMARY |       4 | NULL  |   85 |             |
|  1 | SIMPLE      | s2    | ref   | n             | n       |       5 | const |    1 | Using where |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------------+
2 rows in set (0.00 sec)

mysql> explain select * from s1,s2 where s2.n in (7) order by s1.i limit 5;
+----+-------------+-------+-------+---------------+------+---------+------+------+----------------------------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                                        |
+----+-------------+-------+-------+---------------+------+---------+------+------+----------------------------------------------+
|  1 | SIMPLE      | s2    | range | n             | n    |       5 | NULL |    1 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | s1    | ALL   | NULL          | NULL |    NULL | NULL |   85 |                                              |
+----+-------------+-------+-------+---------------+------+---------+------+------+----------------------------------------------+
2 rows in set (0.00 sec)

As you see  MySQL fails to use order-by optimization for second case while there is no reason why same approach can't be used in this case.

You can't get to it even with STRAIGHT_JOIN:

mysql> explain select STRAIGHT_JOIN  * from s1,s2 where s2.n in (7) order by s1.i limit 5;
+----+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                           |
+----+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------+
|  1 | SIMPLE      | s1    | ALL   | NULL          | NULL |    NULL | NULL |   85 | Using temporary; Using filesort |
|  1 | SIMPLE      | s2    | range | n             | n    |       5 | NULL |    1 | Using where                     |
+----+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------+
2 rows in set (0.00 sec)

How to repeat:

CREATE TABLE s1 (
  i int(11) NOT NULL default '0',
  s int(11) NOT NULL default '0',
  PRIMARY KEY  (i),
  KEY s (s)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

--
-- Dumping data for table `s1`
--

INSERT INTO s1 VALUES (293,293),(646,646),(350,350),(812,812),(10,10),(616,616),(47,47),(389,389),(802,802),(842,842),(807,807),(507,507),(116,116),(57,57),(938,938),(519,519),(782,782),(352,352),(415,415),(19,19),(848,848),(183,183),(371,371),(309,309),(429,429),(218,218),(562,562),(184,184),(233,233),(611,611),(358,358),(956,956),(707,707),(669,669),(221,221),(100,100),(835,835),(875,875),(873,873),(737,737),(68,68),(129,129),(442,442),(822,822),(784,784),(453,453),(913,913),(205,205),(286,286),(816,816),(220,220),(654,654),(609,609),(84,84),(594,594),(718,718),(809,809),(889,889),(879,879),(115,115),(941,941),(360,360),(978,978),(629,629),(176,176),(993,993),(436,436),(204,204),(709,709),(934,934),(545,545),(921,921),(969,969),(83,83),(18,18),(50,50),(193,193),(817,817),(504,504),(69,69),(834,834),(964,964),(319,319),(701,701),(550,550);

--
-- Table structure for table `s2`
--

CREATE TABLE s2 (
  i int(11) NOT NULL default '0',
  n int(11) default NULL,
  PRIMARY KEY  (i),
  KEY i (i,n),
  KEY n (n)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

--
-- Dumping data for table `s2`
--

INSERT INTO s2 VALUES (7,7),(89,89),(104,104),(114,114),(123,123),(164,164),(183,183),(213,213),(222,222),(252,252),(255,255),(298,298),(308,308),(312,312),(393,393),(402,402),(528,528),(553,553),(556,556),(557,557),(572,572),(627,627),(629,629),(650,650),(661,661),(668,668),(696,696),(727,727),(777,777),(827,827),(858,858),(859,859),(869,869),(906,906),(950,950),(995,995),(998,998);

explain select * from s1,s2 where s2.n in (7) order by s1.i limit 5;

Other query to test:

explain select * from s1,s2 where s2.n in (s1.s) order by s1.i limit 5
[21 Aug 2004 9:35] Sergey Petrunya
With today's 4.1 pull, both
explain select * from s1,s2 where s2.n =7 order by s1.i limit 5;
explain select * from s1,s2 where s2.n in (7) order by s1.i limit 5;
produce the same output:
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref   | rows | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------------+
|  1 | SIMPLE      | s1    | index | NULL          | PRIMARY |       4 | NULL  |   85 |             |
|  1 | SIMPLE      | s2    | ref   | n             | n       |       5 | const |    1 | Using where |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------------+

The query
explain select * from s1,s2 where s2.n in (s1.s) order by s1.i limit 5
produces this:
+----+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                           |
+----+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------+
|  1 | SIMPLE      | s1    | ALL   | NULL          | NULL |    NULL | NULL |   85 | Using temporary; Using filesort |
|  1 | SIMPLE      | s2    | index | NULL          | i    |       9 | NULL |   37 | Using where; Using index        |
+----+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------+
How important is handling for the last query? Should I investigate further?
[24 Aug 2004 4:27] Peter Zaitsev
Original bug was fixed which is great, however it looks like it was fixed only partially:

Single value IN seems to be transformed to "=" for constants however it is not transformed 
if there is single value in the list, which is not constant however, while it probably should:

mysql> explain select count(*) from s1,s2 where s2.n in (s1.s);
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                    |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
|  1 | SIMPLE      | s1    | index | NULL          | s    |       4 | NULL |   85 | Using index              |
|  1 | SIMPLE      | s2    | index | NULL          | n    |       5 | NULL |   37 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
2 rows in set (0.00 sec)

mysql> explain select count(*) from s1,s2 where s2.n = s1.s;
+----+-------------+-------+-------+---------------+------+---------+-----------+------+-------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref       | rows | Extra       |
+----+-------------+-------+-------+---------------+------+---------+-----------+------+-------------+
|  1 | SIMPLE      | s2    | index | n             | n    |       5 | NULL      |   37 | Using index |
|  1 | SIMPLE      | s1    | ref   | s             | s    |       4 | test.s2.n |   10 | Using index |
+----+-------------+-------+-------+---------------+------+---------+-----------+------+-------------+
2 rows in set (0.00 sec)

There is even bigger problem here which I would call "dynamic ranges":

mysql> explain select * from s1,s2 where s2.n in (s1.s,s1.s+1);
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                    |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
|  1 | SIMPLE      | s1    | ALL   | NULL          | NULL |    NULL | NULL |   85 |                          |
|  1 | SIMPLE      | s2    | index | NULL          | i    |       9 | NULL |   37 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
2 rows in set (0.00 sec)

As you see index scan is used for accessing second table each time as MySQL for some reason can't use dynamic range matching for each iteration.   This is serious problem
for some complex queries.

It is up to you guys to consider this a bug or just defficiency  but in any case I'd like to see 
this queries in broken queries list and scheduled to be fixed sometime according to your priorities.
[24 Aug 2004 6:49] Igor Babaev
We do not transform <expr> IN (<value>) to expr=<value> currently in 4.1.
And we are not going to add this transformation there. It will appear in 5.0.x