MySQL Bugs: #4100: Optimizer defficiency sorting on fields from different tables

Bug #4100	Optimizer defficiency sorting on fields from different tables
Submitted:	11 Jun 2004 0:27	Modified:	24 Aug 2004 23:57
Reporter:	Peter Zaitsev (Basic Quality Contributor)	Email Updates:
Status:	Not a Bug	Impact on me:	None
Category:	MySQL Server	Severity:	S3 (Non-critical)
Version:	4.0.20	OS:	Any (all)
Assigned to:	Igor Babaev	CPU Architecture:	Any

Description:
This problem exists in MySQL 4.1 as well.

I'm reporting it as a bug on customer request. I do not really hope for a quick fix but it would be good if it is fixed at some point.

The problem is:

MySQL can't use index for Order BY if we have fields from different columns in
sort, even in case it is possible:

mysql> explain select  * from s1,s2 where s1.i=s2.i order by s1.i,s2.n limit 5;
+----+-------------+-------+--------+---------------+---------+---------+-----------+------+----------------------------------------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref       | rows | Extra                                        |
+----+-------------+-------+--------+---------------+---------+---------+-----------+------+----------------------------------------------+
|  1 | SIMPLE      | s2    | index  | PRIMARY,i     | i       |       9 | NULL      |   37 | Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | s1    | eq_ref | PRIMARY       | PRIMARY |       4 | test.s2.i |    1 |                                              |
+----+-------------+-------+--------+---------------+---------+---------+-----------+------+----------------------------------------------+
2 rows in set (0.01 sec)

What I would like MySQL to do in this case is to start scanning table s1 by key 
"i", which will bring values in this key order.  Joining the table s2 MySQL should retrive rows using key (i,n) which will come in the sorted order.

Note this particular case is even more trivial. We have clause s1.i=s2.i,
so "order by s1.i,s2.n" is the same as "order by s2.i,s2.n" which can be done 
using index as you can see below:

mysql> explain select  * from s1,s2 force index(i) where s1.i=s2.i order by s2.i,s2.n limit 5;
+----+-------------+-------+--------+---------------+---------+---------+-----------+------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref       | rows | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+-----------+------+-------------+
|  1 | SIMPLE      | s2    | index  | i             | i       |       9 | NULL      |   37 | Using index |
|  1 | SIMPLE      | s1    | eq_ref | PRIMARY       | PRIMARY |       4 | test.s2.i |    1 |             |
+----+-------------+-------+--------+---------------+---------+---------+-----------+------+-------------+
2 rows in set (0.00 sec)

So MySQL for some reason does not do this simple transformation.

It will not work however for other query, which on other hand can be optimized
as described above:
mysql> explain select  * from s1,s2 where s1.i=s2.i order by s1.s,s2.n;
+----+-------------+-------+--------+---------------+---------+---------+-----------+------+----------------------------------------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref       | rows | Extra                                        |
+----+-------------+-------+--------+---------------+---------+---------+-----------+------+----------------------------------------------+
|  1 | SIMPLE      | s2    | index  | PRIMARY,i     | i       |       9 | NULL      |   37 | Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | s1    | eq_ref | PRIMARY       | PRIMARY |       4 | test.s2.i |    1 |                                              |
+----+-------------+-------+--------+---------------+---------+---------+-----------+------+----------------------------------------------+

Why are these queries important ? 

These are very helpful in search applications where you want to see only few first rows in most cases, while you might want to order results on combinations 
of columns from different tables. 

How to repeat:
CREATE TABLE s1 (
   i int(11) NOT NULL default '0',
   s int(11) NOT NULL default '0',
   PRIMARY KEY  (i),
   KEY s (s)
) TYPE=MyISAM;

--
-- Dumping data for table `s1`
--

INSERT INTO s1 VALUES 
(293,293),(646,646),(350,350),(812,812),(10,10),(616,616),(47,47),(389,389),(802,802),(842,842),(807,807),(507,507),(116,116),(57,57),(938,938),(519,519),(782,782),(352,352),(415,415),(19,19),(848,848),(183,183),(371,371),(309,309),(429,429),(218,218),(562,562),(184,184),(233,233),(611,611),(358,358),(956,956),(707,707),(669,669),(221,221),(100,100),(835,835),(875,875),(873,873),(737,737),(68,68),(129,129),(442,442),(822,822),(784,784),(453,453),(913,913),(205,205),(286,286),(816,816),(220,220),(654,654),(609,609),(84,84),(594,594),(718,718),(809,809),(889,889),(879,879),(115,115),(941,941),(360,360),(978,978),(629,629),(176,176),(993,993),(436,436),(204,204),(709,709),(934,934),(545,545),(921,921),(969,969),(83,83),(18,18),(50,50),(193,193),(817,817),(504,504),(69,69),(834,834),(964,964),(319,319),(701,701),(550,550);

--
-- Table structure for table `s2`
--

CREATE TABLE s2 (
   i int(11) NOT NULL default '0',
   n int(11) default NULL,
   PRIMARY KEY  (i),
   KEY i (i,n)
) TYPE=MyISAM;

--
-- Dumping data for table `s2`
--

INSERT INTO s2 VALUES 
(7,7),(89,89),(104,104),(114,114),(123,123),(164,164),(183,183),(213,213),(222,222),(252,252),(255,255),(298,298),(308,308),(312,312),(393,393),(402,402),(528,528),(553,553),(556,556),(557,557),(572,572),(627,627),(629,629),(650,650),(661,661),(668,668),(696,696),(727,727),(777,777),(827,827),(858,858),(859,859),(869,869),(906,906),(950,950),(995,995),(998,998);

This is definitely a feature request. We are not going to add it to 4.1.
The feature will appear 5.0.x. It won't be there yet until quality propagation
patch is applied.