Bug #31446 Inefficient index use with order by and joins
Submitted: 8 Oct 2007 12:14 Modified: 16 Oct 2008 14:44
Reporter: Pekka Lund Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:5.0.44, 5.0.37, 6.0.2 Alpha OS:Any
Assigned to: CPU Architecture:Any

[8 Oct 2007 12:14] Pekka Lund
Description:
Section 6.2.11. of the documentation (http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html) describes how ORDER BY may use index and avoid temp table creation if the unused parts of the index are constants in the WHERE clause. This optimization doesn't seem to work when there are joins to columns without indexes (or otherwise more complex conditions) involved.

How to repeat:
The following example creates two tables (t1 and t2) both of which have 2000 rows with column a filled with the value 1. Table t1 has also a second column b which is filled with values 1 to 2000 and there is an index covering both columns. Then the following query is executed first without any indexing on t2 and then with an index.

SELECT t1.a FROM t1 JOIN t2 ON t1.a=t2.a WHERE t1.a=1 ORDER BY t1.b LIMIT 1;

This query joins the tables with the constant column a and the value is also specified to be constant 1 in the where clause. The result is ordered by column b and only the first result row is requested.

According to the execution plans the query is performed on both cases by first using the index on table t1 with columns a and b in which case there shouldn't be any need for ordering if the columns are processed in the index order as the first matching row could be returned as it should be ordered also by column b and column a was already specified as constant. For some reason the query does a full ordering through a temp table if the joined column on table t2 is not indexed and this seriously affects the execution time. It works fine and fast without temp tables if there is an index on t2.

The example is rather artificial and the inefficiency probably isn't limited only to those cases where the joined column is not indexed but has something to do with the complexity of the queries. At least I have noticed the same thing with some other similar situations where the performance could be better.

-- Create and populate tables

CREATE TABLE t1 (
 a INT NOT NULL,
 b INT NOT NULL,
 INDEX t1_ab USING BTREE (a, b)
) ENGINE=InnoDB;

CREATE TABLE t2 (
 a INT NOT NULL
) ENGINE=InnoDB;

DELIMITER $$
CREATE PROCEDURE create_tables() BEGIN
 SET @a=1;
 REPEAT
  INSERT INTO t1 VALUES (1,@a);
  INSERT INTO t2 VALUES (1);
  SET @a=@a+1;
 UNTIL @a>2000 END REPEAT; 
END$$
DELIMITER ;
CALL create_tables();

-- Query without index on t2

mysql> EXPLAIN SELECT t1.a FROM t1 JOIN t2 ON t1.a=t2.a WHERE t1.a=1 ORDER BY t1.b LIMIT 1;
+----+-------------+-------+------+---------------+-------+---------+-------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key   | key_len | ref   | rows | Extra                                        |
+----+-------------+-------+------+---------------+-------+---------+-------+------+----------------------------------------------+
|  1 | SIMPLE      | t1    | ref  | t1_ab         | t1_ab | 4       | const |  957 | Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | t2    | ALL  | NULL          | NULL  | NULL    | NULL  | 2259 | Using where                                  |
+----+-------------+-------+------+---------------+-------+---------+-------+------+----------------------------------------------+
2 rows in set (0.00 sec)
mysql> SELECT t1.a FROM t1 JOIN t2 ON t1.a=t2.a WHERE t1.a=1 ORDER BY t1.b LIMIT 1;
+---+
| a |
+---+
| 1 |
+---+
1 row in set (3.73 sec)

-- Create index on t2
mysql> CREATE INDEX t2_a USING BTREE ON t2(a);

-- Query again with index on tmp2

mysql> EXPLAIN SELECT t1.a FROM t1 JOIN t2 ON t1.a=t2.a WHERE t1.a=1 ORDER BY t1.b LIMIT 1;
+----+-------------+-------+------+---------------+-------+---------+-------+------+--------------------------+
| id | select_type | table | type | possible_keys | key   | key_len | ref   | rows | Extra                    |
+----+-------------+-------+------+---------------+-------+---------+-------+------+--------------------------+
|  1 | SIMPLE      | t1    | ref  | t1_ab         | t1_ab | 4       | const |  957 | Using where; Using index |
|  1 | SIMPLE      | t2    | ref  | t2_a          | t2_a  | 4       | const | 1144 | Using index              |
+----+-------------+-------+------+---------------+-------+---------+-------+------+--------------------------+
2 rows in set (0.01 sec)
mysql> SELECT t1.a FROM t1 JOIN t2 ON t1.a=t2.a WHERE t1.a=1 ORDER BY t1.b LIMIT 1;
+---+
| a |
+---+
| 1 |
+---+
1 row in set (0.00 sec)
[28 Oct 2007 13:41] Valeriy Kravchuk
Thank you for a bug report. Verified just as described. At least, manual should have a note that this optimization does not work if JOIN is involved.
[26 Jun 2008 22:39] Sergey Petrunya
> At least, manual should have a note that this optimization does not work if JOIN is involved.

This is not true. The optimization may work if join is involved in certain cases.
[27 Jun 2008 9:16] Sergey Petrunya
The use of filesort is needed in the non-indexed case because of use of join buffering. On recent MySQL 5.1 (from about a year ago or later), use of join buffering is indicated in EXPLAIN: 

mysql> EXPLAIN SELECT t1.a FROM t1 JOIN t2 ON t1.a=t2.a WHERE t1.a=1 ORDER BY t1.b LIMIT 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: ref
possible_keys: t1_ab
          key: t1_ab
      key_len: 4
          ref: const
         rows: 956
        Extra: Using index; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1996
        Extra: Using where; Using join buffer
2 rows in set (0.01 sec)

(an explanation of what join buffering is available here: http://s.petrunia.net/blog/?p=18)

At the moment the optimizer always choses to do join buffering + filesort over doing no join buffering and no filesort (DUMB-CHOICE). It is not possible to affect this choice.

For MySQL 6.1, we have WL#4421 "Add hints on join buffer usage for join queries" which will allow the user to force or disable join buffering.

We consider the [DUMB-CHOICE] to be a feature request bug. The optimizer ought to do a cost-based choice between join buffer + filesort and no join buffer and no filesort. We intend to fix this in MySQL 6.x (x=0,1,...).
[25 Nov 2008 16:28] Omer Barnir
triage: correcting target to 6.0 (checked bug)