Bug #50212 MySQL doesn't use Primary Index for ORDER BY
Submitted: 10 Jan 2010 14:07 Modified: 28 Feb 2011 8:36
Reporter: Shlomo Priymak Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:5.1.37, 5.1.42, 5.1.43-bzr OS:Any
Assigned to: Tor Didriksen CPU Architecture:Any
Tags: regression
Triage: Triaged: D3 (Medium) / R5 (Severe) / E4 (High)

[10 Jan 2010 14:07] Shlomo Priymak
Description:
In the following query, a primary key which can be used for sort is ignored. 
For a large table, this incurs in a serious performance hit.

Query:

CREATE TABLE t1 (pk INT PRIMARY KEY, val INT);
CREATE TABLE t2 (pk INT PRIMARY KEY, fk INT, val INT, KEY ix_fk (fk));

SELECT
  t1.pk,
  t1.val,
  t2.val
FROM t1
	JOIN t2
		ON t1.pk = t2.fk
ORDER BY t1.pk;

When the ORDER BY is omitted, the query does not use a filesort.
Since the database is scanning the PRIMARY key in t1, and probing the t2 table for data during the scan, no file sorting is needed.

How to repeat:

CREATE TABLE t1 (pk INT PRIMARY KEY, val INT);
CREATE TABLE t2 (pk INT PRIMARY KEY, fk INT, val INT, KEY ix_fk (fk));

EXPLAIN 
SELECT
  t1.pk,
  t1.val,
  t2.val
FROM t1
	JOIN t2
		ON t1.pk = t2.fk
ORDER BY t1.pk;

    id  select_type  table   type    possible_keys  key     key_len  ref               rows  Extra         
------  -----------  ------  ------  -------------  ------  -------  --------------  ------  --------------
     1  SIMPLE       t1      ALL     PRIMARY        (NULL)  (NULL)   (NULL)               1  Using filesort
     1  SIMPLE       t2      ref     ix_fk          ix_fk   5        products.t1.pk       1  Using where   

Suggested fix:
Use the clustered index on the primary key.
[10 Jan 2010 14:17] Valeriy Kravchuk
Depending on data in the tables one can get a totally different plan:

77-52-1-11:5.1 openxs$ bin/mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 6
Server version: 5.1.43-debug Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> CREATE TABLE t1 (pk INT PRIMARY KEY, val INT);
Query OK, 0 rows affected (0.41 sec)

mysql> CREATE TABLE t2 (pk INT PRIMARY KEY, fk INT, val INT, KEY ix_fk (fk));
Query OK, 0 rows affected (0.18 sec)

mysql> insert into t1 values (1,1), (2,2), (3,3);
Query OK, 3 rows affected (0.01 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> insert into t2 values (1,1,1), (2,1,2), (3,2,3);
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> EXPLAIN SELECT t1.pk, t1.val, t2.val FROM t1 JOIN t2 ON t1.pk = t2.fk ORDER BY t1.pk\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
         type: ALL
possible_keys: ix_fk
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 3
        Extra: Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: test.t2.fk
         rows: 1
        Extra: 
2 rows in set (0.00 sec)

So, would you, please, be so kind to send complete test case, with all the data in the table to demonstrate the problem?
[10 Jan 2010 14:26] Shlomo Priymak
This happened for the exact case I attached, when the tables are empty.

It's easily reproducible when adding more rows to table t2 than t1:

CREATE TABLE t1 (pk INT PRIMARY KEY, val INT);
CREATE TABLE t2 (pk INT PRIMARY KEY, fk INT, val INT, KEY ix_fk (fk));

INSERT INTO t1 VALUES (1,0),(2,0);
INSERT INTO t2 VALUES (1,1,0),(2,1,0), (3,1,0), (4,2,0), (5,2,0), (6,2,0);

EXPLAIN 
SELECT
 t1.pk,
 t1.val,
 t2.val
FROM t1
       JOIN t2
               ON t1.pk = t2.fk
ORDER BY t1.pk;

    id  select_type  table   type    possible_keys  key     key_len  ref               rows  Extra         
------  -----------  ------  ------  -------------  ------  -------  --------------  ------  --------------
     1  SIMPLE       t1      ALL     PRIMARY        (NULL)  (NULL)   (NULL)               2  Using filesort
     1  SIMPLE       t2      ref     ix_fk          ix_fk   5        test.t1.pk       1  Using where
[10 Jan 2010 14:54] Valeriy Kravchuk
That's because your tables are InnoDB ones. For MyISAM I've got:

mysql> delete from t1;
Query OK, 3 rows affected (0.00 sec)

mysql> delete from t2;
Query OK, 3 rows affected (0.00 sec)

mysql> EXPLAIN SELECT t1.pk, t1.val, t2.val FROM t1 JOIN t2 ON t1.pk = t2.fk ORDER BY t1.pk\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: NULL
         type: NULL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: Impossible WHERE noticed after reading const tables
1 row in set (0.00 sec)

For InnoDB ones I get the same results as you:

mysql> alter table t1 engine=InnoDB;
Query OK, 0 rows affected (0.19 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> alter table t2 engine=InnoDB;
Query OK, 0 rows affected (0.07 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> EXPLAIN SELECT t1.pk, t1.val, t2.val FROM t1 JOIN t2 ON t1.pk = t2.fk ORDER BY t1.pk\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1
        Extra: Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
         type: ref
possible_keys: ix_fk
          key: ix_fk
      key_len: 5
          ref: test.t1.pk
         rows: 1
        Extra: Using where
2 rows in set (0.02 sec)

but we can not claim this is a bug, as for empty tables execution plan does NOT really matter. ALL will read zero rows, and there will be nothing to filesort. InnoDB just provides statistics that lead to this plan, somehow.

To prove that there is a bug in optimizer we have to create a set of data where the plan chosen by the optimizer leads to notably slower execution that some other plan you can force with FORCE INDEX and/or STRAIGHT_JOIN "hints".
[10 Jan 2010 15:02] Shlomo Priymak
I indeed failed to mention those are InnoDB tables.
This is a very simplified case so that you could reproduce the behavior - in reality we are speaking of two tables, each several GB in size.
I've added the minimal amount of data (2 rows in t1 and 6 rows in t2) which reproduces the case.

It's easy to see that because the query is using a filesort, the entire output needs to be sorted (very meaningful on a 10GB dataset), which can be avoided altogether.
[10 Jan 2010 16:12] Valeriy Kravchuk
Indeed, looks like we have a bug here:

77-52-1-11:5.1 openxs$ bin/mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 6
Server version: 5.1.43-debug Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> CREATE TABLE t1 (pk INT PRIMARY KEY, val INT);
Query OK, 0 rows affected (0.41 sec)

mysql> CREATE TABLE t2 (pk INT PRIMARY KEY, fk INT, val INT, KEY ix_fk (fk));
Query OK, 0 rows affected (0.18 sec)

mysql> alter table t1 engine=InnoDB;
Query OK, 0 rows affected (0.19 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> alter table t2 engine=InnoDB;
Query OK, 0 rows affected (0.07 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> insert into t1 values (1,1), (2,2);
Query OK, 2 rows affected (0.02 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> insert into t2 values (1,1,1), (2,1,2), (3,2,3), (4, NULL, 4), (5,1,5), (6,1,6);
Query OK, 6 rows affected (0.00 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql> analyze table t1;
+---------+---------+----------+----------+
| Table   | Op      | Msg_type | Msg_text |
+---------+---------+----------+----------+
| test.t1 | analyze | status   | OK       |
+---------+---------+----------+----------+
1 row in set (0.00 sec)

mysql> analyze table t2;
+---------+---------+----------+----------+
| Table   | Op      | Msg_type | Msg_text |
+---------+---------+----------+----------+
| test.t2 | analyze | status   | OK       |
+---------+---------+----------+----------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT t1.pk, t1.val, t2.val FROM t1 JOIN t2 ON t1.pk = t2.fk ORDER BY t1.pk\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2
        Extra: Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
         type: ref
possible_keys: ix_fk
          key: ix_fk
      key_len: 5
          ref: test.t1.pk
         rows: 1
        Extra: Using where
2 rows in set (0.00 sec)

And FORCE does NOT work:

mysql> EXPLAIN SELECT t1.pk, t1.val, t2.val FROM t1 force index(PRIMARY) JOIN t2 ON t1.pk = t2.fk ORDER BY t1.pk\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2
        Extra: Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
         type: ref
possible_keys: ix_fk
          key: ix_fk
      key_len: 5
          ref: test.t1.pk
         rows: 1
        Extra: Using where
2 rows in set (0.00 sec)

Only if PK is psecified in WHERE also index is used:

mysql> EXPLAIN SELECT t1.pk, t1.val, t2.val FROM t1 force index(PRIMARY) JOIN t2 ON t1.pk = t2.fk WHERE t1.pk IN (1,2) ORDER BY t1.pk\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 2
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
         type: ref
possible_keys: ix_fk
          key: ix_fk
      key_len: 5
          ref: test.t1.pk
         rows: 1
        Extra: Using where
2 rows in set (0.02 sec)

It is a regression bug, as in 5.0.90 index is used as expected:

77-52-1-11:5.0 openxs$ bin/mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.0.90-debug Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> CREATE TABLE t1 (pk INT PRIMARY KEY, val INT) engine=InnoDB;
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE TABLE t2 (pk INT PRIMARY KEY, fk INT, val INT, KEY ix_fk (fk)) engine=InnoDB;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into t1 values (1,1), (2,2);Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> insert into t2 values (1,1,1), (2,1,2), (3,2,3), (4, NULL, 4), (5,1,5), (6,1,6);
Query OK, 6 rows affected (0.01 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql> EXPLAIN SELECT t1.pk, t1.val, t2.val FROM t1 JOIN t2 ON t1.pk = t2.fk ORDER BY t1.pk\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: index
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 2
        Extra: 
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
         type: ref
possible_keys: ix_fk
          key: ix_fk
      key_len: 5
          ref: test.t1.pk
         rows: 1
        Extra: Using where
2 rows in set (0.00 sec)
[10 Jan 2010 16:16] Valeriy Kravchuk
It is not a recent regression, as 5.1.37 is also affected.
[10 Jan 2010 16:28] Valeriy Kravchuk
The fact that FORCE does not work can be related to bug #43341. 

Bug #38745 also looks closely related, but 3 way join, MyISAM tables and LIMIT are used there. 

Regression (since 5.1.37) bug #46011 is also very similar (only InnoDB is affected).
[3 Mar 2010 15:34] Manyi Lu
This bug depends on BUG#46011.
[28 Feb 2011 8:36] Tor Didriksen
Seems to be fixed by:
http://bugs.mysql.com/bug.php?id=50843

both 5.1 and 5.5 now say
CREATE TABLE t1 (pk INT PRIMARY KEY, val INT) engine=InnoDB;
CREATE TABLE t2 (pk INT PRIMARY KEY, fk INT, val INT, KEY ix_fk (fk))
engine=InnoDB;
INSERT INTO t1 VALUES (1,0),(2,0);
INSERT INTO t2 VALUES (1,1,0),(2,1,0), (3,1,0), (4,2,0), (5,2,0), (6,2,0);
analyze table t1;
Table Op Msg_type Msg_text
test.t1 analyze status OK
analyze table t2;
Table Op Msg_type Msg_text
test.t2 analyze status OK
EXPLAIN
SELECT t1.pk, t1.val, t2.val FROM t1 JOIN t2 ON t1.pk = t2.fk
ORDER BY t1.pk;
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE t1 index PRIMARY PRIMARY 4 NULL 2
1 SIMPLE t2 ref ix_fk ix_fk 5 test.t1.pk 1 Using where

trunk says:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE t1 index PRIMARY PRIMARY 4 NULL 2
1 SIMPLE t2 ref ix_fk ix_fk 5 test.t1.pk 1