Bug #38745 MySQL 5.1 optimizer uses filesort for ORDER BY when it should use index
Submitted: 12 Aug 2008 11:44 Modified: 20 Nov 10:49
Reporter: Marc Isambart
Status: Verified
Category:Server: Optimizer Severity:S2 (Serious)
Version:5.1 OS:Any
Assigned to: Gleb Shchepa Target Version:
Tags: Optimizer, regression
Triage: Needs Triage: D2 (Serious) / R6 (Needs Assessment) / E6 (Needs Assessment)

[12 Aug 2008 11:44] Marc Isambart
Description:
This is a case where MySQL 5.1.26 uses filesort for an ORDER BY when it should use the
index. On the exact same query, MySQL 5.0.45 correctly uses the index.

This greatly affects the performance on large databases. By example, the query will take
100s of seconds on large databases when it will take less than a second on 5.0.45.

How to repeat:
We need to consider 3 tables:

CREATE TABLE data1 ( index1 integer NOT NULL, value1 integer NOT NULL, PRIMARY
KEY(index1) ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE data2 ( index2 integer NOT NULL, value2 integer NOT NULL, PRIMARY
KEY(index2) ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE data3 ( index3 integer NOT NULL, value3 integer NOT NULL, PRIMARY
KEY(index3) ) ENGINE=MyISAM DEFAULT CHARSET=latin1;

Let's add some data to data1 data3, but keep data2 empty:

INSERT INTO data1 VALUES (1, 1),(2, 2),(3, 3),(4, 4),(5, 5),(6, 6),(7, 7),(8, 8),(9,
9),(10, 10),(11, 11),(12, 12);
INSERT INTO data3 VALUES (1, 1),(2, 2),(3, 3),(4, 4),(5, 5),(6, 6),(7, 7),(8, 8),(9,
9),(10, 10),(11, 11),(12, 12);

If we consider the following query:

EXPLAIN SELECT data1.*, data3.*, data2.value2 FROM data1 JOIN data3 ON data3.index3 =
data1.index1 LEFT JOIN data2 ON data1.index1 = data2.index2 WHERE data1.index1 BETWEEN 3
AND 10 ORDER BY data1.index1 LIMIT 5\G

5.1.26 will use filesort:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: data2
         type: system
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 0
        Extra: const row not found
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: data1
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 7
        Extra: Using where; Using filesort
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: data3
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: bugreport.data1.index1
         rows: 1
        Extra:

On the same query, 5.0.45 will correctly optimize it:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: data2
         type: system
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 0
        Extra: const row not found
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: data1
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 7
        Extra: Using where
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: data3
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: bugreport.data1.index1
         rows: 1
        Extra:

If we now add a single row in data2 and use the same previous select query:

INSERT INTO data2 VALUES (1, 1);

EXPLAIN SELECT data1.*, data3.*, data2.value2 FROM data1 JOIN data3 ON data3.index3 =
data1.index1 LEFT JOIN data2 ON data1.index1 = data2.index2 WHERE data1.index1 BETWEEN 3
AND 10 ORDER BY data1.index1 LIMIT 5\G

MySQL 5.1.26 will now optimize it correctly:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: data1
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 7
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: data2
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: bugreport.data1.index1
         rows: 1
        Extra:
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: data3
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: bugreport.data1.index1
         rows: 1
        Extra:
[12 Aug 2008 15:26] Susanne Ebrecht
Many thanks for writing a bug report.

Verified as described.
[27 Apr 11:39] Sergej Pupykin
I just want to notice that this bug seems brake dbmail application.

This query hangs for a very very long time on dbmail with 6.5Gb database and mysql
5.1.34, but works fast on mysql 5.0.77:
SELECT messageblk, is_header FROM dbmail_messageblks WHERE physmessage_id = 278699 ORDER
BY messageblk_idnr

I fix it in dbmail using 'use index' expression:
SELECT messageblk, is_header FROM dbmail_messageblks  use index(physmessage_id_index)
WHERE physmessage_id = 278699 ORDER BY messageblk_idnr
[14 Sep 10:27] Gleb Shchepa
Simplified test case:

CREATE TABLE t1 (i1 integer NOT NULL PRIMARY KEY);
CREATE TABLE t2 (i2 integer);
CREATE TABLE t3 (i3 integer NOT NULL PRIMARY KEY);

INSERT INTO t1 VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10), (11), (12);
INSERT INTO t3 VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10), (11), (12);

EXPLAIN EXTENDED
SELECT t1.*, t3.* FROM t1 JOIN t3 ON t3.i3 = t1.i1 LEFT JOIN t2 ON t1.i1 = t2.i2 ORDER BY
t1.i1 LIMIT 5;

DROP TABLE t1, t2, t3;