Bug #46376 Optimizer choose wrong index even when where clause and order-by are covered
Submitted: 24 Jul 2009 17:05 Modified: 29 Jul 2009 8:39
Reporter: T C Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:5.0.85, 5.1.36, 5.1.38, 5.1.39 OS:Windows (XP)
Assigned to: Tor Didriksen CPU Architecture:Any
Tags: INDEX, Optimizer, performance

[24 Jul 2009 17:05] T C
Description:
In certain situations the optimizer will pick a very sub-optimal execution plan even when it's obvious that there's a single index that covers the where clause and order-by clause.  

I think this bug is most likely related to at least one of the existing optimizer/index bugs that have been reported, but I wanted to post a test case anyways because it seems like there might be a fundamental issue here that might be hurting the performance of a wide range of simple queries.

This may also be a slightly different scenario than reported in other bugs, since the table in question has a text field.

How to repeat:

casRecord is a parent table of casRecordField...there's about 120000 casRecord rows and 1.6 million casRecordField rows.

create table casRecord (
    Id                   integer         not null auto_increment,
    AppId                integer         not null,
    Uuid                 binary(16)      not null,
    RecordDefinitionName varchar(64)     not null,
    RecordName           varchar(255)    not null,
    Active               tinyint         not null default 1,
    RecordType           tinyint         not null default 0,
    constraint pk_casRecord primary key (Id),
    constraint fk_casRecord_to_casApplication foreign key (AppId) references casApplication (Id),
    constraint ix_casRecord_1 unique index (AppId, RecordDefinitionName, RecordName),
    constraint ix_casRecord_2 unique index (Uuid),
    index ix_casRecord_3 (RecordName)
);

create table casRecordField (
    Id              integer         not null auto_increment,
    RecordId        integer         not null,
    Uuid            binary(16)      not null,
    GroupNumber     integer         not null default 0,
    ShortName       varchar(64)     not null,
    Value           longtext        null,
    constraint pk_casRecordField primary key (Id),
    constraint fk_casRecordField_to_casRecord foreign key (RecordId) references casRecord (Id),
    constraint ix_casRecordField_1 unique index (RecordId, GroupNumber, ShortName),
    constraint ix_casRecordField_2 unique index (RecordId, Id),
    constraint ix_casRecordField_3 unique index (Uuid)
);

select rf.* from casRecordField rf where rf.RecordId = 578 order by rf.RecordId, rf.Id

takes approximately 6 seconds to execute (it matches 13 records).  All queries of this type return similar sized result sets (between 10 and 30 rows).

explain output:

id     select_type     table     type     possible_keys                            key      key_len     ref     rows     Extra       
 -----  --------------  --------  -------  ---------------------------------------  -------  ----------  ------  -------  ----------- 
 1      SIMPLE          rf        index    ix_casRecordField_1,ix_casRecordField_2  PRIMARY  4           (null)  1596834  Using where 

The only index that should be considered here is ix_casRecordField_2 since it covers both the where clause and the order by clause perfectly.

Using the index hint fixes the performance issue:

select rf.* from casRecordField rf force index(ix_casRecordField_2) where rf.RecordId = 257 order by rf.RecordId, rf.Id

returns in sub-millisecond response time.

Interestingly enough, a similar query that returns the child records of a small set of parent id's does indeed pick the proper index and executes in a few milliseconds:

select rf.* from casRecordField rf where rf.RecordId in (256, 301, 436) order by rf.RecordId, rf.Id

explain output:

id     select_type     table     type     possible_keys        key                  key_len     ref     rows     Extra       
 -----  --------------  --------  -------  -------------------  -------------------  ----------  ------  -------  ----------- 
 1      SIMPLE          rf        range    ix_casRecordField_2  ix_casRecordField_2  4           (null)  39       Using where 

Suggested fix:
In this case the optimizer appears to be picking the first index that covers the where clause, instead of noticing that there's a second index that covers the where clause and the order by.
[24 Jul 2009 17:39] Valeriy Kravchuk
Thank you for the problem report. I do not see how PRIMARY key can be used for this query at all (it does NOT cover WHERE and ORDER BY), so, please, provide a complete repeatable test case, with data.
[24 Jul 2009 18:52] T C
create table casRecord (
    Id                   integer         not null auto_increment,
    Uuid                 binary(16)      not null,
    RecordDefinitionName varchar(64)     not null,
    RecordName           varchar(255)    not null,
    Active               tinyint         not null default 1,
    RecordType           tinyint         not null default 0,
    constraint pk_casRecord primary key (Id),
    constraint ix_casRecord_2 unique index (Uuid),
    index ix_casRecord_3 (RecordName)
);

create table casRecordField (
    Id              integer         not null auto_increment,
    RecordId        integer         not null,
    Uuid            binary(16)      not null,
    GroupNumber     integer         not null default 0,
    ShortName       varchar(64)     not null,
    Value           longtext        null,
    constraint pk_casRecordField primary key (Id),
    constraint fk_casRecordField_to_casRecord foreign key (RecordId) references casRecord (Id),
    constraint ix_casRecordField_1 unique index (RecordId, GroupNumber, ShortName),
    constraint ix_casRecordField_2 unique index (RecordId, Id),
    constraint ix_casRecordField_3 unique index (Uuid)
);

delimiter //

create procedure sp_loadTestData()
begin
    declare cnt int default 1000;
    declare childCnt int default 15;
    declare rId int;
    while cnt > 0 do
        begin
            insert into casRecord (Uuid, RecordDefinitionName, RecordName) values (unhex(replace(uuid(), '-', '')), 'Test', concat('TestName', cnt));
           select last_insert_id() into rId;
            set childCnt = 15;
            while childCnt > 0 do
                begin
                    insert into casRecordField(RecordId, Uuid, GroupNumber, ShortName) values (rId, unhex(replace(uuid(), '-', '')), 0, concat('Name', childCnt));
                    set childCnt = childCnt - 1;
                end;
            end while;
            set cnt = cnt - 1;
        end;
    end while;
end //

delimiter ;

call sp_loadTestData();

explain select rf.* from casRecordField rf where rf.RecordId= 601 order by rf.RecordId, rf.Id
explain select rf.* from casRecordField rf force index(ix_casRecordField_2) where rf.RecordId = 701 order by rf.RecordId, rf.Id
[24 Jul 2009 19:15] T C
One more note:  the plan differs when you take the longtext field out of the selected column list.  Maybe that's a clue?
[25 Jul 2009 7:56] Valeriy Kravchuk
Thank you for the complete test case. On recent 5.1.38 from bzr with MyISAM table I've got proper index used by default:

mysql> call sp_loadTestData();
Query OK, 1 row affected (9.84 sec)

mysql> explain select rf.* from casRecordField rf where rf.RecordId= 601 order by rf.RecordId, rf.Id\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: rf
         type: ref
possible_keys: ix_casRecordField_1,ix_casRecordField_2
          key: ix_casRecordField_2
      key_len: 4
          ref: const
         rows: 12
        Extra: Using where
1 row in set (0.00 sec)

mysql> show table status like 'casRecordField'\G                               
*************************** 1. row ***************************
           Name: casRecordField
         Engine: MyISAM
        Version: 10
     Row_format: Dynamic
           Rows: 15000
 Avg_row_length: 36
    Data_length: 540000
Max_data_length: 281474976710655
   Index_length: 1025024
      Data_free: 0
 Auto_increment: 15001
    Create_time: 2009-07-25 10:22:27
    Update_time: 2009-07-25 10:23:41
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options: 
        Comment: 
1 row in set (0.03 sec)

But with InnoDB tables (it is reasonable to assume they are InnoDB...) I've got ix_casRecordField_1 index used (NOT PRIMARY):

mysql> select version();+--------------+| version()    |+--------------+| 5.1.38-debug | +--------------+1 row in set (0.00 sec)mysql> create table casRecord (    ->     Id                   integer         not null auto_increment,
    ->     Uuid                 binary(16)      not null,
    ->     RecordDefinitionName varchar(64)     not null,
    ->     RecordName           varchar(255)    not null,
    ->     Active               tinyint         not null default 1,
    ->     RecordType           tinyint         not null default 0,
    ->     constraint pk_casRecord primary key (Id),
    ->     constraint ix_casRecord_2 unique index (Uuid),
    ->     index ix_casRecord_3 (RecordName)
    -> ) engine=InnoDB;
Query OK, 0 rows affected (0.14 sec)

mysql> create table casRecordField (
    ->     Id              integer         not null auto_increment,
    ->     RecordId        integer         not null,
    ->     Uuid            binary(16)      not null,
    ->     GroupNumber     integer         not null default 0,
    ->     ShortName       varchar(64)     not null,
    ->     Value           longtext        null,
    ->     constraint pk_casRecordField primary key (Id),
    ->     constraint fk_casRecordField_to_casRecord foreign key (RecordId) references casRecord
    -> (Id),
    ->     constraint ix_casRecordField_1 unique index (RecordId, GroupNumber, ShortName),
    ->     constraint ix_casRecordField_2 unique index (RecordId, Id),
    ->     constraint ix_casRecordField_3 unique index (Uuid)
    -> ) engine=InnoDB;
Query OK, 0 rows affected (0.40 sec)

mysql> call sp_loadTestData();
Query OK, 1 row affected (22.44 sec)

mysql> explain select rf.* from casRecordField rf where rf.RecordId= 601 order by rf.RecordId,
    -> rf.Id
    -> \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: rf
         type: ref
possible_keys: ix_casRecordField_1,ix_casRecordField_2
          key: ix_casRecordField_1
      key_len: 4
          ref: const
         rows: 15
        Extra: Using where; Using filesort
1 row in set (0.01 sec)

mysql> analyze table casRecordField;
+---------------------+---------+----------+----------+
| Table               | Op      | Msg_type | Msg_text |
+---------------------+---------+----------+----------+
| test.casrecordfield | analyze | status   | OK       | 
+---------------------+---------+----------+----------+
1 row in set (0.01 sec)

mysql> explain select rf.* from casRecordField rf where rf.RecordId= 601 order by rf.RecordId, rf.Id\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: rf
         type: ref
possible_keys: ix_casRecordField_1,ix_casRecordField_2
          key: ix_casRecordField_1
      key_len: 4
          ref: const
         rows: 15
        Extra: Using where; Using filesort
1 row in set (0.00 sec)

This is still a bug, as ix_casRecordField_2 can be used more efficiently to avoid filesort step. As only InnoDB is affected it seems to be a bug from "bug #28404 family".
[25 Jul 2009 8:00] Valeriy Kravchuk
5.0.85 is also affected, so it does not look like a regression.
[27 Jul 2009 17:51] T C
I'd say this is probably closer to bug#38745

For the time being the workaround is to use index hints in all order-by queries against large tables, since the current implementation seems to enjoy doing filesorts ;-)
[4 Mar 2010 8:47] Manyi Lu
It depends on fix for BUG#46011.