Bug #1882 Innodb optimiser uses the wrong index
Submitted: 18 Nov 2003 23:56 Modified: 12 Dec 2003 6:55
Reporter: Anthony Shipman Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S3 (Non-critical)
Version:4.0.16 OS:Linux (Linux 2.4.18, Redhat 8)
Assigned to: Ramil Kalimullin CPU Architecture:Any

[18 Nov 2003 23:56] Anthony Shipman
Description:
I have tables with the following details

create table gridimages (
        ID              INT UNSIGNED AUTO_INCREMENT NOT NULL,
        GenDate         DATETIME NOT NULL,
        ForDate         DATETIME,
        CrDate          DATETIME NOT NULL,
        AInterval       INT UNSIGNED NOT NULL,
        Projn           INT UNSIGNED NOT NULL,
        DataType        CHAR(8) NOT NULL,
        ErrorType       CHAR(1) NOT NULL,
        Tag             VARCHAR(40) NOT NULL,
        Data            LONGTEXT NOT NULL,
        Resln           INT UNSIGNED NOT NULL,
        Cappi           INT UNSIGNED NOT NULL,
        Clutter         CHAR(1) NOT NULL,
        Occult          CHAR(1) NOT NULL,
        Atten           CHAR(1) NOT NULL,
        Hail            CHAR(1) NOT NULL,
        CalFactor       DOUBLE,
        User            VARCHAR(10) NOT NULL,

        primary key   (ID),
        index gendix  (GenDate),
        index fordix  (ForDate),
        index crdix   (CrDate),
        index typeix  (DataType, ErrorType),
        index tagix   (Tag)
        ) type = InnoDB;

create table projns (
        ID              INT UNSIGNED AUTO_INCREMENT NOT NULL,
        Hash            INT UNSIGNED NOT NULL,
        StatnId         INT UNSIGNED NOT NULL,
        Parameters      LONGTEXT NOT NULL,

        primary key (ID),
        index hashix (Hash)
        ) type = InnoDB;

The following query uses the wrong index.

select * from gridimages,projns  where GenDate>='2003-05-01 00:00:00'
    and GenDate<'2003-05-01 01:00:00' and gridimages.Projn=projns.ID and
    projns.StatnId=54 and DataType='rainfall' and ErrorType='N' and Tag=''
    order by GenDate

Explain reports

+------------+------+---------------------+--------+---------+
| table      | type | possible_keys       | key    | key_len |
+------------+------+---------------------+--------+---------+
| gridimages | ref  | gendix,typeix,tagix | typeix |       9 |
| projns     | ALL  | PRIMARY             | NULL   |    NULL |
+------------+------+---------------------+--------+---------+

+-------------+------+----------------------------------------------+
| ref         | rows | Extra
+-------------+------+----------------------------------------------+
| const,const |   90 | Using where; Using temporary; Using filesort |
| NULL        |    2 | Using where                                  |
+-------------+------+----------------------------------------------+

The DataType field only contains two values: 'rainfall' and
'rawrefl'. There are equal numbers of records using each type.
The ErrorType field is always 'N' and the Tag field is always ''.
The GenDate field contains a distinct date for each pair of rainfall and
rawrefl records.  There is only one relevant record in the projns table.

If I force the use of the date index:

select * from gridimages use index (gendix),projns where GenDate>='2003-05-01 00:00:00'
    and GenDate<'2003-05-01 01:00:00' and gridimages.Projn=projns.ID and
    projns.StatnId=54 and DataType='rainfall' and ErrorType='N' and Tag=''
    order by GenDate

then explain reports:

+------------+-------+---------------+--------+---------+
| table      | type  | possible_keys | key    | key_len |
+------------+-------+---------------+--------+---------+
| projns     | ALL   | PRIMARY       | NULL   |    NULL |
| gridimages | range | gendix        | gendix |       8 |
+------------+-------+---------------+--------+---------+

+------+------+----------------------------------------------+
| ref  | rows | Extra
+------+------+----------------------------------------------+
| NULL |    2 | Using where; Using temporary; Using filesort |
| NULL |   11 | Using where
+------+------+----------------------------------------------+

When there are 20000 records, using typeix will result in scanning 10000
records whereas using gendix will result in scanning only a few tens
of records. 

If I remove the datatype from the query:

select * from gridimages,projns  where GenDate>='2003-05-01 00:00:00'
    and GenDate<'2003-05-01 01:00:00' and gridimages.Projn=projns.ID and
    projns.StatnId=54 and Tag=''
    order by GenDate

+------------+------+---------------+-------+---------+
| table      | type | possible_keys | key   | key_len |
+------------+------+---------------+-------+---------+
| gridimages | ref  | gendix,tagix  | tagix |      40 |
| projns     | ALL  | PRIMARY       | NULL  |    NULL |
+------------+------+---------------+-------+---------+

+-------+------+----------------------------------------------+
| ref   | rows | Extra
+-------+------+----------------------------------------------+
| const |   90 | Using where; Using temporary; Using filesort |
| NULL  |    2 | Using where                                  |
+-------+------+----------------------------------------------+

The optimiser avoids using the gendix index although it is best. 
tagix is the worst possible choice.

How to repeat:
See above

Suggested fix:
Perhaps what is missing is the ability to analyse the tables like with myisam.
[21 Nov 2003 12:39] Dean Ellis
Verified with the latest 4.0.17 sources.  It does select the more efficient index if the tables are MyISAM.

Thank you.
[12 Dec 2003 6:55] Dean Ellis
Thank you for your bug report. This issue has been committed to our
source repository of that product and will be incorporated into the
next release.

If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information 
about accessing the source trees is available at
    http://www.mysql.com/doc/en/Installing_source_tree.html

Additional info:

Each of the queries use the appropriate index now in 4.0.17.  Verified with a user-provided snapshot of the tables.