Bug #88704 Compound index (pkey) not used when it should be
Submitted: 29 Nov 2017 23:24    Modified: 5 Dec 2017 13:32
Reporter: Dan Parker            Status: Not a Bug
Category: MySQL Server: Optimizer    Severity: S5 (Performance)
Version: 5.7.18-ndb-7.6.3-cluster-gpl    OS: Red Hat
Assigned to:                    CPU Architecture: Any

[29 Nov 2017 23:24] Dan Parker
Description:
I have a version 5.7.18-ndb-7.6.3-cluster-gpl installation containing a Db with the following table (column names are changed for simplicity)... 

CREATE TABLE `MyTable` (
  `DATETIME_COL`  DATETIME          NOT NULL,
  `CHAR_COL`      CHAR(20)          NOT NULL,
  `TINYINT_COL`   TINYINT UNSIGNED  NOT NULL DEFAULT 0,
  `INT_COL`       INT UNSIGNED      NOT NULL DEFAULT 0,
  PRIMARY KEY (`DATETIME_COL`, `CHAR_COL`))
ENGINE = INNODB;

...that contains 9,587,731 rows with well-distributed key values. After running ANALYZE TABLE for the above, EXPLAIN for the following query, which matches a row that is 1,038,376 rows from the beginning of the table, primary-key-wise... 

SELECT DATETIME_COL, CHAR_COL FROM MyTable
 WHERE (DATETIME_COL, CHAR_COL) >= ('2017-11-17 17:32:34', '201051000020')
 ORDER BY DATETIME_COL, CHAR_COL LIMIT 21;

...produces the following info (emphasis mine): 

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: MyTable
   partitions: NULL
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 65
          ref: NULL
         rows: 21
     filtered: 100.00
        Extra: Using where; Using index

Executing the query produces correct results, but takes over 5 seconds. Executing the same query with key values at or near the beginning of the table runs within a few milliseconds. Clearly the PRIMARY key is not being used for the search (it does not even appear as a possible_key) and a scan is being performed.

My question is: why in the world would this be the case? Changing the comparison to plain equality (=) and removing the ORDER BY and LIMIT clauses causes the PRIMARY key to show up as a possible_key and to actually be used for the query, as does removing CHAR_COL and the corresponding value from the WHERE clause (with no other changes). I can't understand why this would be the case. Why wouldn't the optimizer use the primary key for the original query, where it is so obviously appropriate?
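For reference, the two variants described above look roughly like this (reconstructed from the description, using the literal values from the original query; both were reported to use the PRIMARY key):

-- Variant 1: plain equality, no ORDER BY / LIMIT
SELECT DATETIME_COL, CHAR_COL FROM MyTable
 WHERE (DATETIME_COL, CHAR_COL) = ('2017-11-17 17:32:34', '201051000020');

-- Variant 2: CHAR_COL and its value removed from the WHERE clause
SELECT DATETIME_COL, CHAR_COL FROM MyTable
 WHERE DATETIME_COL >= '2017-11-17 17:32:34'
 ORDER BY DATETIME_COL, CHAR_COL LIMIT 21;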

How to repeat:
1) Create the table described and load with a substantial number of records using well-distributed key values.

2) Use EXPLAIN for the listed SELECT query, substituting key values that represent an entry substantially distant from the beginning of the table, key-wise.

3) Execute the query from step #2.
[30 Nov 2017 19:06] Dan Parker
As a convenience, I'm adding the following procedure definition that I've been using to populate 'MyTable' with test data in order to reproduce this issue:

[code]
DELIMITER //
-- Populates MyTable with RowCount rows: DATETIME_COL advances by one second
-- per row and CHAR_COL carries an incrementing counter, zero-padded to 20
-- characters. Session (@) variables are used throughout because
-- EXECUTE ... USING requires user variables, so no DECLAREd locals are needed.
CREATE PROCEDURE PopMyTable( IN RowCount BIGINT UNSIGNED )
BEGIN
    SET autocommit      = 1;
    SET @TotalRowCount  = 0;
    SET @DateTimeColVal = SUBTIME( '2017-11-30 00:00:00', SEC_TO_TIME( RowCount ) );
    SET @CharColIntVal  = 2;
    SET @CharColVal     = '00000000000000000001';

    PREPARE InsertRow FROM 'INSERT INTO MyTable VALUES (?, ?, 0, 0)';

    ArcLoop: REPEAT
        EXECUTE InsertRow USING @DateTimeColVal, @CharColVal;
        SELECT ROW_COUNT() INTO @InsertRowCount;
        IF @InsertRowCount < 1 THEN
            -- Stop early if the insert did not add a row.
            LEAVE ArcLoop;
        END IF;
        SET @TotalRowCount  = @TotalRowCount + 1;
        SET @DateTimeColVal = ADDTIME( @DateTimeColVal, SEC_TO_TIME( 1 ) );
        -- Build the next CHAR_COL value: left-pad the counter with zeroes to 20 characters.
        SET @TempCharColVal = CAST( @CharColIntVal AS CHAR );
        SET @CharColVal     = CONCAT( SUBSTRING( '00000000000000000000', 1, 20 - CHAR_LENGTH( @TempCharColVal ) ), @TempCharColVal );
        SET @CharColIntVal  = @CharColIntVal + 1;
    UNTIL @TotalRowCount >= RowCount END REPEAT ArcLoop;

    DEALLOCATE PREPARE InsertRow;
END//
DELIMITER ;
[/code]
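As a usage sketch, loading roughly the row count mentioned in the description and refreshing the statistics would then look like this (the exact count is illustrative):

[code]
CALL PopMyTable( 9587731 );
ANALYZE TABLE MyTable;
[/code]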
[5 Dec 2017 13:32] MySQL Verification Team
Hi!

Thank you for your report. However, it is not a bug.

If you read the EXPLAIN output carefully, you will see that it reports "Using index" in the "Extra" column. That means the index is not only used for range searching, but also for optimising the ORDER BY, so that a filesort is not used. This has the negative effect that the speed of execution will depend on the location of the tuples that satisfy the WHERE clause. The optimiser cannot know that when it is making the plan.

However, you can switch off the scanning of the index for the ORDER BY optimisation by using the optimiser hints, as described in chapter 8.9.2 of our Reference Manual. Pay particular attention to the hint dealing with "IGNORE INDEX FOR ORDER BY".
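As a sketch of how that suggestion could look when applied to the reported query (the index hint goes after the table name; whether it actually helps in this particular case would need to be verified):

SELECT DATETIME_COL, CHAR_COL
  FROM MyTable IGNORE INDEX FOR ORDER BY (PRIMARY)
 WHERE (DATETIME_COL, CHAR_COL) >= ('2017-11-17 17:32:34', '201051000020')
 ORDER BY DATETIME_COL, CHAR_COL LIMIT 21;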

Not a bug.