Bug #57001 Incorrect optimization of ORDER BY clustered_key LIMIT
Submitted: 24 Sep 2010 11:48
Modified: 11 Apr 2020 2:12
Reporter: Sasha Pachev
Status: Closed
Category: MySQL Server: Optimizer
Severity: S3 (Non-critical)
Version: 5.1.50, 5.1.52-bzr
OS: Any
Assigned to:
CPU Architecture: Any

[24 Sep 2010 11:48] Sasha Pachev
Description:
When optimizing an ORDER BY on a clustered key with a LIMIT, and with a range constraint on another key present, test_if_skip_sort_order() will prefer an index scan (JT_NEXT) if the heuristically expected rate at which scanned rows pass the filter is high enough. That expectation is calculated on the assumption that the filtering rate throughout the entire table will be no worse than the number of records matched by the range divided by the total number of records. This assumption does not always hold, and when it does not, the results can be disastrous for a query that would run fast if the range were used instead.
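
A minimal sketch of the query shape that triggers this, using the same table and column names as the test case below (the cutoff and limit values are illustrative):

select * from t1
where other_id > 955    -- range over the secondary key (other_id, s, n)
order by id desc        -- ordering on the clustered PRIMARY KEY
limit 3;                -- small LIMIT makes the clustered-index scan look cheap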

How to repeat:
Test case (in mysqltest format):

--source include/have_innodb.inc
use test;
create table t1(id int not null primary key,
    other_id int not null,
    s tinyint not null,
    n int,
    m int,
   key(other_id,s,n)
   ) engine=innodb;
show create table t1;
let $i=10000;
let $id=1;
--disable_query_log
begin;

while ($i)
{
  eval insert into t1 values($id,($id * $id * $id)/1000000000,$id>7000,$id+5,$id+10);
  dec $i;
  inc $id;
}
eval insert into t1 values($id*35,$id-10,1,3,5);
eval insert into t1 values($id*40,$id-100,1,3,5);
eval insert into t1 values($id*50,$id-3,1,3,5);
commit;
--enable_query_log
--vertical_results
let $cutoff=955;
let $limit=3;
eval select count(*) from t1 where other_id > $cutoff;
eval explain extended select * from t1 where other_id > $cutoff order by id desc limit $limit;
eval explain extended select * from t1 force key(other_id) where other_id > $cutoff order by id desc limit $limit;
drop table t1;

Results:

use test;
create table t1(id int not null primary key,
other_id int not null,
s tinyint not null,
n int,
m int,
key(other_id,s,n)
) engine=innodb;
show create table t1;
Table   Create Table
t1      CREATE TABLE `t1` (
  `id` int(11) NOT NULL,
  `other_id` int(11) NOT NULL,
  `s` tinyint(4) NOT NULL,
  `n` int(11) DEFAULT NULL,
  `m` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `other_id` (`other_id`,`s`,`n`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
select count(*) from t1 where other_id > 955;
count(*)        154
explain extended select * from t1 where other_id > 955 order by id desc limit 3;
id      1
select_type     SIMPLE
table   t1
type    index
possible_keys   other_id
key     PRIMARY
key_len 4
ref     NULL
rows    181
filtered        85.08
Extra   Using where
Warnings:
Level   Note
Code    1003
Message select `test`.`t1`.`id` AS `id`,`test`.`t1`.`other_id` AS `other_id`,`test`.`t1`.`s` AS `s`,`test`.`t1`.`n` AS `n`,`test`.`t1`.`m` AS `m` from `test`.`t1` where (`test`.`t1`.`other_id` > 955) order by `test`.`t1`.`id` desc limit 3
explain extended select * from t1 force key(other_id) where other_id > 955 order by id desc limit 3;
id      1
select_type     SIMPLE
table   t1
type    range
possible_keys   other_id
key     other_id
key_len 4
ref     NULL
rows    154
filtered        100.00
Extra   Using where; Using filesort
Warnings:
Level   Note
Code    1003
Message select `test`.`t1`.`id` AS `id`,`test`.`t1`.`other_id` AS `other_id`,`test`.`t1`.`s` AS `s`,`test`.`t1`.`n` AS `n`,`test`.`t1`.`m` AS `m` from `test`.`t1` FORCE INDEX (`other_id`) where (`test`.`t1`.`other_id` > 955) order by `test`.`t1`.`id` desc limit 3
drop table t1;

The optimizer thinks it would only need to examine 181 rows before stopping due to the LIMIT, but good luck with that! It will be chugging along through almost every record in the table, because other_id is perfectly correlated with id, contrary to the assumption the optimizer makes per the following comment in sql_select.cc and the code that follows:

 /*
            We assume that each of the tested indexes is not correlated
            with ref_key. Thus, to select first N records we have to scan
            N/selectivity(ref_key) index entries.
            selectivity(ref_key) = #scanned_records/#table_records =
            table->quick_condition_rows/table_records.
            In any case we can't select more than #table_records.
            N/(table->quick_condition_rows/table_records) > table_records
            <=> N > table->quick_condition_rows.
          */
          if (select_limit > table->quick_condition_rows)
            select_limit= table_records;
          else
            select_limit= (ha_rows) (select_limit *
                                     (double) table_records /
                                      table->quick_condition_rows);
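
Plugging this test case's numbers into that formula (a back-of-the-envelope sketch; the exact figures depend on InnoDB's statistics):

    select_limit         = 3        (the LIMIT)
    quick_condition_rows = 154      (rows matching other_id > 955, per the range estimate)
    table_records        ~= 10003   (total rows in t1)

    3 <= 154, so select_limit = 3 * 10003 / 154 ~= 195

So the optimizer expects to satisfy the LIMIT within roughly the first couple of hundred clustered-index entries, which is the same order of magnitude as the 181-row estimate in the EXPLAIN output above. When other_id and id are strongly correlated, that assumption breaks down and the number of entries actually scanned can be far larger.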

Suggested fix:
Since this is an unsafe optimization, there should be a way to disable it with a dynamic my.cnf variable.
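
Until such a variable exists, the only workaround shown above is the explicit index hint; a minimal sketch against the test table:

select * from t1 force index (other_id)
where other_id > 955
order by id desc limit 3;

This forces the range access path on the (other_id, s, n) index, accepting the filesort over the 154 matching rows instead of the clustered-index scan.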
[24 Sep 2010 11:58] Valeriy Kravchuk
Thank you for the bug report.
[11 Apr 2020 2:12] Jon Stephens
Should be fixed in MySQL 8.0.21 by WL#13929 / BUG#97001.

See same for docs info.

Closed.
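
For reference, a sketch of disabling this optimization on 8.0.21 or later (assuming the switch introduced by WL#13929 is the prefer_ordering_index optimizer_switch flag; verify against the documentation for your server version):

set optimizer_switch = 'prefer_ordering_index=off';
explain select * from t1 where other_id > 955 order by id desc limit 3;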