Bug #97001 Dangerous optimization reconsidering_access_paths_for_index_ordering
Submitted: 25 Sep 2019 0:35 Modified: 3 Apr 2020 22:41
Reporter: Jeremy Cole (Basic Quality Contributor) (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:5.7, 5.7.27 OS:Any
Assigned to: CPU Architecture:Any
Tags: Contribution, limit, Optimizer, primary key, reconsidering_access_paths_for_index_ordering

[25 Sep 2019 0:35] Jeremy Cole
Description:
The reconsidering_access_paths_for_index_ordering optimization is dangerous and unwise for the general case, and there is currently no way to disable it. This optimization attempts to switch to an index that provides row ordering in preference of an index that provides efficient access _in the hopes that_ the inefficient access will find sufficient rows to satisfy the limit quickly, but that is not guaranteed (or even necessarily common).

This is unfortunately a common scenario in Rails applications because the implicit/default design for tables in Rails applications uses a synthetic (auto-increment) id as primary key, and pagination is often used in conjunction with these tables. 

Given a simplistic table design:

CREATE TABLE `t` (
  `id`                  BIGINT        NOT NULL,
  `other_id`            BIGINT        NOT NULL,
  `covered_column`      VARCHAR(50)   NOT NULL,
  `non_covered_column`  VARCHAR(50)   NOT NULL,
  PRIMARY KEY (`id`),
  INDEX `index_other_id_covered_column` (`other_id`, `covered_column`)
);

It is common to see queries that might look like:

SELECT ... WHERE [secondary key conditions] ORDER BY `id` ASC LIMIT n

This optimization can result in queries with low LIMITs being fast (due to using the correct index), whereas with slightly higher LIMITs they are much slower (due to performing a scan on the order-providing index).

This particular bug/misfeature is well-represented in the bugs system already, but since there are so many reported cases, I created a new bug in order to discuss and provide a patch and solution, rather than leaving this on one of those at random. The related bugs I could find that seem to be related are:

https://bugs.mysql.com/bug.php?id=42094
https://bugs.mysql.com/bug.php?id=54225
https://bugs.mysql.com/bug.php?id=57001
https://bugs.mysql.com/bug.php?id=74030
https://bugs.mysql.com/bug.php?id=76398
https://bugs.mysql.com/bug.php?id=78612
https://bugs.mysql.com/bug.php?id=78651
https://bugs.mysql.com/bug.php?id=83298
https://bugs.mysql.com/bug.php?id=83323
https://bugs.mysql.com/bug.php?id=88181
https://bugs.mysql.com/bug.php?id=92850
https://bugs.mysql.com/bug.php?id=93845
https://jira.percona.com/browse/PS-4935

Thanks for your consideration.

How to repeat:
Create a table with a simple design:

CREATE TABLE `t` (
  `id`                  BIGINT        NOT NULL,
  `other_id`            BIGINT        NOT NULL,
  `covered_column`      VARCHAR(50)   NOT NULL,
  `non_covered_column`  VARCHAR(50)   NOT NULL,
  PRIMARY KEY (`id`),
  INDEX `index_other_id_covered_column` (`other_id`, `covered_column`)
);

Insert a bunch of data into the table (using mysql-test syntax):

START TRANSACTION;
--disable_query_log
let $n = 1000000;
while ($n)
{
  eval INSERT INTO `t` (`id`, `other_id`, `covered_column`, `non_covered_column`) VALUES ($n, $n % 1000, '$n', '$n');
  dec $n;
}
--enable_query_log
COMMIT;

--echo # Shifting some of the data to the "end" of the table.
UPDATE `t` SET `id`=`id`+1000000 WHERE `other_id` = 555;

This query will be slow:

SELECT `non_covered_column` FROM `t` WHERE `other_id` = 555 ORDER BY `id` ASC LIMIT 1;

This query will be fast:

SELECT `non_covered_column` FROM `t` WHERE `other_id` = 555 ORDER BY `id` ASC LIMIT 5;

There is an additional case of the same poor optimization, if queries with large IN() lists are used, however I've not been able to generate a reproducible test case that does not require production data and circumstances. The query could be:

SELECT [all columns]
FROM `t`
WHERE `other_id` = ... AND `id` IN ([> 200 items])
ORDER BY id DESC

This query can choose an appropriate "range" execution plan, but reconsider it and fall back to "ref" access on any index prefixed by other_id (without considering the id column!) rather than using the primary key by id or even a secondary key on (other_id, id). Using a USE/FORCE INDEX on any usable index does not result in a bad execution plan, because it is only through the reconsidering code path that the poor ref plan can be generated.

Suggested fix:
Realistically, I think this optimization is evil and dangerous in general, and it causes rather surprising behavior from MySQL, but some people probably depend on it at this point. I would suggest adding a feature to disable it (via optimizer_switch), and potentially thinking about changing the default of this in a future version.
[25 Sep 2019 0:39] Jeremy Cole
Patch to implement reconsider_index_for_order optimizer_switch

Attachment: reconsider_index_for_order_20190924.patch.txt (text/plain), 8.20 KiB.

[25 Sep 2019 0:39] Jeremy Cole
Test case demonstrating slow query resulting from bad optimization

Attachment: reconsider_index_for_order_bad_optimization.test (application/octet-stream, text), 2.94 KiB.

[25 Sep 2019 7:00] MySQL Verification Team
Hello Jeremy,

Thank you for the report and contribution.
May I request you to  please re-send the patch via "contribution" tab. Thank you.

regards,
Umesh
[25 Sep 2019 7:10] MySQL Verification Team
mtr test results -  5.7.27, 5.6.45 and 8.0.17

Attachment: 97001.results (application/octet-stream, text), 12.66 KiB.

[26 Sep 2019 4:42] MySQL Verification Team
Hello Jeremy,

I'll follow up with the concern people on this. Thank you!

Sincerely,
Umesh
[26 Sep 2019 20:47] Jeremy Cole
Patch to implement reconsider_index_for_order optimizer_switch

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: reconsider_index_for_order_20190924.patch.txt (text/plain), 8.20 KiB.

[27 Sep 2019 4:33] MySQL Verification Team
Thank you, Jeremy.

Sincerely,
Umesh
[1 Oct 2019 7:58] Pavel Katiushyn
I've seen that many times as well.
Looking forward for this fix in next release.
[14 Jan 2020 22:47] Mark Callaghan
Domas has also encountered this problem
https://dom.as/2015/07/30/on-order-by-optimization/
[3 Apr 2020 9:43] Chaithra Gopala Reddy
Hello Jeremy. Thanks for the contribution. It is taken. Regarding
the name of the optimizer switch, it is decided as
"prefer_ordering_index"
[3 Apr 2020 21:41] Jon Stephens
This is implemented in MySQL 8.0.21 as WL#13929. See same for documentation.
[3 Apr 2020 22:41] Jeremy Cole
That's great, thanks! Just for our own planning purposes, could you clarify whether this will be backported to 5.7? (I assume not, but a man can have dreams. :))
[10 Apr 2020 22:41] Jon Stephens
Hi Jeremy!

I don't think this will be backported, and I don't know of any plans to do so.
[11 Nov 2020 13:11] Jon Stephens
Also implemented in MySQL 5.7.33.