Bug #114996 | Subquery with "order by" in mysql 8.0 is much slower than in mysql 5.7 | ||
---|---|---|---|
Submitted: | 14 May 10:09 | Modified: | 24 Jun 8:10 |
Reporter: | 镇熙 林 | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S5 (Performance) |
Version: | 8.0.25, 8.0.37 | OS: | Any |
Assigned to: | CPU Architecture: | Any | |
Tags: | regression |
[14 May 10:09]
镇熙 林
[21 May 7:05]
MySQL Verification Team
Hello 镇熙 林, Thank you for the report and feedback. verified as described. regards, Umesh
[23 May 0:52]
镇熙 林
Under what circumstances will this problem be triggered? How to find out in advance? How to avoid it? Thanks!
[24 Jun 8:10]
Ajo Robert
Posted by developer: Below is the analysis and conclusion of the current behaviour. Issue: MySQL 8.0 is slower than 5.7 for the given customer scenario. Query: SELECT * FROM test1 WHERE EXISTS (SELECT 1 FROM test2 WHERE x2=x1) AND id > 3 ORDER BY id ASC LIMIT 5; Scenario: Creates table and add rows. Runs query before index status available to optimizer leading to inefficient plan. 1. Reason for 8.0 plan is different from 5.7. Mysql 8.0 uses Semijoin strategy to simplify the EXISTS clause leading to plan difference. If semijoin is disabled using below optimizer flag, customer will get the same 5.7 plan on 8.0 as well. optimizer_switch='semijoin=off'; 2. Reason for the 8.0 plan in the bug scenario with semijoin strategy slow? In this instance, optimizer does not have the correct index stats to generate proper plan. Background and Details: MySQL uses TABLE and TABLE_SHARE objects to represent open table and its corresponding metadata during DDL/DML execution. Any TABLE objects created in a session are cached for usage within the session for performance improvement. And TABLE_SHARE objects containing metadata and index stats for the table are shared across sessions. InnoDB updates the records per key index estimates to the TABLE_SHARE whenever a new open table is done (happens during new TABLE object creation). So, when a new query uses a cached TABLE object, it will have the old records per key estimate from last open table time. In this customer scenario, INSERT into table and SELECT are happening from same connection leading to optimizer reusing the TABLE object created before the records are inserted into the table. Hence optimizer assumes the table is having minimal records (1) for ref access and leading to a plan not efficient for the actual record count. Table Cache Reference: https://dev.mysql.com/doc/refman/8.4/en/table-cache.html How to prevent/rectify this issue? Any action that will lead to clearing the cache or a new TABLE object creation will resolve this issue. Few are listed below: Flush TABLE. ANALYZE TABLE. ANY DDL on the table. Use of table twice in a query from same connection. Eg: EXPLAIN SELECT * FROM test1, test1 as t2 LIMIT 1. Access the same table from a another/second connection. Etc. Can this affect a production environment? Not really. All the below conditions need to be met to hit this issue. A drastic change in data distribution in the table to affect the records per key calculation. There should be another plan that has less cost based to the old records/key value and performs bad enough to cause trouble. The select should happen from a connection that has stayed open from before the data change happened. No new connection should access the table after new records per key value is available. No FLUSH TABLBE, ANALYZE table should happen. Server TABLE cache should be big enough to have this TABLE object not kicked out. Etc. The only case I can think of is someone setting up a database and uses the same connection to run such a query. A simple FLUSH TABLE will solve the issue in that case. A production environment is highly unlikely to encounter this issue. Even if a query hit this rare possibility, next connection should resolve it for all. Considering above aspects and the current behaviour is as per design, closing this issue as not a bug. Regards Ajo
[8 Jul 10:59]
Kaiwang CHen
This specific problem is due to stale stats in internal cache. See https://bugs.mysql.com/bug.php?id=106264 In an instance with fix of Bug#106264, enable the sync of internal cache by set global innodb_stats_notify_change = 1; then the join order becomes (t1,t2) and performance regression is gone. Explicit ANALYZE of involved tables after insert also gives (t1,t2). The problem suggests taking Bug#106264 into next releases.