Bug #94903 | optimazer chooses inefficient plan for order by + limit in subquery | ||
---|---|---|---|
Submitted: | 4 Apr 2019 8:39 | Modified: | 8 Apr 2019 20:27 |
Reporter: | Василий Лукьянчиков | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S3 (Non-critical) |
Version: | 8.0.15 | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[4 Apr 2019 8:39]
Василий Лукьянчиков
[5 Apr 2019 13:26]
MySQL Verification Team
Hi Vasilii, Thank you for your bug report. In order to verify your bug, we need lot's of additional information. First of all, you should measure the performance of the optimiser plan as is, by running query on the same tables, while MySQL server is totally idle. Next, you can FORCE INDEX on the index that you feel would be faster and measure the time again. If you are using any Unix, LInux or Mac, you can use command `time` , by running both queries three times. Let us know all the times that you have obtained therewith. Let me also tell you that GROUP BY resolution, without index, requires lots of time and machine resources. After we get the speed of the two queries, we shall proceed with other requests for the information. You can find how to set FORCE INDEX in our Reference Manual for 8.0. Thanks in advance for your feedback.
[8 Apr 2019 8:14]
Василий Лукьянчиков
Hi Sinisa, I use test instance on my home notebook. There is no other queries. MySQL server is totally idle. First of all, I used force index. There is no effect. Guilhem Bichot recommended workaround - extend ORDER BY, like explain select t2.* from (select user_id from posts group by user_id) as t1, lateral (select * from posts where t1.user_id=posts.user_id order by user_id desc, date_added desc limit 3) as t2; +----+-------------------+------------+------------+-------+---------------+---------+---------+------------+------+----------+----------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------------+------------+------------+-------+---------------+---------+---------+------------+------+----------+----------------------------+ | 1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 21 | 100.00 | Rematerialize (<derived3>) | | 1 | PRIMARY | <derived3> | NULL | ALL | NULL | NULL | NULL | NULL | 3 | 100.00 | NULL | | 3 | DEPENDENT DERIVED | posts | NULL | ref | user_id | user_id | 4 | t1.user_id | 748 | 100.00 | Backward index scan | | 2 | DERIVED | posts | NULL | range | user_id | user_id | 4 | NULL | 21 | 100.00 | Using index for group-by | +----+-------------------+------------+------------+-------+---------------+---------+---------+------------+------+----------+----------------------------+ show status like 'handler%'; +----------------------------+-------+ | Variable_name | Value | +----------------------------+-------+ | Handler_commit | 1 | | Handler_delete | 0 | | Handler_discover | 0 | | Handler_external_lock | 4 | | Handler_mrr_init | 0 | | Handler_prepare | 0 | | Handler_read_first | 1 | | Handler_read_key | 42 | | Handler_read_last | 1 | | Handler_read_next | 0 | | Handler_read_prev | 40 | | Handler_read_rnd | 0 | | Handler_read_rnd_next | 101 | | Handler_rollback | 0 | | Handler_savepoint | 0 | | Handler_savepoint_rollback | 0 | | Handler_update | 0 | | Handler_write | 80 | +----------------------------+-------+ show status like 'sort%'; +-------------------+-------+ | Variable_name | Value | +-------------------+-------+ | Sort_merge_passes | 0 | | Sort_range | 0 | | Sort_rows | 0 | | Sort_scan | 0 | +-------------------+-------+ for comparison my original query without black magic: explain select t2.* from (select user_id from posts group by user_id) as t1, lateral (select * from posts where t1.user_id=posts.user_id order by date_added desc limit 3) as t2; +----+-------------------+------------+------------+-------+---------------+---------+---------+------------+------+----------+----------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------------+------------+------------+-------+---------------+---------+---------+------------+------+----------+----------------------------+ | 1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 21 | 100.00 | Rematerialize (<derived3>) | | 1 | PRIMARY | <derived3> | NULL | ALL | NULL | NULL | NULL | NULL | 3 | 100.00 | NULL | | 3 | DEPENDENT DERIVED | posts | NULL | ref | user_id | user_id | 4 | t1.user_id | 748 | 100.00 | Using filesort | | 2 | DERIVED | posts | NULL | range | user_id | user_id | 4 | NULL | 21 | 100.00 | Using index for group-by | +----+-------------------+------------+------------+-------+---------------+---------+---------+------------+------+----------+----------------------------+ show status like 'handler%'; +----------------------------+-------+ | Variable_name | Value | +----------------------------+-------+ | Handler_commit | 1 | | Handler_delete | 0 | | Handler_discover | 0 | | Handler_external_lock | 4 | | Handler_mrr_init | 0 | | Handler_prepare | 0 | | Handler_read_first | 1 | | Handler_read_key | 102 | | Handler_read_last | 1 | | Handler_read_next | 15962 | <-- ! MySQL server read all table | Handler_read_prev | 0 | | Handler_read_rnd | 60 | | Handler_read_rnd_next | 101 | | Handler_rollback | 0 | | Handler_savepoint | 0 | | Handler_savepoint_rollback | 0 | | Handler_update | 0 | | Handler_write | 80 | +----------------------------+-------+ show status like 'sort%'; +-------------------+-------+ | Variable_name | Value | +-------------------+-------+ | Sort_merge_passes | 0 | | Sort_range | 0 | | Sort_rows | 60 | | Sort_scan | 20 | +-------------------+-------+
[8 Apr 2019 13:19]
MySQL Verification Team
Hi, Thanks again, but you did not do what I asked you to do. I just need times for the query without FORCE INDEX (the index that you think would work faster) and with FORCE INDEX. You can also use optimiser hints that are described in our Reference Manual. There are cases where you can help Optimiser to choose the correct path.
[8 Apr 2019 20:27]
Guilhem Bichot
Hello Sinisa! I discussed with Vasiliy about this, in blog: https://guilhembichot.blogspot.com/2019/01/support-for-lateral-derived-tables.html and there is indeed a problem. It's not specific of LATERAL, though, can also be seen with: explain select (select post_text from posts where t1.user_id=posts.user_id order by date_added limit 3) as s from posts t1; Chaithra, who dealt a lot with GROUP BY recently, agrees there's a problem. In short, the LATERAL subquery should be able to just read the composite index (user_id,date_added) to get sorted data, because posts.user_id is actually constant for an execution; but MySQL doesn't see it as constant (as it's not a true constant, but an outer reference). So I'm marking this bug as verified. Feel free to correct me if needed.
[26 Aug 2020 2:58]
Seunguck Lee
I have experienced this issues many times. Already it's passed over 1 YEAR from reporting bug. I think "Using filesort" in subquery is hurt performance a lot for some cases. How is fixing this issue going on ?
[26 Aug 2020 12:32]
MySQL Verification Team
Hi, Our team only processes the bugs. Then it is transferred to one of the teams that do fixing. Each of those teams has their own scheduling, which is not known and which changes very frequently. All you can do is click on the "Affects Me" button.