| Bug #99414 | Where exists performance 8.0.16 onwards | ||
|---|---|---|---|
| Submitted: | 30 Apr 2020 17:48 | Modified: | 4 May 2020 13:42 |
| Reporter: | Jonathan Taylor | Email Updates: | |
| Status: | Verified | Impact on me: | |
| Category: | MySQL Server: Optimizer | Severity: | S3 (Non-critical) |
| Version: | 8.0.20 | OS: | Any |
| Assigned to: | CPU Architecture: | Any | |
| Tags: | regression, semijoin | ||
[1 May 2020 18:01]
Øystein Grøvlen
Suggested workaround: set optimizer_switch='duplicateweedout=off'; If you attached the optimizer trace for this query, it will be easier for the developers to understand what is going on.
[2 May 2020 12:25]
Jonathan Taylor
optimizer trace
Attachment: trace.json (application/json, text), 101.15 KiB.
[2 May 2020 12:28]
Jonathan Taylor
Thanks Øystein, optimizer plan attached. I can confirm that the issue doesn't happen when setting either duplicateweedout=off or semijoin=off
[2 May 2020 12:41]
Jonathan Taylor
Tested on 8.0.15 - issue doesn't occur. Would appear related to the semijoin exist optimisations added in 8.0.16
[2 May 2020 22:59]
Øystein Grøvlen
I processed the trace with the tool available at https://github.com/ogrovlen/opttrace to get a condensed trace for the join optimizer: select#1: Table AccessType:IndexName Rows/Cost TotalRows/TotalCost -------------------------------------------------------- `country` `c` scan 236/23.85 236/23.85 `country` eq_ref:PRIMARY 1/82.6 236/106.45 `city` ref:country_id 68.721/1873.4 16218/1979.9 `location` ref:city_id 6.1079/22415 99059/24395 `depot` ref:location_id 6.4349/223102 63743/247497 Semijoin(FM:236/247497 DW:236/253896) *** NEW BEST PLAN *** `depot` scan 119232/19400000 193000000/19400000 PRUNED(cost) `location` scan 96688/2280000 22800000/2280000 PRUNED(cost) `depot` scan 119232/292252 2810000/292359 PRUNED(cost) `city` ref:country_id 68.721/1873.4 16218/1897.3 PRUNED(heuristic) `location` scan 96688/2280000 22800000/2280000 PRUNED(cost) `depot` scan 119232/292213 2810000/292237 PRUNED(cost) `country` scan 236/23.85 236/23.85 `country` `c` eq_ref:PRIMARY 1/82.6 236/106.45 `city` ref:country_id 68.721/1873.4 16218/1979.9 `location` ref:city_id 6.1079/22415 99059/24395 `depot` ref:location_id 6.4349/223102 63743/247497 Semijoin(FM:236/247497 DW:236/253896) PRUNED(cost) `depot` scan 119232/19400000 193000000/19400000 PRUNED(cost) `location` scan 96688/2280000 22800000/2280000 PRUNED(cost) `depot` scan 119232/292252 2810000/292359 PRUNED(cost) `city` ref:country_id 68.721/1873.4 16218/1897.3 PRUNED(heuristic) `location` scan 96688/2280000 22800000/2280000 PRUNED(cost) `depot` scan 119232/292213 2810000/292237 PRUNED(cost) `city` scan 15531/1594.4 15531/1594.4 `country` `c` eq_ref:PRIMARY 1/5435.9 15531/7030.2 `country` eq_ref:PRIMARY 1/5435.9 15531/12466 eq_ref-extended `location` ref:city_id 6.1079/21466 94862/33932 `depot` ref:location_id 6.4349/213649 61042/247580 Semijoin(DW:15531/255239) PRUNED(cost) `depot` scan 119232/18500000 185000000/18600000 PRUNED(cost) `location` ref:city_id 6.1079/21466 94862/23060 PRUNED(heuristic) `depot` scan 119232/18500000 185000000/18500000 PRUNED(cost) `location` scan 96688/9779.8 96688/9779.8 `country` `c` scan 236/2280000 22800000/2290000 PRUNED(cost) `country` scan 236/2280000 22800000/2290000 PRUNED(cost) `city` eq_ref:PRIMARY 1/50813 96688/60592 `country` `c` eq_ref:PRIMARY 1/33841 96688/94433 `country` eq_ref:PRIMARY 1/33841 96688/128274 eq_ref-extended `depot` ref:location_id 6.4349/217762 62218/346036 Semijoin(DW:55696/357828) PRUNED(cost) `depot` ref:location_id 6.4349/217762 62218/278354 PRUNED(cost) `depot` ref:location_id 6.4349/217762 62218/227541 `country` `c` scan 236/1470000 14700000/1700000 PRUNED(cost) `country` scan 236/1470000 14700000/1700000 PRUNED(cost) `city` eq_ref:PRIMARY 1/32697 62218/260239 PRUNED(cost) `depot` scan 119232/11979 11923/11979 `country` `c` scan 236/281388 2810000/293367 PRUNED(cost) `country` scan 236/281388 2810000/293367 PRUNED(cost) `city` scan 15531/18500000 185000000/18500000 PRUNED(cost) `location` eq_ref:PRIMARY 1/7074.6 11923/19054 `country` `c` scan 236/281388 2810000/300442 PRUNED(cost) `country` scan 236/281388 2810000/300442 PRUNED(cost) `city` eq_ref:PRIMARY 1/6266 11923/25320 `country` `c` eq_ref:PRIMARY 1/4173.1 11923/29493 `country` eq_ref:PRIMARY 1/4173.1 11923/33666 eq_ref-extended Semijoin(DW:11923/36052) *** NEW BEST PLAN ***
[2 May 2020 23:03]
Øystein Grøvlen
Looks like the cost for FirstMatch (FM) strategy is the cost for the whole inner join, not just to find the first match. For the whole join, the chosen join order is probably the best, but not for semijoin.
[4 May 2020 13:10]
MySQL Verification Team
Hi Mr. Taylor, Thank you for your bug report. I have ran your test case on both 5.7 and on latest 8.0. I have got the same result as you, meaning the scanning of the `depot` table. This is the output: +----+-------------+----------+------------+--------+--------------------+---------+---------+------------------------+--------+----------+------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+----------+------------+--------+--------------------+---------+---------+------------------------+--------+----------+------------------------------+ | 1 | SIMPLE | depot | NULL | ALL | location_id | NULL | NULL | NULL | 119232 | 10.00 | Using where; Start temporary | | 1 | SIMPLE | location | NULL | eq_ref | PRIMARY,city_id | PRIMARY | 4 | test.depot.location_id | 1 | 100.00 | NULL | | 1 | SIMPLE | city | NULL | eq_ref | PRIMARY,country_id | PRIMARY | 4 | test.location.city_id | 1 | 100.00 | NULL | | 1 | SIMPLE | c | NULL | eq_ref | PRIMARY | PRIMARY | 4 | test.city.country_id | 1 | 100.00 | Using index | | 1 | SIMPLE | country | NULL | eq_ref | PRIMARY | PRIMARY | 4 | test.city.country_id | 1 | 100.00 | Using index; End temporary | +----+-------------+----------+------------+--------+--------------------+---------+---------+------------------------+--------+----------+------------------------------+ Verified as reported.
[4 May 2020 13:42]
Jonathan Taylor
Thanks Sinisa. I've updated the summary to be more specific as the issue only happens in 8.0.16 onwards.
[20 May 2020 12:20]
MySQL Verification Team
You are welcome, Mr. Taylor.

Description: After upgrading our server from MySQL 5.7 to 8.0.20 we've seen a huge degrade in performance an a query with a "where exists" condition. On 5.7 the query below would prefer to start at the country table and perform a scan of the depot table for each row. This would be quick as the country table has around 200 rows and the depot table has 500k+, and it only needs to find 1 that matches the criteria for each country. 8.0.20 however decides to do a full table scan first to find all the non-deleted depots, and then narrows down the countries. With 500k+ depots this leads to it taking ~3 seconds vs 50ms on 5.7. How to repeat: Run the following query on 5.7 vs 8: SELECT c.id FROM country c WHERE EXISTS ( SELECT 1 FROM depot INNER JOIN location ON location.id = depot.location_id INNER JOIN city ON city.id = location.city_id INNER JOIN country ON country.id = city.country_id WHERE country.id = c.id AND depot.deleted = 0 ); On 5.7 this produces the following execution plan: *************************** 1. row *************************** id: 1 select_type: PRIMARY table: c partitions: NULL type: index possible_keys: NULL key: PRIMARY key_len: 4 ref: NULL rows: 236 filtered: 100.00 Extra: Using where; Using index *************************** 2. row *************************** id: 2 select_type: DEPENDENT SUBQUERY table: country partitions: NULL type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: test.c.id rows: 1 filtered: 100.00 Extra: Using index *************************** 3. row *************************** id: 2 select_type: DEPENDENT SUBQUERY table: city partitions: NULL type: ref possible_keys: PRIMARY,country_id key: country_id key_len: 4 ref: test.c.id rows: 68 filtered: 100.00 Extra: Using where; Using index *************************** 4. row *************************** id: 2 select_type: DEPENDENT SUBQUERY table: location partitions: NULL type: ref possible_keys: PRIMARY,city_id key: city_id key_len: 4 ref: test.city.id rows: 1 filtered: 100.00 Extra: Using index *************************** 5. row *************************** id: 2 select_type: DEPENDENT SUBQUERY table: depot partitions: NULL type: ref possible_keys: location_id key: location_id key_len: 4 ref: test.location.id rows: 8 filtered: 10.00 Extra: Using where However on 8.0.20 it produces: *************************** 1. row *************************** id: 1 select_type: SIMPLE table: depot partitions: NULL type: ALL possible_keys: location_id key: NULL key_len: NULL ref: NULL rows: 1071984 filtered: 10.00 Extra: Using where; Start temporary *************************** 2. row *************************** id: 1 select_type: SIMPLE table: location partitions: NULL type: eq_ref possible_keys: PRIMARY,city_id key: PRIMARY key_len: 4 ref: test.depot.location_id rows: 1 filtered: 100.00 Extra: NULL *************************** 3. row *************************** id: 1 select_type: SIMPLE table: city partitions: NULL type: eq_ref possible_keys: PRIMARY,country_id key: PRIMARY key_len: 4 ref: test.location.city_id rows: 1 filtered: 100.00 Extra: NULL *************************** 4. row *************************** id: 1 select_type: SIMPLE table: c partitions: NULL type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: test.city.country_id rows: 1 filtered: 100.00 Extra: Using index *************************** 5. row *************************** id: 1 select_type: SIMPLE table: country partitions: NULL type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: test.city.country_id rows: 1 filtered: 100.00 Extra: Using index; End temporary