Bug #99414 Where exists performance 8.0.16 onwards
Submitted: 30 Apr 2020 17:48 Modified: 4 May 2020 13:42
Reporter: Jonathan Taylor Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:8.0.20 OS:Any
Assigned to: CPU Architecture:Any
Tags: regression, semijoin

[30 Apr 2020 17:48] Jonathan Taylor
Description:
After upgrading our server from MySQL 5.7 to 8.0.20 we've seen a huge degrade in performance an a query with a "where exists" condition.

On 5.7 the query below would prefer to start at the country table and perform a scan of the depot table for each row. This would be quick as the country table has around 200 rows and the depot table has 500k+, and it only needs to find 1 that matches the criteria for each country.

8.0.20 however decides to do a full table scan first to find all the non-deleted depots, and then narrows down the countries. With 500k+ depots this leads to it taking ~3 seconds vs 50ms on 5.7.

How to repeat:
Run the following query on 5.7 vs 8:

SELECT c.id
FROM country c
WHERE EXISTS (
	SELECT 1
	FROM depot
       	INNER JOIN location ON location.id = depot.location_id
       		INNER JOIN city ON city.id = location.city_id
       			INNER JOIN country ON country.id = city.country_id
   	WHERE country.id = c.id AND depot.deleted = 0
);

On 5.7 this produces the following execution plan:

*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: c
   partitions: NULL
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 236
     filtered: 100.00
        Extra: Using where; Using index
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: country
   partitions: NULL
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: test.c.id
         rows: 1
     filtered: 100.00
        Extra: Using index
*************************** 3. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: city
   partitions: NULL
         type: ref
possible_keys: PRIMARY,country_id
          key: country_id
      key_len: 4
          ref: test.c.id
         rows: 68
     filtered: 100.00
        Extra: Using where; Using index
*************************** 4. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: location
   partitions: NULL
         type: ref
possible_keys: PRIMARY,city_id
          key: city_id
      key_len: 4
          ref: test.city.id
         rows: 1
     filtered: 100.00
        Extra: Using index
*************************** 5. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: depot
   partitions: NULL
         type: ref
possible_keys: location_id
          key: location_id
      key_len: 4
          ref: test.location.id
         rows: 8
     filtered: 10.00
        Extra: Using where

However on 8.0.20 it produces:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: depot
   partitions: NULL
         type: ALL
possible_keys: location_id
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1071984
     filtered: 10.00
        Extra: Using where; Start temporary
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: location
   partitions: NULL
         type: eq_ref
possible_keys: PRIMARY,city_id
          key: PRIMARY
      key_len: 4
          ref: test.depot.location_id
         rows: 1
     filtered: 100.00
        Extra: NULL
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: city
   partitions: NULL
         type: eq_ref
possible_keys: PRIMARY,country_id
          key: PRIMARY
      key_len: 4
          ref: test.location.city_id
         rows: 1
     filtered: 100.00
        Extra: NULL
*************************** 4. row ***************************
           id: 1
  select_type: SIMPLE
        table: c
   partitions: NULL
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: test.city.country_id
         rows: 1
     filtered: 100.00
        Extra: Using index
*************************** 5. row ***************************
           id: 1
  select_type: SIMPLE
        table: country
   partitions: NULL
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: test.city.country_id
         rows: 1
     filtered: 100.00
        Extra: Using index; End temporary
[1 May 2020 18:01] Øystein Grøvlen
Suggested workaround:  set optimizer_switch='duplicateweedout=off';

If you attached the optimizer trace for this query, it will be easier for the developers to understand what is going on.
[2 May 2020 12:25] Jonathan Taylor
optimizer trace

Attachment: trace.json (application/json, text), 101.15 KiB.

[2 May 2020 12:28] Jonathan Taylor
Thanks Øystein, optimizer plan attached.

I can confirm that the issue doesn't happen when setting either duplicateweedout=off or semijoin=off
[2 May 2020 12:41] Jonathan Taylor
Tested on 8.0.15 - issue doesn't occur. Would appear related to the semijoin exist optimisations added in 8.0.16
[2 May 2020 22:59] Øystein Grøvlen
I processed the trace with the tool available at https://github.com/ogrovlen/opttrace to get a condensed trace for the join optimizer:

select#1:
Table AccessType:IndexName Rows/Cost TotalRows/TotalCost
--------------------------------------------------------
`country` `c` scan 236/23.85 236/23.85
  `country` eq_ref:PRIMARY 1/82.6 236/106.45
    `city` ref:country_id 68.721/1873.4 16218/1979.9
      `location` ref:city_id 6.1079/22415 99059/24395
        `depot` ref:location_id 6.4349/223102 63743/247497 Semijoin(FM:236/247497 DW:236/253896) *** NEW BEST PLAN ***
      `depot` scan 119232/19400000 193000000/19400000 PRUNED(cost)
    `location` scan 96688/2280000 22800000/2280000 PRUNED(cost)
    `depot` scan 119232/292252 2810000/292359 PRUNED(cost)
  `city` ref:country_id 68.721/1873.4 16218/1897.3 PRUNED(heuristic)
  `location` scan 96688/2280000 22800000/2280000 PRUNED(cost)
  `depot` scan 119232/292213 2810000/292237 PRUNED(cost)
`country` scan 236/23.85 236/23.85
  `country` `c` eq_ref:PRIMARY 1/82.6 236/106.45
    `city` ref:country_id 68.721/1873.4 16218/1979.9
      `location` ref:city_id 6.1079/22415 99059/24395
        `depot` ref:location_id 6.4349/223102 63743/247497 Semijoin(FM:236/247497 DW:236/253896) PRUNED(cost)
      `depot` scan 119232/19400000 193000000/19400000 PRUNED(cost)
    `location` scan 96688/2280000 22800000/2280000 PRUNED(cost)
    `depot` scan 119232/292252 2810000/292359 PRUNED(cost)
  `city` ref:country_id 68.721/1873.4 16218/1897.3 PRUNED(heuristic)
  `location` scan 96688/2280000 22800000/2280000 PRUNED(cost)
  `depot` scan 119232/292213 2810000/292237 PRUNED(cost)
`city` scan 15531/1594.4 15531/1594.4
  `country` `c` eq_ref:PRIMARY 1/5435.9 15531/7030.2
    `country` eq_ref:PRIMARY 1/5435.9 15531/12466 eq_ref-extended
      `location` ref:city_id 6.1079/21466 94862/33932
        `depot` ref:location_id 6.4349/213649 61042/247580 Semijoin(DW:15531/255239) PRUNED(cost)
      `depot` scan 119232/18500000 185000000/18600000 PRUNED(cost)
  `location` ref:city_id 6.1079/21466 94862/23060 PRUNED(heuristic)
  `depot` scan 119232/18500000 185000000/18500000 PRUNED(cost)
`location` scan 96688/9779.8 96688/9779.8
  `country` `c` scan 236/2280000 22800000/2290000 PRUNED(cost)
  `country` scan 236/2280000 22800000/2290000 PRUNED(cost)
  `city` eq_ref:PRIMARY 1/50813 96688/60592
    `country` `c` eq_ref:PRIMARY 1/33841 96688/94433
      `country` eq_ref:PRIMARY 1/33841 96688/128274 eq_ref-extended
        `depot` ref:location_id 6.4349/217762 62218/346036 Semijoin(DW:55696/357828) PRUNED(cost)
    `depot` ref:location_id 6.4349/217762 62218/278354 PRUNED(cost)
  `depot` ref:location_id 6.4349/217762 62218/227541
    `country` `c` scan 236/1470000 14700000/1700000 PRUNED(cost)
    `country` scan 236/1470000 14700000/1700000 PRUNED(cost)
    `city` eq_ref:PRIMARY 1/32697 62218/260239 PRUNED(cost)
`depot` scan 119232/11979 11923/11979
  `country` `c` scan 236/281388 2810000/293367 PRUNED(cost)
  `country` scan 236/281388 2810000/293367 PRUNED(cost)
  `city` scan 15531/18500000 185000000/18500000 PRUNED(cost)
  `location` eq_ref:PRIMARY 1/7074.6 11923/19054
    `country` `c` scan 236/281388 2810000/300442 PRUNED(cost)
    `country` scan 236/281388 2810000/300442 PRUNED(cost)
    `city` eq_ref:PRIMARY 1/6266 11923/25320
      `country` `c` eq_ref:PRIMARY 1/4173.1 11923/29493
        `country` eq_ref:PRIMARY 1/4173.1 11923/33666 eq_ref-extended Semijoin(DW:11923/36052) *** NEW BEST PLAN ***
[2 May 2020 23:03] Øystein Grøvlen
Looks like the cost for FirstMatch (FM) strategy is the cost for the whole inner join, not just to find the first match.  For the whole join, the chosen join order is probably the best, but not for semijoin.
[4 May 2020 13:10] MySQL Verification Team
Hi Mr. Taylor,

Thank you for your bug report.

I have ran your test case on both 5.7 and on latest 8.0. I have got the same result as you, meaning the scanning of the `depot` table. This is the output:

+----+-------------+----------+------------+--------+--------------------+---------+---------+------------------------+--------+----------+------------------------------+
| id | select_type | table    | partitions | type   | possible_keys      | key     | key_len | ref                    | rows   | filtered | Extra                        |
+----+-------------+----------+------------+--------+--------------------+---------+---------+------------------------+--------+----------+------------------------------+
|  1 | SIMPLE      | depot    | NULL       | ALL    | location_id        | NULL    | NULL    | NULL                   | 119232 |    10.00 | Using where; Start temporary |
|  1 | SIMPLE      | location | NULL       | eq_ref | PRIMARY,city_id    | PRIMARY | 4       | test.depot.location_id |      1 |   100.00 | NULL                         |
|  1 | SIMPLE      | city     | NULL       | eq_ref | PRIMARY,country_id | PRIMARY | 4       | test.location.city_id  |      1 |   100.00 | NULL                         |
|  1 | SIMPLE      | c        | NULL       | eq_ref | PRIMARY            | PRIMARY | 4       | test.city.country_id   |      1 |   100.00 | Using index                  |
|  1 | SIMPLE      | country  | NULL       | eq_ref | PRIMARY            | PRIMARY | 4       | test.city.country_id   |      1 |   100.00 | Using index; End temporary   |
+----+-------------+----------+------------+--------+--------------------+---------+---------+------------------------+--------+----------+------------------------------+

Verified as reported.
[4 May 2020 13:42] Jonathan Taylor
Thanks Sinisa. I've updated the summary to be more specific as the issue only happens in 8.0.16 onwards.
[20 May 2020 12:20] MySQL Verification Team
You are welcome, Mr. Taylor.