Bug #112362 WHERE NOT IN with subquery is much slower on 8.1 than 5.7
Submitted: 15 Sep 2023 20:19 Modified: 25 Sep 2023 14:12
Reporter: Bradley Grainger (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:8.1.0, 8.0.34 OS:Ubuntu (22.04)
Assigned to: CPU Architecture:x86
Tags: regression

[15 Sep 2023 20:19] Bradley Grainger
Description:
A query that was executing very quickly in MySQL 5.7.42 now executes much more slowly in MySQL 8.0.32 and 8.1.0. Instead of performing an index range scan, MySQL executes a full index scan; on a large table this may examine millions of rows, greatly impacting performance.

How to repeat:
Create the following two tables:

CREATE TABLE a
(
  a_id INTEGER NOT NULL PRIMARY KEY AUTO_INCREMENT,
  value VARCHAR(100) NOT NULL COLLATE utf8mb4_bin,
  UNIQUE KEY ix_value_unique (value)
);

CREATE TABLE b
(
  b_id INTEGER NOT NULL PRIMARY KEY AUTO_INCREMENT,
  value VARCHAR(100) NOT NULL COLLATE utf8mb4_bin,
  KEY ix_value (VALUE)
);

`a` will store a small number (thousands) of unique keys; `b` will store a large number (millions) of rows whose values may or may not exist in `a`.

A sample mysqldump file that creates 10,000 `a` values and 10,000,000 `b` values is available at https://drive.google.com/file/d/1KMRWIgMhejPzzBo9BQdJxTVt9tOXTQsh/view?usp=sharing

Run the following query to find all rows in `b` that don't have a matching value in `a`:

SELECT DISTINCT value FROM b
WHERE value NOT IN (SELECT value FROM a);

In MySQL 5.7.42 this returns "Empty set (0.07 sec)"
In MySQL 8.1.0 this returns "Empty set (2.31 sec)"

This is 33 times slower. In a production application that this was taken from, the performance was 95 times slower.

A workaround in MySQL 8.1.0 is to execute the following query instead:

WITH
  b2 AS (SELECT DISTINCT value FROM b),
  a2 AS (SELECT VALUE value FROM a)
SELECT value FROM b2
WHERE value NOT IN (SELECT value FROM a2);

It doesn't seem necessary that these CTEs should be necessary to force the query planner to pick a more optimal plan. (Additionally, this syntax is not valid on MySQL 5.7 so the application has to change its query based on which server version it's running against.)

The plan for 5.7 is:

+----+-------------+-------+------------+-------+-----------------+-----------------+---------+------+-------+----------+---------------------------------------+
| id | select_type | table | partitions | type  | possible_keys   | key             | key_len | ref  | rows  | filtered | Extra                                 |
+----+-------------+-------+------------+-------+-----------------+-----------------+---------+------+-------+----------+---------------------------------------+
|  1 | PRIMARY     | b     | NULL       | range | ix_value        | ix_value        | 402     | NULL | 10001 |   100.00 | Using where; Using index for group-by |
|  2 | SUBQUERY    | a     | NULL       | index | ix_value_unique | ix_value_unique | 402     | NULL | 10000 |   100.00 | Using index                           |
+----+-------------+-------+------------+-------+-----------------+-----------------+---------+------+-------+----------+---------------------------------------+

The plan for 8.1 is:

+----+--------------+-------------+------------+--------+---------------------+---------------------+---------+--------------+---------+----------+-----------------------------------+
| id | select_type  | table       | partitions | type   | possible_keys       | key                 | key_len | ref          | rows    | filtered | Extra                             |
+----+--------------+-------------+------------+--------+---------------------+---------------------+---------+--------------+---------+----------+-----------------------------------+
|  1 | SIMPLE       | b           | NULL       | index  | ix_value            | ix_value            | 402     | NULL         | 9111855 |   100.00 | Using index; Using temporary      |
|  1 | SIMPLE       | <subquery2> | NULL       | eq_ref | <auto_distinct_key> | <auto_distinct_key> | 403     | test.b.value |       1 |   100.00 | Using where; Not exists; Distinct |
|  2 | MATERIALIZED | a           | NULL       | index  | ix_value_unique     | ix_value_unique     | 402     | NULL         |   10000 |   100.00 | Using index                       |
+----+--------------+-------------+------------+--------+---------------------+---------------------+---------+--------------+---------+----------+-----------------------------------+

You can see that many more rows (almost 10 million) are examined.
[19 Sep 2023 11:27] MySQL Verification Team
Hi Mr. Grainger,

Thank you for your bug report.

However, there are literally hundreds of bug reports , which show significant slowdown for the read or write DMLs between 5.7 and 8.0/8.1.

This is a widely known fact.

We have found these bugs in this forum that show that behaviour:

 https://bugs.mysql.com/bug.php?id=94387, https://bugs.mysql.com/bug.php?id=93734,
https://bugs.mysql.com/bug.php?id=94283

However, we have found one bug, with exactly WHERE NOT IN, but this is an internally reported bug, which is not visible to you.

Hence, the performance degradation from 5.7 to 8.0 / 8.1 is very well known fact.

Duplicate.
[25 Sep 2023 14:12] MySQL Verification Team
Hello Bradley Grainger,

Thank you for the report and test case.
Verified as described.

regards,
Umesh
[25 Sep 2023 14:13] MySQL Verification Team
Test results - 8.0.11+,  5.7.43

Attachment: Bug112362_5.7.43_8.0.11_8.0.34_8.1.0.txt (text/plain), 18.09 KiB.

[26 Sep 2023 10:34] Roy Lyseng
Posted by developer:
 
Pretty strange nobody have noticed this regression before.
The "problem" with semi-join and anti-join is that it converts a single-table
query into a multi-table query, by dragging in the subquery tables, and
the index range scan algorithm is only implemented for single-table queries.

Until this has been fixed, there is a simple workaround that you can try:
disable semi-join (and anti-join) conversion by this command:

  set optimizer_switch='semijoin=off';

or use the equivalent query hint.