Bug #117907 Performance degradation in MySQL>=8.4 vs MySQL 8.0 due to redundant condition not optimized away
Submitted: 8 Apr 8:02 Modified: 9 Apr 9:10
Reporter: Power Gamer Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:8.4.4, 9.2.0 OS:Any
Assigned to: CPU Architecture:Any

[8 Apr 8:02] Power Gamer
Description:
It looks like query optimizer in MySQL 8.4 and 9.0 has lost the ability to optimize away some redundant conditions (see examples below). That leads to significant degradation of query execution speed in some cases.

For all versions of MySQL "sql_mode" is set to "ANSI,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION".

CREATE TABLE tbl (
	id INT NOT NULL AUTO_INCREMENT,
	a_id INT NOT NULL,
	p_id INT NOT NULL,
	l_id INT NULL DEFAULT NULL,
	prev_self_id INT NULL DEFAULT NULL,
	next_self_id INT NULL DEFAULT NULL,
	PRIMARY KEY ("id"),
	UNIQUE KEY k1 ("a_id","id"),
	UNIQUE KEY k2 ("a_id","p_id","id"),
	UNIQUE KEY k3 ("prev_self_id"),
	UNIQUE KEY k4 ("next_self_id"),
	UNIQUE KEY k5 ("a_id","p_id","l_id","id")
);

Query #1. Has a single "IN" condition.
EXPLAIN SELECT * FROM tbl WHERE (a_id, p_id, l_id, id) IN ((1, 111, NULL, NULL), (1, 111, 22222, NULL));

Query #2. Same as query #1 but adds "AND a_id = 1" condition. Notice that the value of "a_id" checked by this new condition is the same as in old "IN" condition.
EXPLAIN SELECT * FROM tbl WHERE (a_id, p_id, l_id, id) IN ((1, 111, NULL, NULL), (1, 111, 22222, NULL)) AND a_id = 1;

Both SELECTs above should always be extremely fast due to the fact that the query looks for a value of NULL in a column that cannot hold NULLs (tbl.id). 
In practice, for a second query only MySQL 8.0 is able to optimize away redundant "AND a_id = 1" condition and run the query very fast but both MySQL 8.4 and 9.0 fail to do so.

Query #3. Same as query #1 but adds another value "(1, 111, 22222, 55)" in which "tbl.id" is not NULL.
EXPLAIN SELECT * FROM tbl WHERE (a_id, p_id, l_id, id) IN ((1, 111, NULL, NULL), (1, 111, 22222, NULL), (1, 111, 22222, 55));

Query #4. Same as query #2 but adds another value "(1, 111, 22222, 55)" in which "tbl.id" is not NULL.
EXPLAIN SELECT * FROM tbl WHERE (a_id, p_id, l_id, id) IN ((1, 111, NULL, NULL), (1, 111, 22222, NULL), (1, 111, 22222, 55)) AND a_id = 1;

Both SELECTs in query #3 and #4 should always be equally fast due to the fact that additional condition "AND a_id = 1" is redundant and can be thrown away. 
In practice, only under MySQL 8.0 execution plan for query #3 and #4 is the same, but is different under MySQL 8.4 and 9.0.

Results under MySQL 8.0.41.

Query #1. OK, super fast.
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                          |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------+
|  1 | SIMPLE      | NULL  | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | no matching row in const table |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------+

Query #2. OK, exactly the same as query #1, super fast.
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                          |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------+
|  1 | SIMPLE      | NULL  | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | no matching row in const table |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------+

Query #3. OK, fast.
+----+-------------+-------+------------+-------+------------------+---------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys    | key     | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+-------+------------------+---------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | tbl   | NULL       | range | PRIMARY,k1,k2,k5 | PRIMARY | 4       | NULL |    1 |   100.00 | Using where |
+----+-------------+-------+------------+-------+------------------+---------+---------+------+------+----------+-------------+

Query #4. OK, exactly the same as query #3, fast.
+----+-------------+-------+------------+-------+------------------+---------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys    | key     | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+-------+------------------+---------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | tbl   | NULL       | range | PRIMARY,k1,k2,k5 | PRIMARY | 4       | NULL |    1 |   100.00 | Using where |
+----+-------------+-------+------------+-------+------------------+---------+---------+------+------+----------+-------------+

Results under MySQL 8.4.4 and 9.2.0.

Query #1. OK, super fast. Same as MySQL 8.0.
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                          |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------+
|  1 | SIMPLE      | NULL  | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | no matching row in const table |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------+

Query #2. BAD, not the same as query #1, very slow. Much worse than MySQL 8.0.
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | tbl   | NULL       | ref  | k1,k2,k5      | k1   | 4       | const |    1 |   100.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------------+

Query #3. OK, fast. Same as MySQL 8.0.
+----+-------------+-------+------------+-------+------------------+---------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys    | key     | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+-------+------------------+---------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | tbl   | NULL       | range | PRIMARY,k1,k2,k5 | PRIMARY | 4       | NULL |    1 |   100.00 | Using where |
+----+-------------+-------+------------+-------+------------------+---------+---------+------+------+----------+-------------+

Query #4. BAD, not the same as query #3, slow. Much worse than MySQL 8.0.
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | tbl   | NULL       | ref  | k1,k2,k5      | k1   | 4       | const |    1 |   100.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------------+

How to repeat:
Execute "CREATE TABLE tbl" and queries #1-4 (see description).

Suggested fix:
Fix regression in optimizer in MySQL 8.4 and 9.0 (so it produces the same results as MySQL 8.0 on example queries).
[8 Apr 8:07] Power Gamer
Fixed "Synopsis" to reflect that MySQL 8.4 is also affected.
[9 Apr 8:26] MySQL Verification Team
Hello Power Gamer,

Thank you for the report and test case.
If possible could you please provide sample data(I have randomly inserted data but development might ask for actual data etc.)? Thank you.

regards,
Umesh
[9 Apr 9:10] Power Gamer
The root cause of the issue is the loss of ability in newer MySQLs to optimize away redundant condition in the query. This is evident from the EXPLAIN output, the relevant parts of which do not depend on whether the table have any data or not. The newer MySQLs choose wrong execution plan even on empty tables. Especially for the query #2, where the result of SELECT would always be empty row set no matter the data. This should be sufficient for the people with understanding of MySQL code to figure out what exactly has changed since MySQL 8.0 that leads to a different EXPLAIN output.

That said, I did briefly try to create random data set for a simplified test table to show the performance drop, but it was not as significant as on the actual production table with half-a-million records (and I cannot include the dump of the actual data here for obvious reasons).