Bug #115149 | Optimizer produces a poor query plan for impossible WHERE condition | ||
---|---|---|---|
Submitted: | 28 May 13:05 | Modified: | 29 May 12:01 |
Reporter: | Michael Adelson | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S4 (Feature request) |
Version: | 8.0 | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[28 May 13:05]
Michael Adelson
[28 May 14:00]
MySQL Verification Team
Hi Mr. Adelson, Thank you for your bug report. However, we do not know which product have you tested. All latest releases of all our maintained versions return the same result: mysql> EXPLAIN SELECT COUNT(*) FROM t1 WHERE id IN (SELECT NULL FROM INFORMATION_SCHEMA.TABLES WHERE FALSE); +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------+ | 1 | SIMPLE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Impossible WHERE | +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------+ 1 row in set, 1 warning (0.03 sec) mysql> SHOW WARNINGS; +-------+------+---------------------------------------------------------------------------+ | Level | Code | Message | +-------+------+---------------------------------------------------------------------------+ | Note | 1003 | /* select#1 */ select count(0) AS `COUNT(*)` from `test`.`t1` where false | +-------+------+---------------------------------------------------------------------------+ 1 row in set (0.00 sec) Simply, nothing is scanned as the WHERE is impossible ...... Definitely not a bug.
[28 May 15:24]
Michael Adelson
Did you try one of the queries I marked as bad? EXPLAIN SELECT COUNT(*) FROM t1 WHERE id IN (SELECT NULL WHERE FALSE); | id | select_type | table | rows | filtered | Extra | 1 | PRIMARY | t1 | 90000 | 100.00 | Using where | 2 | DEPENDENT SUBQUERY | null | null | Impossible WHERE This query runs quite slowly if t1 is large; it is clearly doing the fullscan. In contrast, the query with `FROM INFORMATION_SCHEMA.TABLES` in the IN clause returns instantly.
[28 May 16:00]
MySQL Verification Team
Hi Mr. Adelson, That example that you cited used a dependent nested query that , by rule, has to be run for every row of the tables used in the outer query. That is standard behaviour.
[28 May 16:09]
Michael Adelson
> That example that you cited used a dependent nested query that , by rule, has to be run for every row of the tables used in the outer query. Sorry I don't follow. I know that the plan produces a dependent nested query, but that plan is doing extra unnecessary work by my understanding. These 3 queries are all logically equivalent, but one of them gets a fast plan and the other 2 get slow plans: ``` -- EXAMPLE 1: produces a plan that returns instantly with no table scan SELECT COUNT(*) FROM t1 WHERE id IN (SELECT NULL FROM INFORMATION_SCHEMA.TABLES WHERE FALSE); -- EXAMPLE 2: produces a plan that returns slowly after scanning the full table SELECT COUNT(*) FROM t1 WHERE id IN (SELECT NULL WHERE FALSE); -- EXAMPLE 3: produces a plan that returns slowly after scanning the full table SELECT COUNT(*) FROM t1 WHERE id IN (SELECT NULL FROM DUAL WHERE FALSE); ``` I believe that the optimizer should give all 3 of these queries the same fast plan. What am I missing?
[29 May 10:21]
MySQL Verification Team
HI Mr. Adelson, A difference is in the table that actually exists. Problem is that many nested queries could finish in producing impossible condition, or just a condition that returns 0 (zero) rows. Out of 1000 (one thousand) queries, 1 will be like that. If we would optimise all of the examples, then all 1000 queries would have to pass through those checks. This would finish in the over-optimisation, id est, queries would spend too much time in the optimisation stage and not in the execution. We must avoid that. However, in your case, we think that this can make a feature request. This is now verified as a feature request for 8.0 and higher.
[29 May 12:01]
Michael Adelson
> we think that this can make a feature request. Thanks! I think this will be valuable because Dapper (the most popular micro-ORM for .NET) uses the pattern `IN (SELECT NULL WHERE 1 = 0)` to handle empty IN clauses and other frameworks that generate SQL might use it as well. Therefore, optimizing this pattern makes it easier for such frameworks to generate performant queries.