Bug #105444 | SELECT with impossible condition - different optimizations | ||
---|---|---|---|
Submitted: | 3 Nov 2021 13:32 | Modified: | 4 Nov 2021 17:57 |
Reporter: | Eimantas Jatkonis | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S3 (Non-critical) |
Version: | 8.0.27, 5.6.51 | OS: | Any |
Assigned to: | CPU Architecture: | Any | |
Tags: | impossible condition, optimization |
[3 Nov 2021 13:32]
Eimantas Jatkonis
[3 Nov 2021 13:51]
MySQL Verification Team
Hi Mr. Jatkonis, Thank you for your bug report. However, MySQL version 5.6 is not maintained any more. Hence, if you want to report a difference in behaviour you can use latest 8.0 release (which you used here) and latest 5.7 release. Then , you should point us to the difference between two versions and provide us with the opinion on which behaviour is correct.
[3 Nov 2021 21:25]
Eimantas Jatkonis
This is not about difference between versions 8.0.27 and 5.6.51 This is a problem I tested on 8.0.27 (latest available) and 5.6.51 (latest of 5.6) If it is on 8.0.27 and 5.6.51 so 5.7.XX should have similar problem.
[3 Nov 2021 22:49]
Justin Swanhart
Unlike imperative languages, SQL is declarative and there is no concept of 'short circuits' like there are in imperative languages. For example in C: (if ptr != NULL && ptr->val == 1) { } The second statement will never be evaluated if ptr == NULL because the evaluation is short circuited. SQL does not offer such short circuiting in the WHERE clause. All expressions in a SQL statement are evaluated at some point in execution. In order to do this query efficiently, use UNION ALL: SELECT COUNT(*) FROM ( select 0 from datatable where :SearchType = 0 and datatable.Start BETWEEN :SearchDateBegin and :SearchDateEnd UNION ALL select 0 from datatable where :SearchType = 1 and datatable.Name = :SearchName ) This is not a bug.
[3 Nov 2021 23:00]
Justin Swanhart
If it isn't clear why, it is because both SQL statement will be parsed each time and one of them will have an impossible where clause, so that query is not executed at all. select count(*) ... UNION ALL ... select count(*) would work just as well, but I wanted the example to be clearer.
[3 Nov 2021 23:00]
Eimantas Jatkonis
I know UNION gives some advantages to control WHERE conditions (JOIN as well) but this brings more problems (in my case) Documentation provides info about DEAD CODE removal inside MySQL https://dev.mysql.com/doc/internals/en/optimizer-eliminating-dead-code.html https://dev.mysql.com/doc/refman/8.0/en/where-optimization.html but it works only on "paper" If it's not a bug, but developers works on that - this is one of test case scenario they should look in to.
[3 Nov 2021 23:04]
Justin Swanhart
Note that you can not prune subtrees from the where clause just because part of it involves a constant evaluation that evaluates to false because all expressions have to be evaluated and they can have side effects. Calling a stored function in either of the disjunctions for example, or setting a user variable. select * from table where (0=(@myvar := 1)) AND (1=1 AND (@myvar2 := 2)) select @myvar, @myvar2: 1, 2
[3 Nov 2021 23:13]
Eimantas Jatkonis
Last example with ":=" in WHERE condition is interesting and supports your possition. Indeed I doubt what possibility for variable assignment in WHERE clouse is comparable tradeoff for faster data selection :) But you should look at detailed EXPLAIN ... SHOW WARNINGS ... EXPLAIN SELECT COUNT(*) FROM datatable WHERE ( 1 = 1 AND datatable.Start BETWEEN NOW() - INTERVAL 100000 HOUR AND NOW() ) OR ( 1 = 0 AND datatable.Name = '' ) ; SHOW WARNINGS; it shows already optimized SQL /* select#1 */ select count(0) AS `COUNT(*)` from `datatable` where (`datatable`.`Start` between '2010-01-01' and '2021-01-01'); wich is (reformated): select count(0) AS `COUNT(*)` from `datatable` where ( `datatable`.`Start` between '2010-01-01' and '2021-01-01' ); All unneeded conditions removed (dead code), so why it is not used for execution?
[3 Nov 2021 23:19]
Eimantas Jatkonis
I tested ":=" within WHERE condition in 8.0.27 and received this: mysql> EXPLAIN -> SELECT COUNT(*) -> FROM datatable -> WHERE ( -> 1 = 1 AND -> datatable.Start BETWEEN NOW() - INTERVAL 100000 HOUR AND NOW() AND -> @A := 156 > 150 -> ) OR -> ( -> 1 = 0 AND -> datatable.Name = '' AND -> @B := 256 > 150 -> ) -> ; +----+-------------+-----------+------------+-------+---------------+----------+---------+------+--------+----------+-----------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-----------+------------+-------+---------------+----------+---------+------+--------+----------+-----------------------+ | 1 | SIMPLE | datatable | NULL | range | LK_Start | LK_Start | 6 | NULL | 127445 | 100.00 | Using index condition | +----+-------------+-----------+------------+-------+---------------+----------+---------+------+--------+----------+-----------------------+ 1 row in set, 3 warnings (0.00 sec) mysql> show warnings; +---------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Level | Code | Message | +---------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Warning | 1287 | Setting user variables within expressions is deprecated and will be removed in a future release. Consider alternatives: 'SET variable=expression, ...', or 'SELECT expression(s) INTO variables(s)'. | | Warning | 1287 | Setting user variables within expressions is deprecated and will be removed in a future release. Consider alternatives: 'SET variable=expression, ...', or 'SELECT expression(s) INTO variables(s)'. | | Note | 1003 | /* select#1 */ select count(0) AS `COUNT(*)` from `bgtest`.`datatable` where (`bgtest`.`datatable`.`Start` between <cache>((now() - interval 100000 hour)) and <cache>(now())) | +---------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 3 rows in set (0.00 sec) And your argument with ":=" gets depreciation warning (this gives some points to my arguments).
[3 Nov 2021 23:22]
Justin Swanhart
I did not look at the code, bug logically, placeholders are not evaluated as constants, because the prepared statement represents a prepared execution plan. Each time the statement is evaluated the placeholders are evaluated, and it has to pick one of the two indexes at that runtime. The prepared plan is always a full table scan as a result. So don't use a parameterized SQL here. Just interpolate the string (carefully) and execute it.
[3 Nov 2021 23:23]
Justin Swanhart
I used the variable set in the WHERE clause to illustrate a type of statement with side effects. Stored functions can set user variables, or even modify data, and they will not generate any warning.
[3 Nov 2021 23:31]
Justin Swanhart
or, if you want the safety of bind params instead of interpolation, then use the UNION approach and prepare it.
[4 Nov 2021 7:39]
Eimantas Jatkonis
Testing your arguments - and this one "stored function with user variables" or "stored function with data manipulation" is not actualy working as you expected. Added simple function "donothing()" that increments @A. And after running SQL - I get that function that is in DEAD part of OR - is not ever called. DELIMITER $$ CREATE FUNCTION donothing() RETURNS INT DETERMINISTIC BEGIN SET @A := IFNULL(@A, 0) + 1; RETURN -1; END $$ DELIMITER ; SELECT COUNT(*) FROM datatable WHERE ( 1 = 1 AND datatable.Start BETWEEN NOW() - INTERVAL 100000 HOUR AND NOW() ) OR ( donothing() = -1 AND #Function call 1 = 0 AND #DEAD part of OR datatable.Name = '' ) ; mysql> SELECT @A; +------+ | @A | +------+ | NULL | +------+ 1 row in set (0.00 sec) mysql> SELECT donothing(); +-------------+ | donothing() | +-------------+ | -1 | +-------------+ 1 row in set (0.00 sec) mysql> SELECT @A; +------+ | @A | +------+ | 1 | +------+ 1 row in set (0.00 sec)
[4 Nov 2021 12:57]
MySQL Verification Team
Thank you both, Mr. Jatkonis and Mr. Swanhart. We agree with Mr. Swanhart that this is expected behaviour. Not a bug.
[4 Nov 2021 17:31]
Justin Swanhart
Mr Jatkonis, Note that you CAN prune subtrees from the where clause when ALL of that subtree evaluates to constants that are false :D
[4 Nov 2021 17:35]
Justin Swanhart
So it works exactly as expected and documented. The query doesn't execute at all if any constant expression in the where clause evaluates to false. The entire query is short circuited in this case. But if your function call in your example is not-deterministic and/or takes columnar input, then it will have to be evaluated for each expression, because it is no longer constant. I appreciate your enthusiasm for argument. Regards.
[4 Nov 2021 17:57]
Eimantas Jatkonis
About subtree prunning: In this subtree first condition is always TRUE (function donothing() returns -1) So if you are right - it could not be pruned, but it is pruned. ( donothing() = -1 AND #Function call 1 = 0 AND #DEAD part of OR datatable.Name = '' ) About "not-deterministic and/or takes columnar input": Created one more example with function that takes columnar input dosomething(d DATETIME) and returns some not constant result. DELIMITER $$ CREATE FUNCTION dosomething(d DATETIME) RETURNS DATETIME DETERMINISTIC BEGIN SET @A = IFNULL(@A, 0) + 1; SET @B = IFNULL(d, NOW()); RETURN d; END $$ DELIMITER ; SELECT COUNT(*) FROM datatable WHERE ( 1 = 1 AND datatable.Start BETWEEN NOW() - INTERVAL 100000 HOUR AND NOW() ) OR ( dosomething(datatable.Start) = datatable.Start AND 1 = 0 AND datatable.Name = '' ) ; SELECT @A, @B; +------+------+ | @A | @B | +------+------+ | NULL | NULL | +------+------+ This example does not support your position about evaluating every expression, because dosomething() is never called. ALL subtree is pruned if AT LEAST ONE condition is FALSE. As none of your arguments is confirmend - they are not arguments for marking this issue "not a bug".
[4 Nov 2021 19:31]
Justin Swanhart
If I was wrong, the verification team would disagree with me. Query optimization is complex, and it is complicated further by prepared statements. When evaluating queries, constant replacement, equality propagation, and constant propagation happen. At parse time impossible where conditions are identified. Using prepared statements with placeholders in expressions, user variables in expressions, non-constant functions in expressions, etc, those expressions have to be evaluated on a per-row basis, because the expressions they are evaluated in are not constant in this context. My recommendation to either not use prepared statements or to use a UNION is the proper recommendation. I won't fill up the inbox of the bug team with further discussion. Take care.