Bug #105444 SELECT with impossible condition - different optimizations
Submitted: 3 Nov 2021 13:32 Modified: 4 Nov 2021 17:57
Reporter: Eimantas Jatkonis Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:8.0.27, 5.6.51 OS:Any
Assigned to: CPU Architecture:Any
Tags: impossible condition, optimization

[3 Nov 2021 13:32] Eimantas Jatkonis
Description:
Two SELECT statements (same logic) from same data table - results in different execution plan and one result takes 3-10 times longer then other.
(MyISAM is ~10times difference, InnoDB 3-5times)

In application I use SELECT statement with parameters (one SQL for different searches):

SELECT COUNT(*)
FROM datatable
WHERE (
        :SearchType = 0 AND
        datatable.Start BETWEEN :SearchDateBegin AND :SearchDateEnd
      ) OR
      (
        :SearchType = 1 AND
        datatable.Name = :SearchName
      )
;

As :SearchType could be 0 or 1, only one part of OR condition is valid.
In case ":SearchType" = 0, second OR part still affects performance.

1. SQL (original)
takes 0.15s

SELECT COUNT(*)
FROM datatable
WHERE (
        0 = 0 AND
        datatable.Start BETWEEN '2010-01-01' AND '2021-01-01'
      ) OR
      (
        0 = 1 AND #IMPOSSIBLE CONDITION
        datatable.Name = ''
      )
;

2. SQL (impossible OR part removed)
takes 0.03s 

SELECT COUNT(*)
FROM datatable
WHERE (
        0 = 0 AND
        datatable.Start BETWEEN '2010-01-01' AND '2021-01-01'
      )
;

For first SQL "EXPLAIN ..." and "SHOW WARNINGS" show already optmized SQL:
mysql> show warnings;
/* select#1 */ select count(0) AS `COUNT(*)` from `datatable` where (`datatable`.`Start` between '2010-01-01' and '2021-01-01');

In fact MySQL is removing IMPOSSIBLE condition, 
But it seems - that it is not used for execution, because it is as fast as second SQL.

Prepared reproduce script with sections:
1. Data prepare - generates datatable whith 1000000 rows.
2. Test execute - two SQL with different execution plan and time
3. Test explain - extended EXPLAIN show optimized query, but it is not used.
4. Test optimizer_switch - expected index_condition_pushdown=off to be the cause, but execution time (without using idexes) is same long. This suggest what WHERE clause with impossibles conditions do not use idexes at all?

How to repeat:
DROP DATABASE IF EXISTS bgtest;
CREATE DATABASE bgtest;
USE bgtest;

#####################################################################################
# 1. Data prepare

DROP TABLE IF EXISTS seq;
CREATE TABLE seq
(
  Value INT
) ENGINE=MyISAM;

#Generate some numbers 0-9
INSERT INTO seq
VALUES
(0), (1), (2), (3), (4), (5), (6), (7), (8), (9);

#Generate some numbers 10-99
INSERT INTO seq
SELECT s1.Value * 10 + s2.Value Result
FROM seq AS s1
JOIN seq AS s2
WHERE s1.Value > 0
ORDER BY Result
;

#Generate some numbers 100-9999
INSERT INTO seq
SELECT s1.Value * 100 + s2.Value Result
FROM seq AS s1
JOIN seq AS s2
WHERE s1.Value > 0
ORDER BY Result
;

#Generate some numbers 10000-999999
INSERT INTO seq
SELECT s1.Value * 10000 + s2.Value Result
FROM seq AS s1
JOIN seq AS s2
WHERE s1.Value > 0 AND s1.Value < 100 
ORDER BY Result
;

SELECT COUNT(*) FROM seq;

DROP TABLE IF EXISTS datatable;
CREATE TABLE datatable
(
  Start DATETIME,
  Name CHAR(255),
  Data TEXT,
  INDEX LK_Start (Start),
  INDEX LK_Name (Name)
) ENGINE=MyISAM;

INSERT INTO datatable
SELECT NOW() - INTERVAL seq.Value HOUR Start,
       CASE seq.Value % 10 
         WHEN 0 THEN NULL
         WHEN 1 THEN '' 
         ELSE MD5(seq.Value)
       END Name,
       MD5(MD5(seq.Value)) Data
FROM seq
;

#####################################################################################
# 2. Test execute

SHOW STATUS LIKE 'handler_read%next';

SELECT COUNT(*)
FROM datatable
WHERE (
        1 = 1 AND
        datatable.Start BETWEEN NOW() - INTERVAL 100000 HOUR AND NOW()
      ) OR
      (
        1 = 0 AND
        datatable.Name = ''
      )
;

SHOW STATUS LIKE 'handler_read%next';

SELECT COUNT(*)
FROM datatable
WHERE datatable.Start BETWEEN NOW() - INTERVAL 100000 HOUR AND NOW()
;

SHOW STATUS LIKE 'handler_read%next';

#####################################################################################
# 3. Test explain

EXPLAIN
SELECT COUNT(*)
FROM datatable
WHERE (
        1 = 1 AND
        datatable.Start BETWEEN NOW() - INTERVAL 100000 HOUR AND NOW()
      ) OR
      (
        1 = 0 AND
        datatable.Name = ''
      )
;

SHOW WARNINGS;

#+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
#| Level | Code | Message                                                                                                                                                                        |
#+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
#| Note  | 1003 | /* select#1 */ select count(0) AS `COUNT(*)` from `bgtest`.`datatable` where (`bgtest`.`datatable`.`Start` between <cache>((now() - interval 100000 hour)) and <cache>(now())) |
#+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

select count(0) AS `COUNT(*)` from `bgtest`.`datatable` where (`bgtest`.`datatable`.`Start` between ((now() - interval 100000 hour)) and (now()));

#####################################################################################
# 4. Test optimizer_switch

SET optimizer_switch = 'index_condition_pushdown=off';

SELECT COUNT(*)
FROM datatable
WHERE (
        1 = 1 AND
        datatable.Start BETWEEN NOW() - INTERVAL 100000 HOUR AND NOW()
      ) OR
      (
        1 = 0 AND
        datatable.Name = ''
      )
;

EXPLAIN 
SELECT COUNT(*)
FROM datatable
WHERE (
        1 = 1 AND
        datatable.Start BETWEEN NOW() - INTERVAL 100000 HOUR AND NOW()
      ) OR
      (
        1 = 0 AND
        datatable.Name = ''
      )
;

SET optimizer_switch = 'index_condition_pushdown=on';
[3 Nov 2021 13:51] MySQL Verification Team
Hi Mr. Jatkonis,

Thank you for your bug report.

However, MySQL version 5.6 is not maintained any more. Hence, if you want to report a difference in behaviour you can use latest 8.0 release (which you used here)  and latest 5.7 release.

Then , you should point us to the difference between two versions and provide us with the opinion on which behaviour is correct.
[3 Nov 2021 21:25] Eimantas Jatkonis
This is not about difference between versions 8.0.27 and 5.6.51
This is a problem I tested on 8.0.27 (latest available) and 5.6.51 (latest of 5.6)

If it is on 8.0.27 and 5.6.51 so 5.7.XX should have similar problem.
[3 Nov 2021 22:49] Justin Swanhart
Unlike imperative languages, SQL is declarative and there is no concept of 'short circuits' like there are in imperative languages.

For example in C:
(if ptr != NULL && ptr->val == 1) {

}

The second statement will never be evaluated if ptr == NULL because the evaluation is short circuited.  SQL does not offer such short circuiting in the WHERE clause.  All expressions in a SQL statement are evaluated at some point in execution.

In order to do this query efficiently, use UNION ALL:

SELECT COUNT(*)

FROM ( select 0 from datatable where :SearchType = 0 and datatable.Start BETWEEN :SearchDateBegin and :SearchDateEnd
       UNION ALL
       select 0 from datatable where :SearchType = 1 and datatable.Name = :SearchName
)

This is not a bug.
[3 Nov 2021 23:00] Justin Swanhart
If it isn't clear why, it is because both SQL statement will be parsed each time and one of them will have an impossible where clause, so that query is not executed at all.  select count(*) ... UNION ALL ... select count(*) would work just as well, but I wanted the example to be clearer.
[3 Nov 2021 23:00] Eimantas Jatkonis
I know UNION gives some advantages to control WHERE conditions (JOIN as well) but this brings more problems (in my case)

Documentation provides info about DEAD CODE removal inside MySQL
https://dev.mysql.com/doc/internals/en/optimizer-eliminating-dead-code.html
https://dev.mysql.com/doc/refman/8.0/en/where-optimization.html
but it works only on "paper"

If it's not a bug, but developers works on that - this is one of test case scenario they should look in to.
[3 Nov 2021 23:04] Justin Swanhart
Note that you can not prune subtrees from the where clause just because part of it involves a constant evaluation that evaluates to false because all expressions have to be evaluated and they can have side effects.  Calling a stored function in either of the disjunctions for example, or setting a user variable.

select * from table
where (0=(@myvar := 1)) AND (1=1 AND (@myvar2 := 2))

select @myvar, @myvar2:
1, 2
[3 Nov 2021 23:13] Eimantas Jatkonis
Last example with ":=" in WHERE condition is interesting and supports your possition. Indeed I doubt what possibility for variable assignment in WHERE clouse is comparable tradeoff for faster data selection :)

But you should look at detailed EXPLAIN ... SHOW WARNINGS ...

EXPLAIN
SELECT COUNT(*)
FROM datatable
WHERE (
        1 = 1 AND
        datatable.Start BETWEEN NOW() - INTERVAL 100000 HOUR AND NOW()
      ) OR
      (
        1 = 0 AND
        datatable.Name = ''
      )
;

SHOW WARNINGS;

it shows already optimized SQL 
/* select#1 */ select count(0) AS `COUNT(*)` from `datatable` where (`datatable`.`Start` between '2010-01-01' and '2021-01-01');

wich is (reformated):
select count(0) AS `COUNT(*)` 
from `datatable` 
where (
        `datatable`.`Start` between '2010-01-01' and '2021-01-01'
      );

All unneeded conditions removed (dead code), so why it is not used for execution?
[3 Nov 2021 23:19] Eimantas Jatkonis
I tested ":=" within WHERE condition in 8.0.27 and received this:

mysql> EXPLAIN
    -> SELECT COUNT(*)
    -> FROM datatable
    -> WHERE (
    ->         1 = 1 AND
    ->         datatable.Start BETWEEN NOW() - INTERVAL 100000 HOUR AND NOW() AND
    ->         @A := 156 > 150
    ->       ) OR
    ->       (
    ->         1 = 0 AND
    ->         datatable.Name = '' AND
    ->         @B := 256 > 150
    ->       )
    -> ;
+----+-------------+-----------+------------+-------+---------------+----------+---------+------+--------+----------+-----------------------+
| id | select_type | table     | partitions | type  | possible_keys | key      | key_len | ref  | rows   | filtered | Extra                 |
+----+-------------+-----------+------------+-------+---------------+----------+---------+------+--------+----------+-----------------------+
|  1 | SIMPLE      | datatable | NULL       | range | LK_Start      | LK_Start | 6       | NULL | 127445 |   100.00 | Using index condition |
+----+-------------+-----------+------------+-------+---------------+----------+---------+------+--------+----------+-----------------------+
1 row in set, 3 warnings (0.00 sec)

mysql> show warnings;
+---------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level   | Code | Message                                                                                                                                                                                              |
+---------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Warning | 1287 | Setting user variables within expressions is deprecated and will be removed in a future release. Consider alternatives: 'SET variable=expression, ...', or 'SELECT expression(s) INTO variables(s)'. |
| Warning | 1287 | Setting user variables within expressions is deprecated and will be removed in a future release. Consider alternatives: 'SET variable=expression, ...', or 'SELECT expression(s) INTO variables(s)'. |
| Note    | 1003 | /* select#1 */ select count(0) AS `COUNT(*)` from `bgtest`.`datatable` where (`bgtest`.`datatable`.`Start` between <cache>((now() - interval 100000 hour)) and <cache>(now()))                       |
+---------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
3 rows in set (0.00 sec)

And your argument with ":=" gets depreciation warning (this gives some points to my arguments).
[3 Nov 2021 23:22] Justin Swanhart
I did not look at the code, bug logically, placeholders are not evaluated as constants, because the prepared statement represents a prepared execution plan. Each time the statement is evaluated the placeholders are evaluated, and it has to pick one of the two indexes at that runtime.  The prepared plan is always a full table scan as a result.

So don't use a parameterized SQL here.  Just interpolate the string (carefully) and execute it.
[3 Nov 2021 23:23] Justin Swanhart
I used the variable set in the WHERE clause to illustrate a type of statement with side effects.  Stored functions can set user variables, or even modify data, and they will not generate any warning.
[3 Nov 2021 23:31] Justin Swanhart
or, if you want the safety of bind params instead of interpolation, then use the UNION approach and prepare it.
[4 Nov 2021 7:39] Eimantas Jatkonis
Testing your arguments - and this one "stored function with user variables" or "stored function with data manipulation" is not actualy working as you expected.

Added simple function "donothing()" that increments @A.
And after running SQL - I get that function that is in DEAD part of OR - is not ever called.

DELIMITER $$

CREATE FUNCTION donothing()
RETURNS INT
DETERMINISTIC
BEGIN
  SET @A := IFNULL(@A, 0) + 1;
  RETURN -1;
END $$

DELIMITER ;

SELECT COUNT(*)
FROM datatable
WHERE (
        1 = 1 AND
        datatable.Start BETWEEN NOW() - INTERVAL 100000 HOUR AND NOW()
      ) OR
      (
        donothing() = -1 AND       #Function call
        1 = 0 AND                  #DEAD part of OR
        datatable.Name = ''
      )
;

mysql> SELECT @A;
+------+
| @A   |
+------+
| NULL |
+------+
1 row in set (0.00 sec)

mysql> SELECT donothing();
+-------------+
| donothing() |
+-------------+
|          -1 |
+-------------+
1 row in set (0.00 sec)

mysql> SELECT @A;
+------+
| @A   |
+------+
|    1 |
+------+
1 row in set (0.00 sec)
[4 Nov 2021 12:57] MySQL Verification Team
Thank you both, Mr. Jatkonis and Mr. Swanhart.

We agree with Mr. Swanhart that this is expected behaviour.

Not a bug.
[4 Nov 2021 17:31] Justin Swanhart
Mr Jatkonis,

Note that you CAN prune subtrees from the where clause when ALL of that subtree evaluates to constants that are false :D
[4 Nov 2021 17:35] Justin Swanhart
So it works exactly as expected and documented.  The query doesn't execute at all if any constant expression in the where clause evaluates to false. The entire query is short circuited in this case.  But if your function call in your example is not-deterministic and/or takes columnar input, then it will have to be evaluated for each expression, because it is no longer constant.

I appreciate your enthusiasm for argument.  

Regards.
[4 Nov 2021 17:57] Eimantas Jatkonis
About subtree prunning:

In this subtree first condition is always TRUE (function donothing() returns -1)
So if you are right - it could not be pruned, but it is pruned.
      (
        donothing() = -1 AND       #Function call
        1 = 0 AND                  #DEAD part of OR
        datatable.Name = ''
      )

About "not-deterministic and/or takes columnar input":
Created one more example with function that takes columnar input dosomething(d DATETIME) and returns some not constant result.

DELIMITER $$

CREATE FUNCTION dosomething(d DATETIME)
RETURNS DATETIME
DETERMINISTIC
BEGIN
  SET @A = IFNULL(@A, 0) + 1;
  SET @B = IFNULL(d, NOW());
  RETURN d;
END $$

DELIMITER ;

SELECT COUNT(*)
FROM datatable
WHERE (
        1 = 1 AND
        datatable.Start BETWEEN NOW() - INTERVAL 100000 HOUR AND NOW()
      ) OR
      (
        dosomething(datatable.Start) = datatable.Start AND
        1 = 0 AND
        datatable.Name = ''
      )
;

SELECT @A, @B;
+------+------+
| @A   | @B   |
+------+------+
| NULL | NULL |
+------+------+

This example does not support your position about evaluating every expression, because dosomething() is never called.

ALL subtree is pruned if AT LEAST ONE condition is FALSE.

As none of your arguments is confirmend - they are not arguments for marking this issue "not a bug".
[4 Nov 2021 19:31] Justin Swanhart
If I was wrong, the verification team would disagree with me.  Query optimization is complex, and it is complicated further by prepared statements.  When evaluating queries, constant replacement, equality propagation, and constant propagation happen. At parse time impossible where conditions are identified. Using prepared statements with placeholders in expressions, user variables in expressions, non-constant functions in expressions, etc, those expressions have to be evaluated on a per-row basis, because the expressions they are evaluated in are not constant in this context.  My recommendation to either not use prepared statements or to use a UNION is the proper recommendation. I won't fill up the inbox of the bug team with further discussion.  Take care.