Bug #112629 Index is not used if there is some constant boolean expressions
Submitted: 5 Oct 2023 8:26 Modified: 6 Oct 2023 13:20
Reporter: Sunny Chung Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:8.0.28 OS:Any (AWS managed RDS)
Assigned to: CPU Architecture:Any

[5 Oct 2023 8:26] Sunny Chung
Description:
For an index like this:
```
CREATE INDEX idx__merchant_id__member_number__status__purpose
    ON rs_table (merchant_id, member_number, status, purpose);
```

For a SQL like this:
```
 select rs1.*
 from rs_table rs1
 where 1=1
 and (0 = 0 or rs1.id = 0)
 and (false or rs1.merchant_id in (
   select tm1.tenant_id
   from tm_table tm1
   where 1=1
   and tm1.replication_status = "ACTIVE"
   and tm1.status = "ACTIVE"
   and (tm1.tenant_name_en like "%ABC%" or tm1.tenant_name_tc like "%ABC%" or tm1.tenant_name_sc like "%ABC%" or tm1.tenant_id like "%ABC%")
 ));
```

This SQL does not use the mentioned index, no matter I use "FORCE INDEX" or not.

If `false or` in line 5 is removed, it uses index.

It can be validated using `EXPLAIN`. The actual query execution time is 17s VS 0.015s.

How to repeat:
As described.

Suggested fix:
Evaluate constant expressions and simplify the query before building query plan.
[5 Oct 2023 12:31] MySQL Verification Team
Hi Mr. Chung,

We must be able to repeat the problem.

Hence, we also need a DDL for the CREATE of that table, as well as the rows. You can send us only a fraction of the total rows, but so that the effect is repeatable.

Can't repeat.
.
[5 Oct 2023 13:11] MySQL Verification Team
Hi Mr. Chung,

We had discussions on your bug in our team and concluded that this is not a bug.

The expression:

false or rs1.merchant_id

forces scanning of all rows, as no index can be used ......

Not a bug.
[6 Oct 2023 3:37] Sunny Chung
It is obvious that the expression `false or X` could be simplified to `X`, according to Boolean algebra.

Hence, the SQL could be simplified to below:
```
 select rs1.*
 from rs_table rs1
 where true
 and true
 and (false or (rs1.merchant_id in (...)));
```

and then:
```
 select rs1.*
 from rs_table rs1
 where rs1.merchant_id in (...);
```

Now, it is obvious that the index `idx__merchant_id__member_number__status__purpose` can be applied.

Note that simplifying `true and X` to `X` is already supported by the current version of MySQL, but `false or X` is not. This inconsistency suggests that this is a bug.

What is the reasoning that the expression `false or X` forces scanning all the rows?
[6 Oct 2023 10:01] MySQL Verification Team
Hi,

When the expression like:

(false OR X) 

is used in the context like yours, then both have to be evaluated against the conditions. Hence, the full scan because of the false.

This is one of the basic concepts of the query resolution.

Not a bug.
[6 Oct 2023 13:20] Sunny Chung
I think I have already made my reasoning very clear, but you rejected without a valid reason.

"false" is a constant, and has no relationship with the row data. There is no reason to repeatedly re-evaluate things that do not change against each row. This is also one of the basic concepts when designing algorithms.

And, you did not justify why MySQL does not trigger full table scan with (true AND X) but (false OR X). They share similar natures.

Also, you did not explain why MySQL cannot simplify the query that human can easily figure out as I demonstrated. It can be done by repeatedly applying basic laws of Boolean algebra. I think implementation exists because they solve problem, not problem exists because the implementation is like that.

I took my example to run in another relational database, PostgreSQL, to check if they are also bounded by the "basic concepts of the query resolution" you mentioned. The result is - they did not suffer this issue, and they are able to use the right index with the (false OR X) conditions.

```
Aggregate  (cost=1921.53..1921.54 rows=1 width=8)
  ->  Index Only Scan using idx_mid on table1  (cost=0.43..1844.50 rows=30812 width=0)
        Index Cond: (mid = ANY ('{66,13,45}'::integer[]))
```

I reported this bug because I want MySQL to be better. Now, I feel I took a wrong action.