Bug #108743 Equality propagation is disabled on utf8mb4_general_ci collation
Submitted: 11 Oct 2022 14:41 Modified: 12 Oct 2022 9:48
Reporter: Yukun Liang Email Updates:
Status: Won't fix Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version: OS:Any
Assigned to: CPU Architecture:Any

[11 Oct 2022 14:41] Yukun Liang
Description:
In the patch for bug#9509, MySQL use MY_COLLATION_HANDLER::propagate to indicate whether the const value can be propagated(used in check_simple_equality). In this patch, my_collation_utf8_general_ci_handler(and some other collations) use my_propagate_complex as its propagate function, i.e. all utf8_general_ci const string value can't be propagated, so all equality propagate in equal predicate are disabled, like:
  SELECT count(*) from t1 join t2 on t1.col_char = t2.col_char where t1.col_char = 'abc';

But in the patch for bug#21698, MySQL add Item_field::subst_argument_checker to disable all string values substituted except the comparison predicate. In fact, this patch also fixes the case in bug#9509 and can do equality propagate on equal predicate.

How to repeat:
create table t1 (s1 char(5) character set utf8mb4 collate utf8mb4_general_ci);
create table t2 (s1 char(5) character set utf8mb4 collate utf8mb4_general_ci);
insert into t1 values ('abc');
insert into t2 values ('abc');

explain format=tree select * from t1 join t2 on t1.s1 = t2.s1 where t1.s1 = 'abc';

This SQL should be rewritten to:
select * from t1, t2 where t1.s1 = 'abc' and t2.s1='abc';

Suggested fix:
all propagate function ptr of collation handler use my_propagate_simple (return true), this will enable equality propagation of all equal predicate, and the case in BUG#9509 will be fixed by item::subst_argument_checker
[12 Oct 2022 9:48] MySQL Verification Team
Hello Yukun Liang,

Thank you for the report and feedback.
Development confirmed that this is an expected behavior.
Quoting Dev's note as is:

====
The equality is propagated correctly for utf8mb4_0900_ai_ci in mysql
8.0:

mysql> explain format=tree select * from t1 join t2 on t1.s1 = t2.s1
where t1.s1 = 'abc'\G
*************************** 1. row ***************************
EXPLAIN: -> Inner hash join (no condition)  (cost=0.70 rows=1)
    -> Filter: (t2.s1 = 'abc')  (cost=0.35 rows=1)
        -> Table scan on t2  (cost=0.35 rows=1)
    -> Hash
        -> Filter: (t1.s1 = 'abc')  (cost=0.35 rows=1)
            -> Table scan on t1  (cost=0.35 rows=1)

1 row in set (0.00 sec)

Since
1) this has been like this for utf8mb4_general_ci since 2006 and
2) utf8mb4_0900_ai_ci is the default collation for utf8mb4 in mysql 8.0

======

Closing this as a won't fix.

regards,
Umesh