Bug #99413 Constant propagation get the wrong result when mix with the different collations
Submitted: 30 Apr 2020 17:13 Modified: 6 May 2020 13:30
Reporter: wj huang Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:8.0.19 OS:Any
Assigned to: CPU Architecture:Any

[30 Apr 2020 17:13] wj huang
Description:
Consider the following statement:

mysql> set names utf8mb4;
Query OK, 0 rows affected (0.00 sec)

mysql> create table t (a char(10) collate utf8mb4_bin, b char(10) collate utf8mb4_0900_ai_ci);
Query OK, 0 rows affected (0.05 sec)

mysql> insert into t values ('a', 'A');
Query OK, 1 row affected (0.01 sec)

mysql> select * from t t1, t t2 where t1.a=t2.b and t2.b='a';
+------+------+------+------+
| a    | b    | a    | b    |
+------+------+------+------+
| a    | A    | a    | A    |
+------+------+------+------+
1 row in set (0.00 sec)

mysql> select * from t t1, t t2 where t1.a=t2.b;
Empty set (0.01 sec)

I think it's because t2.b='a' is propagated to t1.a='a' 

How to repeat:
set names utf8mb4;
create table t (a char(10) collate utf8mb4_bin, b char(10) collate utf8mb4_0900_ai_ci);
insert into t values ('a', 'A');
select * from t t1, t t2 where t1.a=t2.b and t2.b='a';

Suggested fix:
I found that in function my_propagate_uca_900, MySQL only checks !my_uca_have_contractions(cs->uca);    and for utf8mb4_0900_ai_ci, my_uca_have_contractions is false.
[5 May 2020 13:19] MySQL Verification Team
Hello Mr. huang,

Thank you for your bug report.

However, I am in doubt that this is truly a bug. You are mixing here collations with different weights and in expressions of different coercibilities.  You simply can not expect that different collations return the same result, in different expressions.

When you make small changes in your SELECT queries, then these problem disappear. Like with these queries :

select * from t t1, t t2 where t1.a= t2.b collate utf8mb4_bin and t2.b='a' collate utf8mb4_0900_ai_ci ;
select * from t t1, t t2 where t1.a= t2.b collate utf8mb4_bin;

Let us know whether you understood the point.
[5 May 2020 14:11] wj huang
Hi, Sinisa Milivojevic.

I do not quite understand the point here, would you mean that the behaviour is undefined? I have to admit that the behaviour is quite confusing, adding a condition into the query will return extra row.
[5 May 2020 15:07] MySQL Verification Team
Hi Mr. huang,

No, the behaviour is quite defined. But, behaviour is not defined by constant propagation itself, but more by different weights and coercibilities of the collations, as used here in different expressions.

This is why my queries give consistent results in both cases.
[5 May 2020 15:40] wj huang
Hi, Sinisa Milivojevic.

Could you explain how the result is defined by different weights and coercibilities of the collations? 
I tried to change the utf8mb4_0900_ai_ci to utf8mb4_general_ci:

create table t (a char(10) collate utf8mb4_bin, b char(10) collate utf8mb4_general_ci);
insert into t values ('a', 'A');
set names utf8mb4 collate utf8mb4_general_ci;
mysql> select * from t t1, t t2 where t1.a=t2.b and t2.b='a';
Empty set (0.00 sec)

I also tried utf8mb4_da_0900_ai_ci, and got empty result too. Why the utf8mb4_0900_ai_ci is different?
[6 May 2020 12:27] MySQL Verification Team
Hello Mr. huang,

We had a short discussion with Development on your report and it is our conclusion that this is not a bug.

This is the justification:

t2.b='a' is always true, so the query may be simplified to:

select * from t t1, t t2 where t1.a=t2.b;
Empty set (0,00 sec)

And since t1.a = t1.b is done in utf8mb4_bin, the result is correct:

select a,b,a=b from t;
+------+------+------+
| a    | b    | a=b  |
+------+------+------+
| a    | A    |    0 |
+------+------+------+
1 row in set (0,00 sec)

If you add a collate statement, you get what is expected:

 select * from t t1, t t2 where t1.a collate utf8mb4_0900_ai_ci=t2.b and t2.b='a';
+------+------+------+------+
| a    | b    | a    | b    |
+------+------+------+------+
| a    | A    | a    | A    |
+------+------+------+------+
1 row in set (0,00 sec)

Regarding your other questions regarding Unicode character sets and different between standard collations, please read our Reference Manual and some textbook on Unicode encodings, character sets, collations, weights and coercibilities.

Not a bug.
[6 May 2020 12:47] wj huang
Hi, Sinisa Milivojevic.

Thanks for your justification, and I really agree with your justification.
Since t2.b='a' is always true, 
`select * from t t1, t t2 where t1.a=t2.b and t2.b='a';` 
can be simplified to
`select * from t t1, t t2 where t1.a=t2.b;` which returns empty.

However, What I got is one row, not empty, I think this is not consistent with the justification. What I expect is empty, not one row.
[6 May 2020 12:54] MySQL Verification Team
Hi Mr. Huang,

The difference is that we tested the results with current, internal release of our server, which contains patches for 8.0.21 and 8.0.22.

Hence, the confusion. In short, you will see different results with one of those two future releases. We do not know when will either be out, since 8.0.20 has just been released.
[6 May 2020 13:30] wj huang
Hi, Sinisa Milivojevic.

That makes sense, I tested it in 8.0.19 and 8.0.20, both of these versions returned one row. I hope I will see the correct result in the next release. 
Thanks for your reply.
[6 May 2020 13:59] MySQL Verification Team
You are truly welcome.