| Bug #115283 | result not correct when using ‘<=>’ operator compare null value | ||
|---|---|---|---|
| Submitted: | 11 Jun 2024 12:00 | Modified: | 11 Jun 2024 12:10 |
| Reporter: | ximin liang (OCA) | Email Updates: | |
| Status: | Verified | Impact on me: | |
| Category: | MySQL Server: Optimizer | Severity: | S2 (Serious) |
| Version: | 8.0.37 | OS: | Any |
| Assigned to: | CPU Architecture: | Any | |
[11 Jun 2024 12:01]
ximin liang
Description:
Hello mysql team:
I found a bug about result not correct when using ‘<=>’ operator compare null value executed in nested loop join.
How to repeat:
Can be repeated in MySQL 8.4.0
CREATE TABLE `t1` ( `c1` int DEFAULT NULL, `c2` int DEFAULT NULL, KEY `idx` (`c1`));
CREATE TABLE `t2` ( `c1` int DEFAULT NULL, `c2` int DEFAULT NULL, KEY `idx` (`c1`) );
insert into t2(c1,c2) values (1,1),(null,null);
insert into t1(c1,c2) values (1,1),(null,null);
select /*+no_bnl()*/ `t1`.`c1` AS `c1`,`t1`.`c2` AS `c2`,`t2`.`c1` AS `c1`,`t2`.`c2` AS `c2` from `test`.`t1` inner join `test`.`t2`on(`t2`.`c1`= `t1`.`c1`) and (`t1`.`c1` <=> `t2`.`c1`);
+------+------+------+------+
| c1 | c2 | c1 | c2 |
+------+------+------+------+
| 1 | 1 | 1 | 1 |
| NULL | NULL | NULL | NULL |
+------+------+------+------+
2 rows in set (0.00 sec)
if we force hash join:
mysql> select /*+bnl()*/ `t1`.`c1` AS `c1`,`t1`.`c2` AS `c2`,`t2`.`c1` AS `c1`,`t2`.`c2` AS `c2` from `test`.`t1` inner join `test`.`t2` on(`t2`.`c1`= `t1`.`c1`) and (`t1`.`c1` <=> `t2`.`c1`);
+------+------+------+------+
| c1 | c2 | c1 | c2 |
+------+------+------+------+
| 1 | 1 | 1 | 1 |
+------+------+------+------+
1 row in set (0.00 sec)
Suggested fix:
seems plan is not right, notice filter: t1.c1 <=> t1.c1
mysql> explain format=tree select /*+no_bnl()*/ `t1`.`c1` AS `c1`,`t1`.`c2` AS `c2`,`t2`.`c1` AS `c1`,`t2`.`c2` AS `c2` from `test`.`t1` inner join `test`.`t2` on(`t2`.`c1`= `t1`.`c1`) and (`t1`.`c1` <=> `t2`.`c1`);
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Nested loop inner join (cost=1.15 rows=2)
-> Filter: (t1.c1 <=> t1.c1) (cost=0.45 rows=2)
-> Table scan on t1 (cost=0.45 rows=2)
-> Index lookup on t2 using idx (c1=t1.c1) (cost=0.3 rows=1)
|
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)
it is about func `substitute_for_best_equal_field`.
for cond t1.c1 <=> t2.c1, this func replace t2.c1 with t1.c1 so result miss
[11 Jun 2024 12:10]
MySQL Verification Team
Hello ximin liang, Thank you for the report and test case. Verified as described. regards, Umesh
[19 Jun 2024 8:08]
huahua xu
Hi ximin liang,
I think that it is a bug, but I don't agree with your reason for the incorrect result from the query statement execution.
After tracked the optimization process of the statement, I found that the optimizer removed the equality(`t2`.`c1`= `t1`.`c1`) which is considered a redundant predicate condition. It is reasonable, because the predicate is guaranteed to be true by use of 'ref' access method.(the form 'field = value' could be redundant if the [ref-] access chosen for the table use an index containing 'field', where 'value' is specified as (part of) its ref-key)
throgh see the associated source code, it is suspicious:
```
/*
Remove equalities that are guaranteed to be true by use of 'ref' access
method.
Note that ref access implements "table1.field1 <=>
table2.indexed_field2", i.e. if it passed a NULL field1, it will return
NULL indexed_field2 if there are.
Thus the equality "table1.field1 = table2.indexed_field2",
is equivalent to "ref access AND table1.field1 IS NOT NULL"
i.e. "ref access and proper setting/testing of ref->null_rejecting".
Thus, we must be careful, that when we remove equalities below we also
set ref->null_rejecting, and test it at execution; otherwise wrong NULL
matches appear.
So:
- for the optimization phase, the code which is below, and the code in
test_if_ref(), and in add_key_field(), must be kept in sync: if the
applicability conditions in one place are relaxed, they should also be
relaxed elsewhere.
- for the execution phase, all possible execution methods must test
ref->null_rejecting.
*/
Item *left_item = func->arguments()[0]->real_item();
Item *right_item = func->arguments()[1]->real_item();
bool redundant = false;
if (left_item->type() == Item::FIELD_ITEM &&
!(left_item->used_tables() & null_extended) &&
test_if_ref(thd, down_cast<Item_field *>(left_item), right_item,
&redundant)) {
return true;
}
if (redundant) {
return false;
}
if (right_item->type() == Item::FIELD_ITEM &&
!(right_item->used_tables() & null_extended) &&
test_if_ref(thd, down_cast<Item_field *>(right_item), left_item,
&redundant)) {
return true;
}
if (redundant) {
return false;
}
```
The code comment notes that the optimizer should set ref->null_rejecting to avoid wrong NULL matches appear when removing the redundant equality. But in fact, it didn't do that.
[19 Jun 2024 8:40]
huahua xu
The patch maybe fix the bug
Attachment: set_null_rejecting_when_remove_equality_with_ref_access.patch (application/octet-stream, text), 2.26 KiB.
[31 Oct 2024 10:16]
MySQL Verification Team
This is the original bug report for the following duplicate: https://bugs.mysql.com/bug.php?id=116517

Description: Hello mysql team: I found a bug about result not correct when using ‘<=>’ operator compare null value executed in nested loop join. How to repeat: Can be repeated in MySQL 8.4.0 CREATE TABLE `t1` ( `c1` int DEFAULT NULL, `c2` int DEFAULTz NULL, KEY `idx` (`c1`)); CREATE TABLE `t2` ( `c1` int DEFAULT NULL, `c2` int DEFAULT NULL, KEY `idx` (`c1`) ); insert into t2(c1,c2) values (1,1),(null,null); insert into t1(c1,c2) values (1,1),(null,null); select /*+no_bnl()*/ `t1`.`c1` AS `c1`,`t1`.`c2` AS `c2`,`t2`.`c1` AS `c1`,`t2`.`c2` AS `c2` from `test`.`t1` inner join `test`.`t2`on(`t2`.`c1`= `t1`.`c1`) and (`t1`.`c1` <=> `t2`.`c1`); +------+------+------+------+ | c1 | c2 | c1 | c2 | +------+------+------+------+ | 1 | 1 | 1 | 1 | | NULL | NULL | NULL | NULL | +------+------+------+------+ 2 rows in set (0.00 sec) if we force hash join: mysql> select /*+bnl()*/ `t1`.`c1` AS `c1`,`t1`.`c2` AS `c2`,`t2`.`c1` AS `c1`,`t2`.`c2` AS `c2` from `test`.`t1` inner join `test`.`t2` on(`t2`.`c1`= `t1`.`c1`) and (`t1`.`c1` <=> `t2`.`c1`); +------+------+------+------+ | c1 | c2 | c1 | c2 | +------+------+------+------+ | 1 | 1 | 1 | 1 | +------+------+------+------+ 1 row in set (0.00 sec) Suggested fix: seems plan is not right, notice filter: t1.c1 <=> t1.c1 mysql> explain format=tree select /*+no_bnl()*/ `t1`.`c1` AS `c1`,`t1`.`c2` AS `c2`,`t2`.`c1` AS `c1`,`t2`.`c2` AS `c2` from `test`.`t1` inner join `test`.`t2` on(`t2`.`c1`= `t1`.`c1`) and (`t1`.`c1` <=> `t2`.`c1`); +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | EXPLAIN | +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | -> Nested loop inner join (cost=1.15 rows=2) -> Filter: (t1.c1 <=> t1.c1) (cost=0.45 rows=2) -> Table scan on t1 (cost=0.45 rows=2) -> Index lookup on t2 using idx (c1=t1.c1) (cost=0.3 rows=1) | +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.01 sec) it is about func `substitute_for_best_equal_field`. for cond t1.c1 <=> t2.c1, this func replace t2.c1 with t1.c1 so result miss