Description:
When using the <=> (null-safe equal) operator to compare a row of data with the result of a single-row subquery, if the subquery returns an Empty set, the final query result becomes unstable and inconsistent (returns unexpected rows in some cases), even though the subquery result remains empty throughout.
How to repeat:
-- 1. Create test tables and insert initial data:
-- sql
create table t1(c1 int default null, c2 int default null);
create table t2(c1 int default null, c2 int default null);
insert into t1 values (1,null);
insert into t2 values (1,1);
-- 2. Verify the subquery returns an empty set:
-- sql
select c1,c2 from t2 where c1=4 order by c1,c2 limit 1;
-- Result: Empty set (0.01 sec)
-- 3. Execute the main query with <=> comparison against the empty subquery:
-- sql
select * from t1 where (c1,c2) <=> (select c1,c2 from t2 where c1=4 order by c1,c2 limit 1);
-- Result: Empty set (0.01 sec) (expected)
-- 4. Insert additional data into t2 (does not affect the subquery result):
-- sql
insert into t2 values (2,null);
-- 5. Re-verify the subquery still returns empty set:
-- sql
select c1,c2 from t2 where c1=4 order by c1,c2 limit 1;
-- Result: Empty set (0.01 sec) (same as before)
-- 6. Re-execute the main query:
-- sql
select * from t1 where (c1,c2) <=> (select c1,c2 from t2 where c1=4 order by c1,c2 limit 1);
-- Result: Returns row (1, null) (unexpected)
Suggested fix:
Root Cause
The root cause lies in the compare_pair_for_nulls() function used for row comparison with <=>:
When comparing row data (item->result_type()==ROW_RESULT), the function checks each column individually, and considers NULL = NULL as a match (per <=> semantics).
Even though the subquery returns an Empty set, it still executes a table scan. The last row's values remain "residued" in the field buffer. After inserting (2, null) into t2, this residual row (with c2 = NULL) is read during the comparison. Since t1.c2 and the residual t2.c2 are both NULL, the <=> comparison succeeds, leading to the unexpected row being returned.
Fix Solution
Reset the values of all columns in the single-row subquery result when the subquery returns an empty set, to eliminate residual data in the field buffer. The fix modifies sql/item_subselect.cc to explicitly store NULL in all cached columns of the subquery result when the subquery is a single-row type and returns NULL (empty set):
diff --git a/sql/item_subselect.cc b/sql/item_subselect.cc
index e4466735d8c..c39c0bbdbaf 100644
--- a/sql/item_subselect.cc
+++ b/sql/item_subselect.cc
@@ -3072,6 +3072,13 @@ bool SubqueryWithResult::exec(THD *thd) {
char const *save_where = thd->where;
const bool res = unit->execute(thd);
thd->where = save_where;
+ if (item->substype() == Item_subselect::SINGLEROW_SUBS && item->null_value) {
+ Item_singlerow_subselect *item_singlerow =
+ down_cast<Item_singlerow_subselect *>(this->item);
+ for (uint i = 0; i < item_singlerow->cols(); ++i) {
+ down_cast<Item_cache *>(item_singlerow->element_index(i))->store_null();
+ }
+ }
return res;
}
Description: When using the <=> (null-safe equal) operator to compare a row of data with the result of a single-row subquery, if the subquery returns an Empty set, the final query result becomes unstable and inconsistent (returns unexpected rows in some cases), even though the subquery result remains empty throughout. How to repeat: -- 1. Create test tables and insert initial data: -- sql create table t1(c1 int default null, c2 int default null); create table t2(c1 int default null, c2 int default null); insert into t1 values (1,null); insert into t2 values (1,1); -- 2. Verify the subquery returns an empty set: -- sql select c1,c2 from t2 where c1=4 order by c1,c2 limit 1; -- Result: Empty set (0.01 sec) -- 3. Execute the main query with <=> comparison against the empty subquery: -- sql select * from t1 where (c1,c2) <=> (select c1,c2 from t2 where c1=4 order by c1,c2 limit 1); -- Result: Empty set (0.01 sec) (expected) -- 4. Insert additional data into t2 (does not affect the subquery result): -- sql insert into t2 values (2,null); -- 5. Re-verify the subquery still returns empty set: -- sql select c1,c2 from t2 where c1=4 order by c1,c2 limit 1; -- Result: Empty set (0.01 sec) (same as before) -- 6. Re-execute the main query: -- sql select * from t1 where (c1,c2) <=> (select c1,c2 from t2 where c1=4 order by c1,c2 limit 1); -- Result: Returns row (1, null) (unexpected) Suggested fix: Root Cause The root cause lies in the compare_pair_for_nulls() function used for row comparison with <=>: When comparing row data (item->result_type()==ROW_RESULT), the function checks each column individually, and considers NULL = NULL as a match (per <=> semantics). Even though the subquery returns an Empty set, it still executes a table scan. The last row's values remain "residued" in the field buffer. After inserting (2, null) into t2, this residual row (with c2 = NULL) is read during the comparison. Since t1.c2 and the residual t2.c2 are both NULL, the <=> comparison succeeds, leading to the unexpected row being returned. Fix Solution Reset the values of all columns in the single-row subquery result when the subquery returns an empty set, to eliminate residual data in the field buffer. The fix modifies sql/item_subselect.cc to explicitly store NULL in all cached columns of the subquery result when the subquery is a single-row type and returns NULL (empty set): diff --git a/sql/item_subselect.cc b/sql/item_subselect.cc index e4466735d8c..c39c0bbdbaf 100644 --- a/sql/item_subselect.cc +++ b/sql/item_subselect.cc @@ -3072,6 +3072,13 @@ bool SubqueryWithResult::exec(THD *thd) { char const *save_where = thd->where; const bool res = unit->execute(thd); thd->where = save_where; + if (item->substype() == Item_subselect::SINGLEROW_SUBS && item->null_value) { + Item_singlerow_subselect *item_singlerow = + down_cast<Item_singlerow_subselect *>(this->item); + for (uint i = 0; i < item_singlerow->cols(); ++i) { + down_cast<Item_cache *>(item_singlerow->element_index(i))->store_null(); + } + } return res; }