Bug #102251 CTE produces wrong result
Submitted: 15 Jan 2021 4:42 Modified: 12 Dec 2022 22:27
Reporter: xiaoyang chen Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:8.0, 8.0.22, 8.0.23 OS:Any
Assigned to: CPU Architecture:Any
Tags: regression

[15 Jan 2021 4:42] xiaoyang chen
Description:
The following CTE query produces wrong result:

```
create table t1(id int);

alter table t1 add column col1 int;
alter table t1 add column col2 int;
alter table t1 add column col3 int;
alter table t1 add column col4 int;
alter table t1 add column col5 int;
alter table t1 add column col6 int;
alter table t1 add column col7 int;
alter table t1 add column col8 int;
alter table t1 add column col9 int;
alter table t1 add column col10 int;
alter table t1 add column col11 int;
alter table t1 add column col12 int;
alter table t1 add column col13 int;
alter table t1 add column col14 int;
alter table t1 add column col15 int;
alter table t1 add column col16 int;
alter table t1 add column col17 int;
alter table t1 add column col18 int;

create table t2(id int, op int);

insert into t1 values(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18);

```

The CTE query:

```
with dest as (
                   SELECT
                        *
                     FROM t1
                     WHERE col1  = 1
                  union
                   SELECT
                        *
                     FROM t1
                     WHERE col2  = 1
              ),
     records AS (
                     SELECT 'I' op, o.id
                        FROM t2 o LEFT JOIN dest t ON o.id = t.id  WHERE t.id IS NULL
                        UNION ALL
                        SELECT 'D' op, t.id
                        FROM dest t LEFT JOIN t2 o ON t.id = o.id  WHERE o.id IS NULL
                 )
     select * from records;

+----+------+
| op | id   |
+----+------+
| D  |    1 |
+----+------+

but there is not id=1 record in t1.

```

How to repeat:
Following the above descripition, one can repeat this problem. 

Suggested fix:
By carefully analyzing the execution process, we find that when building tmp_table_param structure for passing data between tables, the "hidden_field_count" is not processed correctly. Specifically, in 'ConvertItemsToCopy' creating copy_fields should skip hash_field because it is a hidden field. but the CTE temp table lost hidden_field_count value, its origin table's value is 1 while the cloned CTE tmp table's value is 0. 

The root reason is funciton Common_table_expr::clone_tmp_table lost hidden_field_count value between origin table and target table. so the hash_field don't skip.

Suggested fix (based on MySQL-8.0.22):

```
diff --git a/sql/sql_derived.cc b/sql/sql_derived.cc
index c4cd6b7bfdc..b38f512220f 100644
--- a/sql/sql_derived.cc
+++ b/sql/sql_derived.cc
@@ -210,6 +210,8 @@ TABLE *Common_table_expr::clone_tmp_table(THD *thd, TABLE_LIST *tl) {
   tl->table = t;
   t->pos_in_table_list = tl;

+  t->hidden_field_count = first->hidden_field_count;
+
   t->set_not_started();

   if (tmp_tables.push_back(tl)) return nullptr; /* purecov: inspected */
```
[15 Jan 2021 6:02] MySQL Verification Team
Hello xiaoyang chen,

Thank you for the report and test case.
Verified as described. 

regards,
Umesh
[15 Jan 2021 11:29] xiaoyang chen
Suggest fix:

we also should condider to set the hash_field in the cloned tmp table. 

Namely, the fixed code is 

--- a/sql/sql_derived.cc
+++ b/sql/sql_derived.cc
@@ -210,6 +210,12 @@ TABLE *Common_table_expr::clone_tmp_table(THD *thd, TABLE_LIST *tl) {
   tl->table = t;
   t->pos_in_table_list = tl;

+  if (first->hash_field) {
+    t->hash_field = t->field[0];
+  }
+
+  t->hidden_field_count = first->hidden_field_count;
+
   t->set_not_started();
[21 Mar 2021 10:53] MySQL Verification Team
related - Bug #103052
[12 Dec 2022 22:27] Jon Stephens
Documented fix as follows in the MySQL 8.0.33 changelog:

    While cloning a temporary table for a common table expression
    which used shared materialization, the cloned temp table was not
    marked as using hash deduplication, leading to wrong results. We
    now set the hash field for the cloned temporary table correctly,
    and update the hidden field count to take this into account.

Closed.