Bug #106938 lateral derived table with LIMIT produces wrong result
Submitted: 7 Apr 2022 2:26 Modified: 8 Apr 2022 1:35
Reporter: xiaoyang chen Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S1 (Critical)
Version:8.0, 8.0.28 OS:Any
Assigned to: CPU Architecture:Any

[7 Apr 2022 2:26] xiaoyang chen
Description:
The following Lateral query will produce wrong results. 

How to repeat:
```
CREATE TABLE `t1` (
  `a` int DEFAULT NULL,
  `b` int DEFAULT NULL,
  KEY `idx1` (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci 

insert into t1 values (1, 10), (2, 10), (1, 20), (2, 20), (3, 20), (2, 30), (4, 40); 

```
Perform the query:
```
mysql> select a from t1, lateral (select min(b) AS mt from t1 as t2 where t1.a = t2.a + 1 having mt > 10) as tt order by b limit 1;
Empty set (0.00 sec)
```

However, if we remove LIMIT 1, then we can obtain one record, like 
```
mysql> select a from t1, lateral (select min(b) AS mt from t1 as t2 where t1.a = t2.a + 1 having mt > 10) as tt order by b;
+------+
| a    |
+------+
|    4 |
+------+
1 row in set (0.00 sec)

```
[7 Apr 2022 10:19] MySQL Verification Team
Hello xiaoyang chen,

Thank you for the report and test case.
Verified as described. 

regards,
Umesh
[8 Apr 2022 1:33] xiaoyang chen
The tree explain of this query is as follows:

```
| EXPLAIN                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Limit: 1 row(s)
    -> Nested loop inner join
        -> Invalidate materialized tables (row from t1)  (cost=0.95 rows=7)
            -> Sort: t1.b  (cost=0.95 rows=7)
                -> Table scan on t1  (cost=0.95 rows=7)
        -> Table scan on tt  (cost=2.72 rows=2)
            -> Materialize (invalidate on row from t1)
                -> Filter: (mt > 10)
                    -> Aggregate: min(t2.b)
                        -> Filter: (t1.a = (t2.a + 1))  (cost=0.95 rows=7)
                            -> Table scan on t2  (cost=0.95 rows=7)
 |
```
Clearly, the wrong result is due to "LIMIT 1" being pushed to table t1. This is caused by the "filesort_limit" optimization when making a temporary table for sort. 

Suggest fix: 

```
--- a/sql/sql_select.cc
+++ b/sql/sql_select.cc
@@ -5065,7 +5065,7 @@ bool JOIN::make_tmp_tables_info() {
         m_select_limit == HA_POS_ERROR (we need a full table scan)
         unit->select_limit_cnt == 1 (we only need one row in the result set)
       */
-      if (sort_tab->filesort)
+      if (sort_tab->filesort && !sort_tab->lateral_derived_tables_depend_on_me)
         sort_tab->filesort->limit =
             (has_group_by || (primary_tables > curr_tmp_table + 1) ||
              calc_found_rows)
```

When the table is the lateral deepened table, we cannot pushdown limit to this table, as the lateral derived table wants to see all the records in this table.
[8 Apr 2022 1:35] xiaoyang chen
Sorry, the above-listed tree explain is the right one. The original is that 
```
| EXPLAIN                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Limit: 1 row(s)
    -> Nested loop inner join
        -> Invalidate materialized tables (row from t1)  (cost=0.95 rows=7)
            -> Sort: t1.b, limit input to 1 row(s) per chunk  (cost=0.95 rows=7)
                -> Table scan on t1  (cost=0.95 rows=7)
        -> Table scan on tt  (cost=2.72 rows=2)
            -> Materialize (invalidate on row from t1)
                -> Filter: (mt > 10)
                    -> Aggregate: min(t2.b)
                        -> Filter: (t1.a = (t2.a + 1))  (cost=0.95 rows=7)
                            -> Table scan on t2  (cost=0.95 rows=7)
 |
```