Bug #107309 Wrong result when a mergeable derived table and IN subquery is present
Submitted: 17 May 2022 0:53 Modified: 18 May 2022 6:41
Reporter: Evgeny P Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:8.0.23+ OS:Any
Assigned to: CPU Architecture:Any
Tags: regression

[17 May 2022 0:53] Evgeny P
Description:
Here's the query to demonstrate the bug: 
SELECT 1
 FROM ( SELECT `user`.`User` FROM `mysql`.`user` AS `user` LEFT JOIN `mysql`.`time_zone` ON false ) AS `t1`
 WHERE
  (
    true OR
    `t1`.`User` IN (SELECT 1 where false) )
   AND
   ( true OR true );

in 8.0.21 it returns several rows, in 8.0.23+ (including the latest 8.0.29)  this query returns Impossible Where.

The correct result is several rows, because the WHERE condition in the outer query is always true and the outer join in the derived table with always false ON condition should return all rows from the left table.

here's how it's optimized in 8.0.21:
*) the derived table t1 is merged into the outer select
*) because the WHERE condition of the outer query is effectively true, it's optimized away
*) at the end of the optimization the query becomes SELECT 1  FROM `mysql`.`user` AS `user` LEFT JOIN `mysql`.`time_zone` ON false
then it's executed and we see the result 

here's how it's optimized in 8.0.23:
*) derived table t1 is merged into the outer select
*) then this bug fix kicks in 

commit eadef7cca586a34d27496e69b8785918739b3e49
Author: Chaithra Gopalareddy <chaithra.gopalareddy@oracle.com>
Date:   Mon Sep 21 10:09:55 2020 +0530

    Bug#31216115:  WHEN WHERE CONDITION REFERS TO SUBQUERY AND IS ALWAYS FALSE, SERVER WILL CRASH

and prevents optimizing away `t1`.`User` IN (SELECT 1 where false)  because it contains a reference to the merged derived table
*) "( true OR true )" gets rewritten as "(true)", i.e it's an OR cond with a single arg. The condition now looks like following  "( true OR `t1`.`User` IN (SELECT 1 where false) )  AND  ( true )"
*) then optimizer starts to flatten join, in particular it changes LEFT JOIN from the merged derived table to INNER JOIN because "(true)" represents itself as null rejecting for all tables (see suggested fix) . Once the join is converted to INNER, optimizer adds ON condition to the WHERE clause, i.e adds "false"  to the WHERE condition of the outer query. The condition becomes "( true OR `t1`.`User` IN (SELECT 1 where false) )  AND  ( true ) AND false"
*) the "AND false" part triggers the Impossible WHERE 

How to repeat:
SELECT 1
 FROM ( SELECT `user`.`User` FROM `mysql`.`user` AS `user` LEFT JOIN `mysql`.`time_zone` ON false ) AS `t1`
 WHERE
  (
    true OR
    `t1`.`User` IN (SELECT 1 where false) )
   AND
   ( true OR true );

Suggested fix:
The actual regression is introduced by this commit

commit 88b391372e74616bc35edef4bcd238dabea13899
Author: Chaithra Gopalareddy <chaithra.gopalareddy@oracle.com>
Date:   Thu Sep 27 09:43:25 2018 +0530

    WL#9571: Remove trivial conditions before outer to inner join
             transformation

but it remained hidden till this commit revealed it
commit eadef7cca586a34d27496e69b8785918739b3e49
Author: Chaithra Gopalareddy <chaithra.gopalareddy@oracle.com>
Date:   Mon Sep 21 10:09:55 2020 +0530

    Bug#31216115:  WHEN WHERE CONDITION REFERS TO SUBQUERY AND IS ALWAYS FALSE, SERVER WILL CRASH

The cause of the bug is that the optimized "(true)" got not_null_table_cache property incorrectly set up. In Item_cond::fix_fields() :

  if (remove_condition) {
    new_item->fix_fields(thd, ref);
    used_tables_cache = 0;

    if (func_type == COND_AND_FUNC && ignore_unknown())
      not_null_tables_cache = 0;
    else
      not_null_tables_cache = ~(table_map)0;

    li.rewind();
    while ((item = li++)) {
      Cleanup_after_removal_context ctx(select);
      item->walk(&Item::clean_up_after_removal, enum_walk::SUBQUERY_POSTFIX,
                 pointer_cast<uchar *>(&ctx));
      li.remove();
    }
    Prepared_stmt_arena_holder ps_arena_holder(thd);
    list.push_front(new_item);
  }

After this code not_null_tables_cache for OR will be left as 0xFFFFFFFF. 
Over the course of Item_cond::fix_fields() code amends not_null_tables_cache  with values from arg list thus limiting it to actual not null tables, but here at the end OR will still contain one element which isn't taken into account to calculate correct value of not_null_tables_cache.
The fix should be pretty obvious now.
[17 May 2022 6:42] MySQL Verification Team
Hello Evgeny P,

Thank you for the report and feedback.

regards,
Umesh
[18 May 2022 6:41] MySQL Verification Team
Confirmed internally that it is a duplicate of internally reported issue Bug 34060289(sorry not accessible) which is fixed and documented fix as follows in the MySQL 8.0.30 changelog: A select from a view that used left joins did not return any results. 

Once 8.0.30 is released then details would appear here - https://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-30.html