Bug #106473 Inner table of OUT-JOIN is nullable, after applying OUTER2INNER transformation
Submitted: 16 Feb 2022 8:55 Modified: 10 Mar 2022 9:27
Reporter: Shanshan Ying Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:8.0.28 OS:Any
Assigned to: CPU Architecture:Any
Tags: Optimizer

[16 Feb 2022 8:55] Shanshan Ying
Description:
After applying outer2inner transformation, the inner table of outer join is still NULLABLE. and the optimizer failed to add a `is-not-null` filter to prun data early.

How to repeat:
create three tables:

 CREATE TABLE `tx` (
  `a` int DEFAULT NULL,
  `b` int DEFAULT NULL
);
 CREATE TABLE `ty` (
  `a` int DEFAULT NULL,
  `b` int DEFAULT NULL,
  UNIQUE KEY `idx_ab` (`a`,`b`)
) ;
CREATE TABLE `tz` (
  `a` int DEFAULT NULL,
  `b` int DEFAULT NULL
);

and then execute two queries;

 explain format=tree select /*+ join_order(tz,tx,ty) */ * from tx left join ty on tx.a = ty.a join tz on ty.b = tz.b;

 explain format=tree select /*+ join_order(tz,tx,ty) */ * from tx join ty on tx.a = ty.a join tz on ty.b = tz.b;

We will see that the latter query constructes two not-null conditions and push them down to table scan:
 Filter: (tx.a is not null) 
 Filter: (tz.b is not null)  

But the former query failed to do so (cause optimizer believes table ty is nullable, and does nothing in `add_not_null_conds`)

Suggested fix:
After appling OUTER2INNER transformation, update the `nullable` flag in inner part  properly.
[16 Feb 2022 10:03] MySQL Verification Team
Hello Shanshan Ying,

Thank you for the report and test case.

regards,
Umesh
[21 Feb 2022 13:28] huahua xu
I do not think that the former query failed. The = operator return 0 if one or both operands are NULL, so the former query will rejecting null in an implicit way. 

You may get a expected query result by:  select /*+ join_order(tz,tx,ty) */ * from tx left join ty on tx.a <=> ty.a join tz on ty.b <=> tz.b;
[21 Feb 2022 13:32] huahua xu
In addition, you can refer to the method `update_ref_and_keys`->`add_key_fields`
[10 Mar 2022 9:27] Shanshan Ying
Hey Xu,

Seems you did not get my point.

mysql [localhost:8027] {msandbox} (test) >  explain format=tree select /*+ join_order(tz,tx,ty) */ * from tx left join ty on tx.a = ty.a join tz on ty.b = tz.b;
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                                                                                                      |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Nested loop inner join  (cost=1.05 rows=1)
    -> Inner hash join (no condition)  (cost=0.70 rows=1)
        -> Table scan on tx  (cost=0.35 rows=1)
        -> Hash
            -> Table scan on tz  (cost=0.35 rows=1)
    -> Single-row covering index lookup on ty using idx_ab (a=tx.a, b=tz.b)  (cost=0.35 rows=1)
 |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

mysql [localhost:8027] {msandbox} (test) >  explain format=tree select /*+ join_order(tz,tx,ty) */ * from tx join ty on tx.a = ty.a join tz on ty.b = tz.b;
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Nested loop inner join  (cost=1.05 rows=1)
    -> Inner hash join (no condition)  (cost=0.70 rows=1)
        -> Filter: (tx.a is not null)  (cost=0.35 rows=1)
            -> Table scan on tx  (cost=0.35 rows=1)
        -> Hash
            -> Filter: (tz.b is not null)  (cost=0.35 rows=1)
                -> Table scan on tz  (cost=0.35 rows=1)
    -> Single-row covering index lookup on ty using idx_ab (a=tx.a, b=tz.b)  (cost=0.35 rows=1)
 |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

It is about perofrmance, not about correctness.

I guess there is no doubt that there two query are logically equal.
But, the first plan failed to pushdown (or add) is-not-null condition to base tables. These conditions can help filter some data in an early stage and enhance performance.