Bug #112345 Contribute by tencent: Repeated subquery plans were printed in format=tree.
Submitted: 14 Sep 2023 7:53 Modified: 13 May 3:07
Reporter: tianfeng li (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:8.0.34 OS:Any
Assigned to: CPU Architecture:Any
Tags: Contribution

[14 Sep 2023 7:53] tianfeng li
Description:
In GetAccessPathsFromItem(), WalkItem will walk into every subselect and collect them, when two subselects overlap, the nested subselect would be printed once more.

How to repeat:
```
CREATE TABLE t1 (a INT NOT NULL, b INT NOT NULL);
CREATE TABLE t2 (c INT NOT NULL, d INT NOT NULL);
CREATE TABLE t3 (e INT NOT NULL);

explain format=tree SELECT * FROM t1 AS ta
WHERE ta.a IN (SELECT c FROM t2 AS tb
WHERE (SELECT MIN(e) FROM t3 as tc
WHERE tb.d=tc.e) < SOME(SELECT e FROM t3 as tc
WHERE ta.b=tc.e));

-> Remove duplicate ta rows using temporary table (weedout)  (cost=5.30 rows=6)
    -> Filter: <nop>(<in_optimizer>((select #3),<exists>(select #4)))  (cost=5.30 rows=6)
        -> Inner hash join (ta.a = tb.c)  (cost=5.30 rows=6)
            -> Table scan on ta  (cost=0.35 rows=7)
            -> Hash
                -> Table scan on tb  (cost=0.85 rows=6)
        -> Select #3 (subquery in condition; dependent)
            -> Aggregate: min(tc.e)  (cost=0.45 rows=1)
                -> Filter: (tb.d = tc.e)  (cost=0.35 rows=1)
                    -> Table scan on tc  (cost=0.35 rows=4)
        -> Select #3 (subquery in condition; dependent)
            -> Aggregate: min(tc.e)  (cost=0.45 rows=1)
                -> Filter: (tb.d = tc.e)  (cost=0.35 rows=1)
                    -> Table scan on tc  (cost=0.35 rows=4)
        -> Select #4 (subquery in condition; dependent)
            -> Limit: 1 row(s)  (cost=0.35 rows=1)
                -> Filter: ((ta.b = tc.e) and <if>(outer_field_is_not_null, (<cache>((select #3)) < tc.e), true))  (cost=0.35 rows=1)
                    -> Table scan on tc  (cost=0.35 rows=4)
                    -> Select #3 (subquery in condition; dependent)
                        -> Aggregate: min(tc.e)  (cost=0.45 rows=1)
                            -> Filter: (tb.d = tc.e)  (cost=0.35 rows=1)
                                -> Table scan on tc  (cost=0.35 rows=4)
```

As above, 'Select #3' have been printed three times,
  1. first time, through top Filter's condition, e.g. (<in_optimizer>((select #3),<exists>(select #4)))
  2. second time,  by postfix walk, through Filter's condition in 'Select #4', e.g. (<cache>((select #3)) < tc.e)
  3. third time, print subplan of 'Select #4'.

Only for the second time, it is unreasonable.

Suggested fix:
WalkItem will walk through all item subselect, thus for nested subselect, it may be collected more than once.
There is a natural thought to solve this, collect all the top subselect for nested subselect if any. Thus we need choose another way to walk item.
Otherwise, we can just prevent same path child been push_backed into children simply.
[14 Sep 2023 7:56] tianfeng li
Repeated subquery plans were printed in format=tree.

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: 0001-Repeated-subquery-plans-were-printed-in-format-tree.patch (application/octet-stream, text), 1.97 KiB.

[14 Sep 2023 8:05] MySQL Verification Team
Hello tianfeng li,

Thank you for the report and contribution.

regards,
Umesh
[9 May 12:28] Frederic Descamps
Thank you for your contribution, however our development team fixed this in parallel in a slightly different way.
[13 May 3:07] tianfeng li
Thank you for the update! I'm glad to hear the issue has been resolved.
Could you kindly share the commit link for the official fix when convenient? I'd like to learn from the team's approach to improve my future contributions.
Looking forward to contributing again in the future.