Bug #104760 | incorrect placement of hash join predicates in a left outer join nest | ||
---|---|---|---|
Submitted: | 29 Aug 2021 19:12 | Modified: | 31 Aug 2021 4:47 |
Reporter: | Shu Lin | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S2 (Serious) |
Version: | 8.0.26 | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[29 Aug 2021 19:12]
Shu Lin
[30 Aug 2021 12:20]
MySQL Verification Team
Hi Mr. Lin, Thank you for your bug report. However, to us this seems to be the expected behaviour. According to the SQL standard (like the one from 2011), placing brackets in the FROM clause, makes those table expressions, actually, a nested query. We truly do not see any justifiable reason for using brackets in a straight query like yours. If you remove all brackets from your query, you should get a much better optimisation plan. Let us know. We are waiting on your feedback.
[30 Aug 2021 13:16]
Shu Lin
Two points (1) t1 LJ (t2 IJ t3) is not semantically equivalent to (t1 LJ t2) IJ t3 (2) This predicate placement bug can also manifest in other types of queries. For example, I changed the LJ to AJ as such below: mysql> explain format=tree select * from t1 where not exists (select * from t2, t3 where t2.c2=t3.c3 and t3.c3>3 and t1.c1=t2.c1); -------------- explain format=tree select * from t1 where not exists (select * from t2, t3 where t2.c2=t3.c3 and t3.c3>3 and t1.c1=t2.c1) -------------- +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | EXPLAIN | +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | -> Hash antijoin (t2.c1 = t1.c1) (cost=1.05 rows=1) -> Table scan on t1 (cost=0.35 rows=1) -> Hash -> Filter: ((t2.c2 > 3) and (t3.c3 = t2.c2)) (cost=0.70 rows=1) -> Inner hash join (no condition) (cost=0.70 rows=1) -> Table scan on t3 (cost=0.35 rows=1) -> Hash -> Table scan on t2 (cost=0.35 rows=1) |
[30 Aug 2021 13:23]
MySQL Verification Team
Hi Mr. Lin, Thank you for your comment. Actually, you are using dependent nested queries, which therefore leads to the Optimiser plan that you have displayed. In short, this is expected behaviour. It is possible to optimise some, but only few, of those dependent nested query, but that is a long term project, that will take lot's of time and is not within the scope of this forum. Thank you for your contribution.
[30 Aug 2021 14:02]
Shu Lin
The problem isn't just associated with "dependent nested queries". It is more of a problem with hash join. For example, if I change AJ to SemiJoin, then the correct predicate placement is chosen. Note the instead of Hash Join, the plan uses NestedLoopJoin. So the bug doesn't apply anymore. mysql> explain format=tree select * from t1 where exists (select * from t2, t3 where t2.c2=t3.c2 and t3.c3>3 and t1.c1=t2.c1); -------------- explain format=tree select * from t1 where exists (select * from t2, t3 where t2.c2=t3.c2 and t3.c3>3 and t1.c1=t2.c1) -------------- +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | EXPLAIN | +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | -> Nested loop semijoin (cost=0.70 rows=1) -> Table scan on t1 (cost=0.35 rows=1) -> Nested loop inner join (cost=0.70 rows=1) -> Filter: (t2.c1 = t1.c1) (cost=0.35 rows=1) -> Table scan on t2 (cost=0.35 rows=1) -> Filter: ((t3.c2 = t2.c2) and (t3.c3 > 3)) (cost=0.35 rows=1) -> Table scan on t3 (cost=0.35 rows=1) I am very disappointed that you don't consider this a bug. It is not just a sub-optimal query plan; it is a disastrous query plan because the plan chooses to do a cartesian join between t2 and t3, despite that a proper join predicate exists.
[30 Aug 2021 18:53]
Shu Lin
I did some debugging, and I think the bug is in void SplitConditions(Item *condition, QEP_TAB *current_table, vector<Item *> *predicates_below_join, vector<PendingCondition> *predicates_above_join, vector<PendingCondition> *join_conditions) Some conditions that should have been placed into predicates_below_join are incorrectly placed into join_conditions. If you want to pass the information to the development team, it shouldn't be very difficult for them to fix it.
[31 Aug 2021 4:47]
MySQL Verification Team
Thank you for your valuable feedback. Verifying based on internal discussion. Sincerely, Umesh
[17 Sep 2021 20:14]
Shu Lin
Proposed fix to the problem. Huawei has signed Oracle Contribution Agreement
Attachment: 0001-Place-conditions-at-the-correct-position-in-the-iter.patch (application/octet-stream, text), 31.35 KiB.
[20 Sep 2021 12:14]
MySQL Verification Team
Hi Mr .Lin, Thank you for your patch. We shall forward it immediately to our Development team.