Bug #63701 | MySQL Chooses unnecessarily to run subquery as dependent | ||
---|---|---|---|
Submitted: | 9 Dec 2011 18:42 | Modified: | 23 Jan 2014 8:27 |
Reporter: | Trey Raymond | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S3 (Non-critical) |
Version: | 5.1.48.4 x86_64 | OS: | Linux (RHEL 4U8) |
Assigned to: | CPU Architecture: | Any | |
Tags: | Optimizer, subqueries, subquery |
[9 Dec 2011 18:42]
Trey Raymond
[9 Dec 2011 19:05]
Valeriy Kravchuk
This is yet another variation of bug #18826 and other bug reports where MySQ:L optimizer (wrongly) considers independent subquery as dependent and execute it for every row from outer table(s). Subquery Materialization methods (that solve this problem) were implemented in 6.0.x (now discontinued) long time ago, and now is being ported to 5.6.x. See http://forge.mysql.com/worklog/task.php?id=3985 etc
[23 Jan 2014 8:27]
Roy Lyseng
This appears to be fixed in MySQL 5.6. Here is the plan that we get for the above query: | 1 | PRIMARY | l | range | PRIMARY | PRIMARY | 3 | NULL | 2 | Using where | | 1 | PRIMARY | t | ALL | PRIMARY | NULL | NULL | NULL | 3 | Using where; Using join buffer (Block Nested Loop) | | 2 | SUBQUERY | e | index_merge | PRIMARY,fkid | PRIMARY,fkid | 4,4 | NULL | 3 | Using union(PRIMARY,fkid); Using where | The subquery is materialized into a temporary table using index merge (indicated by the SUBQUERY row), and this materialized table is later used to lookup the tid values. Semi-join is not used because of the STRAIGHT_JOIN hint. If we remove this hint and disable join buffering, we get this plan: | 1 | SIMPLE | l | range | PRIMARY | PRIMARY | 3 | NULL | 2 | Using where | | 1 | SIMPLE | t | eq_ref | PRIMARY | PRIMARY | 4 | test.l.tid | 1 | NULL | | 1 | SIMPLE | <subquery2> | eq_ref | <auto_key> | <auto_key> | 4 | test.l.tid | 1 | NULL | | 2 | MATERIALIZED | e | index_merge | PRIMARY,fkid | PRIMARY,fkid | 4,4 | NULL | 3 | Using union(PRIMARY,fkid); Using where | A semi-join plan is chosen, and the optimizer chooses to materialize the subquery once again (but now with a MATERIALIZED code). What happens if we remove the force index hint too? | 1 | SIMPLE | l | range | PRIMARY | PRIMARY | 3 | NULL | 2 | Using where | | 1 | SIMPLE | t | eq_ref | PRIMARY | PRIMARY | 4 | test.l.tid | 1 | NULL | | 1 | SIMPLE | e | ALL | PRIMARY,fkid | NULL | NULL | NULL | 4 | Using where; FirstMatch(t) | Now, the optimizer chooses the FirstMatch strategy instead of materialization, which causes a table scan for each time the subquery is invoked. However, for larger row counts, I think that the optimizer will notice that materialization is less expensive than FirstMatch and choose that strategy instead. But no experimenting with varying row counts has been performed, so this has not been verified. I close the bug report based on this analysis, please re-open if you still have problems.