MySQL Bugs: #63701: MySQL Chooses unnecessarily to run subquery as dependent

Bug #63701	MySQL Chooses unnecessarily to run subquery as dependent
Submitted:	9 Dec 2011 18:42	Modified:	23 Jan 2014 8:27
Reporter:	Trey Raymond	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S3 (Non-critical)
Version:	5.1.48.4 x86_64	OS:	Linux (RHEL 4U8)
Assigned to:		CPU Architecture:	Any
Tags:	Optimizer, subqueries, subquery

Description:
The optimizer sometimes chooses to run certain subqueries as dependent (for every row) even when they do not reference the outside query.

In the production use case, the join logic is similar to the attached example, and with small row counts it performs great - but we have to use the slightly slower version where we just join 'e' based on tid rather than the more selective index_merge (and it's nice to see a good use case for index_merge!) simply because a dependent subquery on many rows is far more work for the server.  As a derived table, it also performs poorly.  If the server recognized that it already had the results of that subquery from a single run, as no outside tables are referenced, it would be the best query plan for any result size.

How to repeat:
mysql> create table e (id int unsigned not null, fkid int unsigned not null, tid int unsigned not null, primary key (id), key (fkid)) engine=innodb;
Query OK, 0 rows affected (0.01 sec)

mysql> create table l (day date not null, tid int unsigned not null, primary key (day)) engine=innodb;
Query OK, 0 rows affected (0.00 sec)

mysql> create table t (id int unsigned not null, data binary(16) not null, primary key (id)) engine=innodb;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into e values (1,1,1),(2,1,1),(3,2,2),(4,3,3);
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> insert into t values (1,'testtesttesttest'),(2,'testtesttesttest'),(3,'testtesttesttest');
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> insert into l values ('2011-12-16',1),('2011-12-17',2),('2011-12-18',1);
Query OK, 3 rows affected (0.01 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select straight_join t.data
    -> from l
    -> inner join t on t.id=l.tid and t.id in (select e.tid from e force index (primary,fkid) where e.id=1 or e.fkid=1)
    -> where l.day between '2011-12-16' and '2011-12-17';
+------------------+
| data             |
+------------------+
| testtesttesttest |
+------------------+
1 row in set (0.00 sec)

mysql> explain select straight_join t.data
    -> from l
    -> inner join t on t.id=l.tid and t.id in (select e.tid from e force index (primary,fkid) where e.id=1 or e.fkid=1)
    -> where l.day between '2011-12-16' and '2011-12-17';
+----+--------------------+-------+-------------+---------------+--------------+---------+------------+------+----------------------------------------+
| id | select_type        | table | type        | possible_keys | key          | key_len | ref        | rows | Extra                                  |
+----+--------------------+-------+-------------+---------------+--------------+---------+------------+------+----------------------------------------+
|  1 | PRIMARY            | l     | range       | PRIMARY       | PRIMARY      | 3       | NULL       |    1 | Using where                            |
|  1 | PRIMARY            | t     | eq_ref      | PRIMARY       | PRIMARY      | 4       | test.l.tid |    1 |                                        |
|  2 | DEPENDENT SUBQUERY | e     | index_merge | PRIMARY,fkid  | PRIMARY,fkid | 4,4     | NULL       |    3 | Using union(PRIMARY,fkid); Using where |
+----+--------------------+-------+-------------+---------------+--------------+---------+------------+------+----------------------------------------+
3 rows in set (0.00 sec)

Suggested fix:
Have the optimizer check subqueries for outside references no matter where in the query they are located, and if it is unaffected, treat it as independent to avoid rerunning the same thing.

This is yet another variation of bug #18826 and other bug reports where MySQ:L optimizer (wrongly) considers independent subquery as dependent and execute it for every row from outer table(s). 

Subquery Materialization methods (that solve this problem) were implemented in 6.0.x (now discontinued) long time ago, and now is being ported to 5.6.x. See http://forge.mysql.com/worklog/task.php?id=3985 etc

This appears to be fixed in MySQL 5.6.

Here is the plan that we get for the above query:

|  1 | PRIMARY     | l     | range       | PRIMARY       | PRIMARY      | 3       | NULL |    2 | Using where                                        |
|  1 | PRIMARY     | t     | ALL         | PRIMARY       | NULL         | NULL    | NULL |    3 | Using where; Using join buffer (Block Nested Loop) |
|  2 | SUBQUERY    | e     | index_merge | PRIMARY,fkid  | PRIMARY,fkid | 4,4     | NULL |    3 | Using union(PRIMARY,fkid); Using where             |

The subquery is materialized into a temporary table using index merge (indicated by the SUBQUERY row), and this materialized table is later used to lookup the tid values.

Semi-join is not used because of the STRAIGHT_JOIN hint. If we remove this hint and disable join buffering, we get this plan:

|  1 | SIMPLE       | l           | range       | PRIMARY       | PRIMARY      | 3       | NULL       |    2 | Using where                            |
|  1 | SIMPLE       | t           | eq_ref      | PRIMARY       | PRIMARY      | 4       | test.l.tid |    1 | NULL                                   |
|  1 | SIMPLE       | <subquery2> | eq_ref      | <auto_key>    | <auto_key>   | 4       | test.l.tid |    1 | NULL                                   |
|  2 | MATERIALIZED | e           | index_merge | PRIMARY,fkid  | PRIMARY,fkid | 4,4     | NULL       |    3 | Using union(PRIMARY,fkid); Using where |

A semi-join plan is chosen, and the optimizer chooses to materialize the subquery once again (but now with a MATERIALIZED code).

What happens if we remove the force index hint too?

|  1 | SIMPLE      | l     | range  | PRIMARY       | PRIMARY | 3       | NULL       |    2 | Using where                |
|  1 | SIMPLE      | t     | eq_ref | PRIMARY       | PRIMARY | 4       | test.l.tid |    1 | NULL                       |
|  1 | SIMPLE      | e     | ALL    | PRIMARY,fkid  | NULL    | NULL    | NULL       |    4 | Using where; FirstMatch(t) |

Now, the optimizer chooses the FirstMatch strategy instead of materialization, which causes a table scan for each time the subquery is invoked. However, for larger row counts, I think that the optimizer will notice that materialization is less expensive than FirstMatch and choose that strategy instead. But no experimenting with varying row counts has been performed, so this has not been verified.

I close the bug report based on this analysis, please re-open if you still have problems.