Bug #88663 Performance Regression using subqueries in 5.7
Submitted: 27 Nov 2017 10:27 Modified: 28 Nov 2017 22:07
Reporter: Mark El-Wakil Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server Severity:S5 (Performance)
Version:5.7.20 OS:Any
Assigned to: CPU Architecture:Any

[27 Nov 2017 10:27] Mark El-Wakil
Description:
It appears that there is a performance regression within the 5.7 branch when it comes to subqueries.

Using the documentation below on how to repeat this error. On 5.5, the explain result looks like this:

+----+--------------------+-------+-----------------+---------------+---------+---------+------+------+--------------------------+
| id | select_type        | table | type            | possible_keys | key     | key_len | ref  | rows | Extra                    |
+----+--------------------+-------+-----------------+---------------+---------+---------+------+------+--------------------------+
|  1 | PRIMARY            | p     | index           | NULL          | PRIMARY | 4       | NULL |   10 | Using where; Using index |
|  2 | DEPENDENT SUBQUERY | s     | unique_subquery | PRIMARY       | PRIMARY | 4       | func |    1 | Using index              |
+----+--------------------+-------+-----------------+---------------+---------+---------+------+------+--------------------------+

On 5.7, it looks like this:

+----+-------------+-------+------------+--------+---------------+---------+---------+------------+------+----------+----------------------------------------------+
| id | select_type | table | partitions | type   | possible_keys | key     | key_len | ref        | rows | filtered | Extra                                        |
+----+-------------+-------+------------+--------+---------------+---------+---------+------------+------+----------+----------------------------------------------+
|  1 | SIMPLE      | s     | NULL       | index  | PRIMARY       | PRIMARY | 4       | NULL       |    1 |   100.00 | Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | p     | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | dev02.s.Id |    1 |   100.00 | Using index                                  |
+----+-------------+-------+------------+--------+---------------+---------+---------+------------+------+----------+----------------------------------------------+

How to repeat:
DROP TABLE IF EXISTS p;
DROP TABLE IF EXISTS s;

CREATE TABLE `p` (`Id` int(11),PRIMARY KEY (`Id`));
CREATE TABLE `s` (`Id` int(11),PRIMARY KEY (`Id`));

insert into s VALUES (1);
insert into p VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10);

explain SELECT p.Id FROM (p) WHERE p.Id IN (select s.Id from s) ORDER BY Id desc;

Suggested fix:
I'm looking to avoid temporary tables with this specific class of subqueries, similar to how they are not used within 5.5.

I would be happy with either:

- A fix to make this behave like 5.5.
- An option within the 5.7 configuration to make this behave like 5.5.

Worst case, if this is behavior that is going to stay, any sort of advice on how to avoid temporary tables with subqueries would be most helpful.
[28 Nov 2017 11:20] Øystein Grøvlen
It is not clear to me why you want to avoid temporary tables in your example.  The 5.7 plan will read 1 row from table s, do 1 look-up into table p, and store 1 row in a temporary table.  The 5.5 plan will read 10 rows from table p and do 10 look-ups into table s.  I am not sure the latter will be more efficient.

That is said, it is possible to turn off this particular plan by setting optimizer_switch='dupsweedout=off'.  However, it is possible that you will then get another semijoin strategy that uses a temporary table.  So in order to be sure to get the 5.5 plan, you should set optimizer_switch='semijoin=off,materialization=off'

Alternatively, you can use an optimizer hint to force the subquery strategy from 5.5:

SELECT p.Id FROM (p) WHERE p.Id IN 
  (select /*+ SUBQUERY(INTOEXISTS) */ s.Id from s) ORDER BY Id desc
[28 Nov 2017 22:07] Mark El-Wakil
I was trying to track down performance issues down to a behavior difference between 5.5 and 5.7. Looks like the issue I had on my end was unrelated to this. My bad.

I noticed that sometimes a temporary table is not created for subqueries like this. Is this related to the relative size of `p` and `s` respectively?

Thanks!