Bug #17952 nested query super slow..should not be
Submitted: 6 Mar 2006 5:22 Modified: 15 Oct 2012 13:48
Reporter: Xing Li Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:4.1.18 OS:Linux (Linux)
Assigned to: Roy Lyseng CPU Architecture:Any
Tags: semijoin

[6 Mar 2006 5:22] Xing Li
Description:
select threadid from f_texts  where threadid in (select threadid from f_threads where parentid = 1)

This query takes over 1.5 seconds to take execute on an idle server with 7 raid0 scsi drives, 4gb of ram, amd64 cpu (32bit os), with no other process/query running in mysql server.

Below is details on why the query should not be this slow.

CREATE TABLE `f_texts` (
  `text` longblob NOT NULL,
  `threadid` int(10) unsigned NOT NULL default '0',
  `compressed` tinyint(3) unsigned NOT NULL default '0',
  PRIMARY KEY  (`threadid`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

CREATE TABLE `f_threads` (
  `threadid` int(11) unsigned NOT NULL auto_increment,
  `threaddate` int(11) unsigned NOT NULL default '0',
  `parentid` int(11) unsigned NOT NULL default '0',
  `userid` int(10) unsigned NOT NULL default '0',
  `moddate` int(10) unsigned NOT NULL default '0',
  `moduserid` int(10) unsigned NOT NULL default '0',
  PRIMARY KEY  (`threadid`),
  KEY `forumid` (`parentid`,`threaddate`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

So let's run the query.

mysql> select threadid from f_texts  where threadid in (select threadid from f_threads where parentid = 1);
+----------+
| threadid |
+----------+
|        1 |
|        4 |
|        5 |
|        8 |
|       23 |
|       26 |
+----------+
6 rows in set (1.51 sec)

1.51 seconds. Wow that's slow.  Let's see how explain works.

mysql> explain select threadid from f_texts  where threadid in (select threadid from f_threads where parentid = 1);
+----+--------------------+-----------+-----------------+-----------------+---------+---------+------+--------+--------------------------+
| id | select_type        | table     | type            | possible_keys   | key     | key_len | ref  | rows   | Extra                    |
+----+--------------------+-----------+-----------------+-----------------+---------+---------+------+--------+--------------------------+
|  1 | PRIMARY            | f_texts   | index           | NULL            | PRIMARY |       4 | NULL | 266288 | Using where; Using index |
|  2 | DEPENDENT SUBQUERY | f_threads | unique_subquery | PRIMARY,forumid | PRIMARY |       4 | func |      1 | Using index; Using where |
+----+--------------------+-----------+-----------------+-----------------+---------+---------+------+--------+--------------------------+
2 rows in set (0.00 sec)

Using 2 primary indices. Looks good. Or is it? Let's execute the subquery.

mysql> select threadid from f_threads where parentid = 1;
+----------+
| threadid |
+----------+
|        1 |
|        4 |
|        5 |
|        8 |
|       23 |
|       26 |
+----------+
6 rows in set (0.00 sec)

Very fast. Using primary key so it should be fast.

mysql> select threadid from f_texts  where threadid in (1,4,5,8,23,26);
+----------+
| threadid |
+----------+
|        1 |
|        4 |
|        5 |
|        8 |
|       23 |
|       26 |
+----------+
6 rows in set (0.00 sec)

Using primary key again so should be fast as expected.

Conclusion:

The nested query should not be this slow. The same problem exist for the same query when used in a "delete" instance. The delete query takes a long time in the "prepare" stage. 

How to repeat:
Use above table defs with a few correlated rows of dummy data.
[6 Mar 2006 7:31] Valeriy Kravchuk
Thank you for a detailed bug report. This is a well-known problem (see bug #12106 for just one of examples): subquery is considered to correlated (wrongly) and thus is execured for each and every row from the outer query. This will be eventually fixed (work in progress already).
[15 Oct 2012 13:48] Erlend Dahl
Fixed in 5.6.5