Bug #35042 Subquery using IN causes MySQL to hang
Submitted: 4 Mar 2008 16:57 Modified: 21 Apr 2008 21:25
Reporter: Shawn halayka Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:Server version: 5.0.45-community-log OS:Linux
Assigned to: Sergey Petrunya CPU Architecture:Any
Tags: IN, subquery

[4 Mar 2008 16:57] Shawn halayka
Description:
The following query causes the thread to steal all CPU, leaving other threads hung. The query must be killed in order for the server to return to normal.

Query:
select A_ID from A where B_ID in (select B_ID from C where D_ID = n);

Tables:
A, # Rows: 13907011
+----------------+--------------+------+-----+---------+----------------+
| Field          | Type         | Null | Key | Default | Extra          |
+----------------+--------------+------+-----+---------+----------------+
| A_ID           | int(11)      | NO   | PRI | NULL    | auto_increment |
| B_ID           | int(11)      | NO   | MUL | 0       |                |
| Foo1           | int(11)      | YES  | MUL | NULL    |                |
| Foo2           | varchar(255) | YES  |     | NULL    |                |
| Foo3           | int(11)      | YES  |     | NULL    |                |
+----------------+--------------+------+-----+---------+----------------+

B, # Rows: 941033
+-------------------+-----------+------+-----+-------------------+----------------+
| Field             | Type      | Null | Key | Default           | Extra          |
+-------------------+-----------+------+-----+-------------------+----------------+
| B_ID              | int(11)   | NO   | PRI | NULL              | auto_increment |
| Foo1              | int(11)   | YES  |     | NULL              |                |
| Foo2              | int(11)   | YES  |     | NULL              |                |
| Foo3              | int(11)   | YES  |     | 1                 |                |
| Foo4              | timestamp | NO   |     | CURRENT_TIMESTAMP |                |
+-------------------+-----------+------+-----+-------------------+----------------+

C, # Rows: 940245
+-------------+---------+------+-----+---------+-------+
| Field       | Type    | Null | Key | Default | Extra |
+-------------+---------+------+-----+---------+-------+
| B_ID        | int(11) | YES  | MUL | NULL    |       |
| D_ID        | int(11) | YES  | MUL | NULL    |       |
| Foo1        | int(11) | YES  |     | NULL    |       |
| Foo2        | int(8)  | YES  |     | 1       |       |
+-------------+---------+------+-----+---------+-------+

D, # Rows: 6504
+---------------------+--------------+------+-----+-------------------+----------------+
| Field               | Type         | Null | Key | Default           | Extra          |
+---------------------+--------------+------+-----+-------------------+----------------+
| D_ID                | int(11)      | NO   | PRI | NULL              | auto_increment |
| Foo1                | time         | YES  |     | NULL              |                |
| Foo2                | varchar(150) | YES  |     | NULL              |                |

...

| Foo25               | timestamp    | NO   |     | CURRENT_TIMESTAMP |                |
| Foo26               | int(8)       | YES  |     | 1                 |                |
+---------------------+--------------+------+-----+-------------------+----------------+

How to repeat:
Run query.

Suggested fix:
Or, perform subquery as standalone query, then iterate through results programmatically using Perl/C++/etc.
[20 Apr 2008 5:46] Sergey Petrunya
Changing category s/DML/Optimizer
[20 Apr 2008 5:53] Sergey Petrunya
Shawn, yes, the subquery is re-evaluated every time because it is made correlated by IN->EXISTS transformation (for details see
http://en.oreilly.com/mysql2008/public/asset/attachment/2128, slide #5 and around).
[20 Apr 2008 5:55] Sergey Petrunya
The provided case should be handled by 6.0's new subquery optimizations.
[20 Apr 2008 5:56] Sergey Petrunya
Shawn, would it be possible for you to spare us the dataset so we can make sure it will work fast in MySQL 6.0?
[21 Apr 2008 20:21] Shawn halayka
I apologize for the late delay. Unfortunately the data set cannot be provided.
[21 Apr 2008 21:23] Sergey Petrunya
Shawn, thanks for your response. 

Considering that
* looking at query and table definitions, the issue seem to be fixed in 6.0 (and poor performance in preceding versions is expected) 
* there is no way to really check this without the dataset,

I'm changing status to can't repeat.

If you can't try it in 6.0, we'd appreciate to get the result (we expect that the problem will disappear).  If there are still performance issues, feel free to re-open this bug or submit another.
[21 Apr 2008 21:25] Shawn halayka
Thanks Sergey, I really appreciate the hard work and time of the MySQL team. I will keep this in mind when we eventually upgrade.

Thanks again.