Bug #15135 Dependent subquery performs very badly
Submitted: 22 Nov 2005 14:28 Modified: 21 Mar 2009 17:02
Reporter: Dominic Binks Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:4.1.14 OS:Any (NT)
Assigned to: Sergey Petrunya CPU Architecture:Any
Tags: subquery benchmark

[22 Nov 2005 14:28] Dominic Binks
Description:
On a table with about 10000 rows this query gobbles most of the PCs resources:

SELECT iq.* FROM iq
WHERE iq.question_id IN 
(SELECT question_id FROM iq 
 WHERE in_msg_id = 0
 GROUP BY question_id
 HAVING COUNT(question_id) > 1)

I managed to escape out of the query browser after 189 seconds (in my case it return 49 rows).

This query which looks similar (but is handled differently):

SELECT iq.* FROM
(SELECT question_id FROM iq
 WHERE in_msg_id = 0
 GROUP BY question_id
 HAVING COUNT(question_id) > 1) AS _iq
WHERE iq.question_id = _iq.question_id

Executes in significantly under a second with the machine barely noticing.  There is no index on in_msg_id in either case.

Given the small number of rows in the table, I don't see why the performance of the first is so appalling.

How to repeat:
Create a table with two columns, question_id is indexed (but allows duplicates).  in_msg_id is not indexed, although all legal values are >0.  0 represents an not currently determined value.  (I'm wanting to reset the 0 values to make correct assignments).  Fill with 10000 values.  Obviously there must be some 0 values for in_msg_id.  Then run the two queries.

Suggested fix:
Change the handling of the dependent subquery to build a derived table, as in the other query.
[21 Mar 2009 17:02] Sergey Petrunya
Hi! 

MySQL 6.0 has reworked subquery optimization subsystem, in particular, the provided WHERE .. IN query will be handled by Subquery Materialization which will do roughly the same as what is done for the FROM-clause subquery.  

Based on that, marking the bug as closed in 6.0. 

Feel free to re-open if you don't observe the speedup (please upload a testcase if re-opening).