Bug #9090 Optimization Problem with a subquery in an IN Operation
Submitted: 10 Mar 2005 10:30 Modified: 4 Aug 2010 5:24
Reporter: Andre Justus Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:4.1.x, 5.x OS:Any
Assigned to: Sergey Petrunya CPU Architecture:Any
Tags: subquery benchmark

[10 Mar 2005 10:30] Andre Justus
Description:
The Bug I have found, is that a query is quite slower than it schould be.
It occurs in 'bigger tables'
I have tested is with smaller ones, there everything is ok.

with bigger tables, I mean about 50000 rows

the following example is taken from a production system.
I abortet the query after about 40 seconds. the 'werber' collum used in the Group BY has no index, the id colums has indexes.

SELECT *
FROM table_2
LEFT JOIN table_1 ON table_2.id = table_1.id
WHERE table_2.id IN (SELECT werber
FROM `table_3`
GROUP BY werber
HAVING count( 1 ) >20
)

WHEN I Do the querys one by one, everything is ok

SELECT werber
FROM `table_3`
GROUP BY werber
HAVING count( 1 ) >20

takes under 1 second of time and results in 40 integer rows

taking this values and perform the following query

SELECT *
FROM table_2
LEFT JOIN table_1 ON table_2.id = table_1.id
WHERE table_2.id IN (int1,int2,...,int40)
also take under 1 second of time.

So I think, the combined query sould not use more time than query1 AND query2

Additional Information:

EXPLAIN with the combined query results in

1  	PRIMARY  	table_2  	ALL  	NULL  	NULL  	NULL  	NULL  	40577  	Using where
1 	PRIMARY 	table_1 	eq_ref 	PRIMARY 	PRIMARY 	3 	arena.table_2.id 	1 	 
2 	DEPENDENT SUBQUERY 	table_3 	ALL 	NULL 	NULL 	NULL 	NULL 	65630 	Using temporary; Using filesort

table_1 is an 2 col table with id as an primary
table_2 is an 40 col table with id as an primary
table_3 is an 40 col table (same structure AS table_2) with no relevant indexes

I hope my examples are understandable

How to repeat:
look at the descripion
[13 Mar 2005 20:18] Aleksey Kishkin
I wouldn't call this 'bug' but rather 'feature request'. Could you please change severity to 'feature request'?
[17 Mar 2005 7:29] Jorge del Conde
Hi!

Currently this is the expected behaviour as subqueries are in early stages.  This is not a bug and this behaviour will be changed in future versions of mysql.
[17 Mar 2005 7:29] Jorge del Conde
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.mysql.com/documentation/ and the instructions on
how to report a bug at http://bugs.mysql.com/how-to-report.php
[19 Apr 2005 20:00] [ name withheld ]
This is indeed a bug.  There appears no way to cause a subquery to be optimized as anything other than a dependent subquery.  So instead of in-to-out which is needed for IN (SELECT blah) or <> ANY (SELECT blah) and I'm sure more you get an out-to-in (correlated subquery) which is algorithmically so ineffecient as to be useless.  In otherwards you get exactly the opposite behavior as any database with proper subquery support.  I don't understand why this bug is marked non-critical, it means I absolutely cannot use subqueries and have to move to a new database or do my subquery in the application layer.
[4 Nov 2005 16:33] Lisbeth Kellogg
This is an extremely bad design on the developers' part.  Because of it, I have built several queries that should work but are totally useless.  A main query and what should be an independant subquery, each returning in less that a second, but put together don't return after several minutes?  And no, I can't rewrite it as a join because it needs to be the basis of an update statement.
[4 Nov 2005 16:49] MySQL Verification Team
I am changing this to "To be fixed later" as we have plans to fix this in 5.2 or 6.0.
[11 Jan 2006 18:41] Olaf van der Spek
> I am changing this to "To be fixed later" as we have plans to fix this in 5.2 or
6.0.

Why won't this be fixed for 5.0?
Subqueries are *useless* until this is fixed.
When can a stable version be expected in which this has been fixed?
[7 May 2006 21:20] James Dennis
I have to agree with the other comments in this bug. Even if there is a plan to fix this in the future (rather than ASAP) it is still a serious bug. This feature makes sub-queries unusable in many situations.

We will have to switch DBMS because we have extensively used sub-queries in our query-building abstraction layer. There should be clearer warnings that sub-queries are very slow/unusable in certain situations.
[21 May 2006 20:47] Jakub Horky
Take a look to

http://dev.mysql.com/doc/refman/4.1/en/subquery-restrictions.html

In third point there, it is described why we're getting stuck.

My opinion is that MySQL simply uses inadequate optimization method for uncorrelated subqueries. Yes, optimization through EXISTS is better than doing SELECT & IN for each and every row of outer query, but it can't even compare to doing only _one_ SELECT and using its results to evaluate all rows of outer query.

Whatever this property is - maybe bug, maybe feature - for large databases, this is very important property and affects MySQL overall quality very much.
[4 Jun 2007 19:51] David Andersen
Hi,

This bug really causes problems for my application, is there any news when this will be solved? Does any alpha code exist that can be tested?

Regards,

David
[4 Jun 2007 19:56] David Andersen
Possible workaround?

Look at the post at: http://www.xaprb.com/blog/2006/04/30/how-to-optimize-subqueries-and-joins-in-mysql/#commen...
[2 Aug 2007 15:36] Shiri Dori
I agree with what's been said above - this is not simply a "feature" request but a real bug, as queries that should be working in a snap are taking impossibly long.

Thanks,
-Shiri
[11 Apr 2008 5:27] Martin Ross
This ruins Alfresco running for large databases under MySQL.  In fact many hibernate applications will be affected by this.
[17 Apr 2008 0:17] Paulo Cesar
This is indeed something to take in consideration. The workarounds for avoiding subqueries with speed in mind could be avoided with a better subquery + in optimization. 3 years already, and it was left to be fixed later, when?
[28 Jun 2008 21:16] Jeffrey Mattox
Count me as yet another victim of this bug.  However, I found a simple workaround in a comment to the xparb.com link (above).  I want to document it here in case that page goes away.  This worked for me: wrap the subquery in another subquery, thus:

  SELECT x FROM y WHERE z IN ( SELECT z FROM ( subquery_here ) AS Q )
[31 Dec 2008 13:55] Olaf van der Spek
> -Version:              4.1.x
> +Version:              4.1.x, 5.x

Does that mean it's fixed in 6?
[15 Jan 2009 19:50] Sean Hammons
It's depressing that MySQL doesn't take this bug more seriously. If you are working with large data sets, it almost 100% ruins any advantages that a sub-query could give you if it was optimized properly. This bug was filed almost 4 years ago. Are you freaking kidding me guys? FIX IT!
[10 Mar 2009 18:35] qxzvd zqycp
Happy 4th birthday, Bug #9090!  Can we look forward to you never, ever being fixed?  Should we stop looking forward to your death?
  With love,
   -Everyone Who Would Like to Use Subqueries but Can't Because Subqueries in MySQL Are Broken
[21 Mar 2009 17:46] Sergey Petrunya
Hi!

In MySQL 6.0, we've significantly re-worked subquery optimizations (see http://forge.mysql.com/wiki/6.0_Subquery_Optimization_Cheatsheet for details). Judging from the query, it should be now handled efficiently.  

Could you please try with the latest 6.0 (if the subquery has HAVING - 6.0.9 is ok, for subqueries w/o HAVING - preferably latest bzr source)?  Alternatively, we would appreciate getting testcases -we're in the process of evaluating the impact of new optimizations (see  http://forge.mysql.com/wiki/6.0_Subquery_Optimization_Benchmarks for what we got on the previous run).  If you could upload a dataset + subquery subquery, we will include it in the benchmark which will ensure that subqueries of that kind are handled in 6.0.
[21 Mar 2009 17:51] Sergey Petrunya
Marking as closed in 6.0. 

If you have testcases, feel free to re-open and file them here (no matter if you tried them on 6.0 or not)
[21 Mar 2009 18:59] Sergey Petrunya
Set the proper bug status
[21 Apr 2009 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[3 Aug 2010 20:35] John Swapceinski
For posterity's sake, these earlier bug reports reported the same issue:
Bug #8139
Bug #9021

I think it's important to keep track of this because it shows how the arrogance of management can cause an important bug to be ignored for many years.  I don't want to name names (Jorge del Conde), but saying "it's not a bug, it's a feature" for years and years just doesn't cut it.
[3 Aug 2010 22:32] Jakub Horky
It appears that it's fixed already. See bug #9021.

But documentation is not up-to-date:

http://dev.mysql.com/doc/refman/4.1/en/subquery-restrictions.html

(see third point)
[4 Aug 2010 5:24] Sveta Smirnova
Thank you for the feedback.

Set back to "Closed". 6.0 features have been backported to 5.5 series.
[4 Aug 2010 9:40] Jakub Horky
Of course meant

http://dev.mysql.com/doc/refman/5.5/en/subquery-restrictions.html
[4 Aug 2010 12:47] Sveta Smirnova
Jakub,

thank you for the feedback. I opened documentation bug #55743 about this problem.