Bug #9090 Optimization Problem with a subquery in an IN Operation
Submitted: 10 Mar 2005 11:30 Modified: 21 Apr 2009 20:58
Reporter: Andre Justus
Status: No Feedback
Category:Server: Optimizer Severity:S3 (Non-critical)
Version:4.1.x, 5.x OS:Any
Assigned to: Bugs System Target Version:
Tags: subquery benchmark
Triage: D3 (Medium)

[10 Mar 2005 11:30] Andre Justus
Description:
The Bug I have found, is that a query is quite slower than it schould be.
It occurs in 'bigger tables'
I have tested is with smaller ones, there everything is ok.

with bigger tables, I mean about 50000 rows

the following example is taken from a production system.
I abortet the query after about 40 seconds. the 'werber' collum used in the Group BY has
no index, the id colums has indexes.

SELECT *
FROM table_2
LEFT JOIN table_1 ON table_2.id = table_1.id
WHERE table_2.id IN (SELECT werber
FROM `table_3`
GROUP BY werber
HAVING count( 1 ) >20
)

WHEN I Do the querys one by one, everything is ok

SELECT werber
FROM `table_3`
GROUP BY werber
HAVING count( 1 ) >20

takes under 1 second of time and results in 40 integer rows

taking this values and perform the following query

SELECT *
FROM table_2
LEFT JOIN table_1 ON table_2.id = table_1.id
WHERE table_2.id IN (int1,int2,...,int40)
also take under 1 second of time.

So I think, the combined query sould not use more time than query1 AND query2

Additional Information:

EXPLAIN with the combined query results in

1  	PRIMARY  	table_2  	ALL  	NULL  	NULL  	NULL  	NULL  	40577  	Using where
1 	PRIMARY 	table_1 	eq_ref 	PRIMARY 	PRIMARY 	3 	arena.table_2.id 	1 	 
2 	DEPENDENT SUBQUERY 	table_3 	ALL 	NULL 	NULL 	NULL 	NULL 	65630 	Using temporary;
Using filesort

table_1 is an 2 col table with id as an primary
table_2 is an 40 col table with id as an primary
table_3 is an 40 col table (same structure AS table_2) with no relevant indexes

I hope my examples are understandable

How to repeat:
look at the descripion
[13 Mar 2005 21:18] Aleksey Kishkin
I wouldn't call this 'bug' but rather 'feature request'. Could you please change severity
to 'feature request'?
[17 Mar 2005 8:29] Jorge del Conde
Hi!

Currently this is the expected behaviour as subqueries are in early stages.  This is not
a bug and this behaviour will be changed in future versions of mysql.
[17 Mar 2005 8:29] Jorge del Conde
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.mysql.com/documentation/ and the instructions on
how to report a bug at http://bugs.mysql.com/how-to-report.php
[19 Apr 2005 22:00] [ name withheld ]
This is indeed a bug.  There appears no way to cause a subquery to be optimized as
anything other than a dependent subquery.  So instead of in-to-out which is needed for IN
(SELECT blah) or <> ANY (SELECT blah) and I'm sure more you get an out-to-in (correlated
subquery) which is algorithmically so ineffecient as to be useless.  In otherwards you
get exactly the opposite behavior as any database with proper subquery support.  I don't
understand why this bug is marked non-critical, it means I absolutely cannot use
subqueries and have to move to a new database or do my subquery in the application layer.
[4 Nov 2005 17:33] Lisbeth Kellogg
This is an extremely bad design on the developers' part.  Because of it, I have built
several queries that should work but are totally useless.  A main query and what should
be an independant subquery, each returning in less that a second, but put together don't
return after several minutes?  And no, I can't rewrite it as a join because it needs to
be the basis of an update statement.
[4 Nov 2005 17:49] Sinisa Milivojevic
I am changing this to "To be fixed later" as we have plans to fix this in 5.2 or 6.0.
[11 Jan 2006 19:41] Olaf van der Spek
> I am changing this to "To be fixed later" as we have plans to fix this in 5.2 or
6.0.

Why won't this be fixed for 5.0?
Subqueries are *useless* until this is fixed.
When can a stable version be expected in which this has been fixed?
[7 May 2006 23:20] James Dennis
I have to agree with the other comments in this bug. Even if there is a plan to fix this
in the future (rather than ASAP) it is still a serious bug. This feature makes
sub-queries unusable in many situations.

We will have to switch DBMS because we have extensively used sub-queries in our
query-building abstraction layer. There should be clearer warnings that sub-queries are
very slow/unusable in certain situations.
[21 May 2006 22:47] Jakub Horky
Take a look to

http://dev.mysql.com/doc/refman/4.1/en/subquery-restrictions.html

In third point there, it is described why we're getting stuck.

My opinion is that MySQL simply uses inadequate optimization method for uncorrelated
subqueries. Yes, optimization through EXISTS is better than doing SELECT & IN for each
and every row of outer query, but it can't even compare to doing only _one_ SELECT and
using its results to evaluate all rows of outer query.

Whatever this property is - maybe bug, maybe feature - for large databases, this is very
important property and affects MySQL overall quality very much.
[4 Jun 2007 21:51] David Andersen
Hi,

This bug really causes problems for my application, is there any news when this will be
solved? Does any alpha code exist that can be tested?

Regards,

David
[4 Jun 2007 21:56] David Andersen
Possible workaround?

Look at the post at:
http://www.xaprb.com/blog/2006/04/30/how-to-optimize-subqueries-and-joins-in-mysql/#commen...
[2 Aug 2007 17:36] Shiri Dori
I agree with what's been said above - this is not simply a "feature" request but a real
bug, as queries that should be working in a snap are taking impossibly long.

Thanks,
-Shiri
[11 Apr 2008 7:27] Martin Ross
This ruins Alfresco running for large databases under MySQL.  In fact many hibernate
applications will be affected by this.
[17 Apr 2008 2:17] Paulo Cesar
This is indeed something to take in consideration. The workarounds for avoiding subqueries
with speed in mind could be avoided with a better subquery + in optimization. 3 years
already, and it was left to be fixed later, when?
[28 Jun 2008 23:16] Jeffrey Mattox
Count me as yet another victim of this bug.  However, I found a simple workaround in a
comment to the xparb.com link (above).  I want to document it here in case that page goes
away.  This worked for me: wrap the subquery in another subquery, thus:

  SELECT x FROM y WHERE z IN ( SELECT z FROM ( subquery_here ) AS Q )
[31 Dec 2008 14:55] Olaf van der Spek
> -Version:              4.1.x
> +Version:              4.1.x, 5.x

Does that mean it's fixed in 6?
[15 Jan 2009 20:50] Sean Hammons
It's depressing that MySQL doesn't take this bug more seriously. If you are working with
large data sets, it almost 100% ruins any advantages that a sub-query could give you if
it was optimized properly. This bug was filed almost 4 years ago. Are you freaking
kidding me guys? FIX IT!
[10 Mar 2009 19:35] qxzvd zqycp
Happy 4th birthday, Bug #9090!  Can we look forward to you never, ever being fixed? 
Should we stop looking forward to your death?
  With love,
   -Everyone Who Would Like to Use Subqueries but Can't Because Subqueries in MySQL Are
Broken
[21 Mar 2009 18:46] Sergey Petrunya
Hi!

In MySQL 6.0, we've significantly re-worked subquery optimizations (see
http://forge.mysql.com/wiki/6.0_Subquery_Optimization_Cheatsheet for details). Judging
from the query, it should be now handled efficiently.  

Could you please try with the latest 6.0 (if the subquery has HAVING - 6.0.9 is ok, for
subqueries w/o HAVING - preferably latest bzr source)?  Alternatively, we would
appreciate getting testcases -we're in the process of evaluating the impact of new
optimizations (see  http://forge.mysql.com/wiki/6.0_Subquery_Optimization_Benchmarks for
what we got on the previous run).  If you could upload a dataset + subquery subquery, we
will include it in the benchmark which will ensure that subqueries of that kind are
handled in 6.0.
[21 Mar 2009 18:51] Sergey Petrunya
Marking as closed in 6.0. 

If you have testcases, feel free to re-open and file them here (no matter if you tried
them on 6.0 or not)
[21 Mar 2009 19:59] Sergey Petrunya
Set the proper bug status
[22 Apr 2009 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".