Bug #32665 Query with dependent subquery is too slow
Submitted: 23 Nov 2007 16:26 Modified: 23 Nov 2010 3:32
Reporter: adrian drumea Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.1.22-rc-Debian_1-log OS:Any
Assigned to: Sergey Petrunya CPU Architecture:Any
Tags: dependent subquery, IN, slow, subquery benchmark

[23 Nov 2007 16:26] adrian drumea
Description:
Executing a query with a dependent subquery (as identified by explain) is too slow even though the dependent subquery could be cached at the beginning of the execution. The IN operator is used over the result of the subquery. 

How to repeat:
We have three tables:

individual (
  idIndividual int primary key,
  ... extra cols ...
)

contact (
  idContact int primary key,
  contactType int,
  idObj int
)

contact_address (
  idAddress int primary key,
  idContact int,
  postalStripped varchar(100),
  ... extra cols ...
)

We have indexes only on the primary keys. We have a 1-1 relationship between individual and contact based on contact.idObj = individual.idIndividual and contact.contactType = 1. We have a 1-N relationship between contact and contact_address based on contact_address.idContact = contact.idContact. 

We have < 3000 rows in every table. 

select count(*) from individual: 1327
select count(*) from contact: 2170
select count(*) from contact_address: 2238

We then execute the query:

SELECT a.idIndividual FROM individual a 
WHERE a.idIndividual IN 
	(	SELECT c.idObj FROM contact_address cona 
		INNER JOIN contact c ON c.idContact=cona.idContact 
		WHERE cona.postalStripped='T2H3B2'
	) 

It returns exactly one row, with idIndividual = 216, which is correct. The problem is that it is too slow (around 4 seconds). 

We then execute the subquery separately:

SELECT c.idObj FROM contact_address cona 
		INNER JOIN contact c ON c.idContact=cona.idContact 
		WHERE cona.postalStripped='T2H3B2'

The result is instantaneous, and returns one row with col idObj = 216. So the subquery clearly does not depend on any row from individual and runs very fast.

We then try the query:

SELECT a.idIndividual FROM individual a 
WHERE a.idIndividual IN (216) 

Obviously, this runs very fast because of the primary key index and returns idIndividual = 216 (as expected).

So when the two are combined, the query is very slow although done in two stages manually (fetch list of ids and then execute the outer query) it is instantaneous.

We have found (by trial and error) that adding an index to contact.idObj gets rid of the problem. We signal this as a bug because there is no obvious reason for this. 

Here are the explains for each select:

For the full query:

1	PRIMARY	a	index		PRIMARY	4		1291	Using where; Using index
2	DEPENDENT SUBQUERY	cona	ALL	IDX_contact_address_2				2187	Using where
2	DEPENDENT SUBQUERY	c	eq_ref	PRIMARY	PRIMARY	4	hawk.cona.idContact	1	Using where

After the index is added:

1	PRIMARY	a	index		PRIMARY	4		1291	Using where; Using index
2	DEPENDENT SUBQUERY	c	ref	PRIMARY,IDX_contact_2	IDX_contact_2	5	func	1	Using where; Using index
2	DEPENDENT SUBQUERY	cona	ref	IDX_contact_address_2	IDX_contact_address_2	4	hawk.c.idContact	1	Using where

Suggested fix:
None.
[23 Nov 2007 16:47] Valeriy Kravchuk
Thank you for a problem report. This is a known optimizer's deficiency that will be fixed in MySQL 6.0.x.
[28 Dec 2008 0:05] Sergey Petrunya
Confirm, should be handled by new subquery optimizations in MySQL 6.0.  
The query is uncorrelated, so WL#3985 code may bring extra benefits.
[28 Dec 2008 15:00] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/62387

2758 Sergey Petrunia	2008-12-28
      BUG#32665: Query with dependent subquery is too slow
      - Testcase (the code is subquery optimization WLs)
[11 Jan 2009 14:51] Bugs System
Pushed into 6.0.10-alpha (revid:sergefp@mysql.com-20090110105650-he9xy4221tbn5y70) (version source revid:sergefp@mysql.com-20090110105650-he9xy4221tbn5y70) (merge vers: 6.0.10-alpha) (pib:6)
[13 Jan 2009 1:51] Paul DuBois
Noted in 6.0.10 changelog.

Queries with dependent subqueries were slow.
[16 Aug 2010 6:37] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100816062819-bluwgdq8q4xysmlg) (version source revid:alik@sun.com-20100816062612-enatdwnv809iw3s9) (pib:20)
[13 Nov 2010 16:24] Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (merge vers: 5.6.99-m4) (pib:21)
[23 Nov 2010 3:32] Paul DuBois
Bug does not appear in any released 5.6.x version. No 5.6.1 changelog entry needed.
[2 Dec 2011 10:30] Asad ahmed
I have MySQL Server 5.5 and my sub query is very very very slow where as the same query takes hardly 1 second in SQL server 2005.My Subquery is as follows:

select articlenumber from temp_tag
where articlenumber not in (
select distinct cont from temp_tag where heading is not null 
and heading <> '' and publishdate='2011-10-20' 
and type = 'story' and cont <> '0')

I am stuck :( please help...
[2 Dec 2011 12:42] Guilhem Bichot
To Ahmed:
"distinct" is useless in this query. Try one of those queries, which could be equivalent to yours:

- without distinct:
select articlenumber from temp_tag
where articlenumber not in (
select cont from temp_tag where heading is not null 
and heading <> '' and publishdate='2011-10-20' 
and type = 'story' and cont <> '0')

- without subquery:
select outer.articlenumber from temp_tag AS outer LEFT JOIN temp_tag AS inner
ON outer.articlenumber=inner.cont and inner.heading is not null 
and inner.heading <> '' and inner.publishdate='2011-10-20' 
and inner.type = 'story' and inner.cont <> '0'
WHERE inner.heading IS NULL
(search for "You can use this fact to find rows in a table that have no counterpart in another table" in
http://dev.mysql.com/doc/refman/5.6/en/join.html ).
[2 Dec 2011 13:26] Asad ahmed
@Guilhem  this is also taking same time.
[2 Dec 2011 14:01] Guilhem Bichot
To Ahmed: then I have no idea. Maybe tables are missing some index. You should look at EXPLAIN output. Also you can ask questions on forums: http://forums.mysql.com/