Bug #32665 Query with dependent subquery is too slow
Submitted: 23 Nov 2007 17:26 Modified: 13 Jan 2:51
Reporter: adrian drumea
Status: Closed
Category:Server: Optimizer Severity:S3 (Non-critical)
Version:5.1.22-rc-Debian_1-log OS:Any
Assigned to: Bugs System Target Version:6.0-beta
Tags: subquery benchmark, slow, dependent subquery, IN
Triage: Triaged: D3 (Medium)

[23 Nov 2007 17:26] adrian drumea
Description:
Executing a query with a dependent subquery (as identified by explain) is too slow even
though the dependent subquery could be cached at the beginning of the execution. The IN
operator is used over the result of the subquery. 

How to repeat:
We have three tables:

individual (
  idIndividual int primary key,
  ... extra cols ...
)

contact (
  idContact int primary key,
  contactType int,
  idObj int
)

contact_address (
  idAddress int primary key,
  idContact int,
  postalStripped varchar(100),
  ... extra cols ...
)

We have indexes only on the primary keys. We have a 1-1 relationship between individual
and contact based on contact.idObj = individual.idIndividual and contact.contactType = 1.
We have a 1-N relationship between contact and contact_address based on
contact_address.idContact = contact.idContact. 

We have < 3000 rows in every table. 

select count(*) from individual: 1327
select count(*) from contact: 2170
select count(*) from contact_address: 2238

We then execute the query:

SELECT a.idIndividual FROM individual a 
WHERE a.idIndividual IN 
	(	SELECT c.idObj FROM contact_address cona 
		INNER JOIN contact c ON c.idContact=cona.idContact 
		WHERE cona.postalStripped='T2H3B2'
	) 

It returns exactly one row, with idIndividual = 216, which is correct. The problem is
that it is too slow (around 4 seconds). 

We then execute the subquery separately:

SELECT c.idObj FROM contact_address cona 
		INNER JOIN contact c ON c.idContact=cona.idContact 
		WHERE cona.postalStripped='T2H3B2'

The result is instantaneous, and returns one row with col idObj = 216. So the subquery
clearly does not depend on any row from individual and runs very fast.

We then try the query:

SELECT a.idIndividual FROM individual a 
WHERE a.idIndividual IN (216) 

Obviously, this runs very fast because of the primary key index and returns idIndividual
= 216 (as expected).

So when the two are combined, the query is very slow although done in two stages manually
(fetch list of ids and then execute the outer query) it is instantaneous.

We have found (by trial and error) that adding an index to contact.idObj gets rid of the
problem. We signal this as a bug because there is no obvious reason for this. 

Here are the explains for each select:

For the full query:

1	PRIMARY	a	index		PRIMARY	4		1291	Using where; Using index
2	DEPENDENT SUBQUERY	cona	ALL	IDX_contact_address_2				2187	Using where
2	DEPENDENT SUBQUERY	c	eq_ref	PRIMARY	PRIMARY	4	hawk.cona.idContact	1	Using where

After the index is added:

1	PRIMARY	a	index		PRIMARY	4		1291	Using where; Using index
2	DEPENDENT SUBQUERY	c	ref	PRIMARY,IDX_contact_2	IDX_contact_2	5	func	1	Using where;
Using index
2	DEPENDENT
SUBQUERY	cona	ref	IDX_contact_address_2	IDX_contact_address_2	4	hawk.c.idContact	1	Using
where

Suggested fix:
None.
[23 Nov 2007 17:47] Valeriy Kravchuk
Thank you for a problem report. This is a known optimizer's deficiency that will be fixed
in MySQL 6.0.x.
[28 Dec 2008 1:05] Sergey Petrunya
Confirm, should be handled by new subquery optimizations in MySQL 6.0.  
The query is uncorrelated, so WL#3985 code may bring extra benefits.
[28 Dec 2008 16:00] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/62387

2758 Sergey Petrunia	2008-12-28
      BUG#32665: Query with dependent subquery is too slow
      - Testcase (the code is subquery optimization WLs)
[11 Jan 15:51] Bugs System
Pushed into 6.0.10-alpha (revid:sergefp@mysql.com-20090110105650-he9xy4221tbn5y70)
(version source revid:sergefp@mysql.com-20090110105650-he9xy4221tbn5y70) (merge vers:
6.0.10-alpha) (pib:6)
[13 Jan 2:51] Paul DuBois
Noted in 6.0.10 changelog.

Queries with dependent subqueries were slow.