MySQL Bugs: #4040: Extremely Slow "Where in (Select... )" Subqueries GROUP BY / HAVING related

Bug #4040	Extremely Slow "Where in (Select... )" Subqueries GROUP BY / HAVING related
Submitted:	7 Jun 2004 15:54	Modified:	8 Jun 2004 15:37
Reporter:	Andrew Spencer	Email Updates:
Status:	Not a Bug	Impact on me:	None
Category:	MySQL Server	Severity:	S2 (Serious)
Version:	4.1.1a-alpha-nt	OS:	Windows (Windows 2003 Server)
Assigned to:	Oleksandr Byelkin	CPU Architecture:	Any

Description:
Subqueries are unusably slow.  Identical query in MSSQL runs in .04 seconds.  In MySQl it takes 82 seconds on the same machine.
(Appears to have something to do with the GROUP BY / HAVING Statement).  

Query is simply using a Subquery to populate a WHERE statement and pull separate details from another table.  

*Performance improves (.06 seconds instead of 80 seconds) when HAVING statement in subquery removed.

*Performance also improves separately if INNER JOIN in in subquery removed (7.61 seconds instead of 80 seconds)

How to repeat:
Tables:
1:  "Patients"   (1200 records)  Single record per Patient
Key_ID     (primary Key: autoincrement)   
Center_ID  (int)

2:  "Evals"      (3600 records)  Multiple records per Patient
Eval_ID   (primary Key: autoincrement)
Location_id  (int)
Patient_ID   (int)

Query Syntax:

Select count(key_id) 
from patients
where key_id in 
(
SELECT DISTINCT Evals.Patient_ID
FROM Evals INNER JOIN
Patients ON Patients.Key_ID = Evals.Patient_ID
WHERE (Evals.Location_ID = 3) AND (Patients.Center_ID = 3) and (Patients.key_id < 820)
        GROUP BY Evals.Patient_ID
        HAVING      (COUNT(Evals.Patient_ID) > 0)
)

Thank you for taking the time to report a problem.  Unfortunately
you are not using a current version of the product your reported a
problem with -- the problem might already be fixed. Please download
a new version from http://www.mysql.com/downloads/

If you are able to reproduce the bug with one of the latest versions,
please change the version on this bug report to the version you
tested and change the status back to "Open".  Again, thank you for
your continued support of MySQL.

Additional info:

Subqueries are still under development.  Some of them are optimized in 4.1.2, but not all.

This query is slow because we rewrite IN via EXISTS like subquery and it 
become dependent (if you issue EXPLAIN EXTENDEND <your select> and SHOW 
WARNING then you will see internal structure of query after optimisation). 
 
You can make it a bit faster by removing DISTINCT. 
 
in 5.0/5.1 we plan make hash/temporary table optimization for such queries and 
it will become fast enough.

I experience this same exact problem on 5.0.2-alpha.

Anyone else coming across this problem there is a way around it by using a named subquery as below:

Select count(key_id)
from patients
where key_id in 
(
SELECT evsub.Patient_ID
FROM (
SELECT COUNT(Evals.Patient_ID) as cnt, Evals.Patient_ID
 FROM Evals INNER JOIN	Patients ON Patients.Key_ID = Evals.Patient_ID
WHERE 
	(Evals.Location_ID = 1) 
	AND (Patients.Center_ID = 1) 
	AND (Patients.key_id< 820)
        GROUP BY Evals.Patient_ID
) as evsub
where evsub.cnt>1
)