Bug #4040 Extremely Slow "Where in (Select... )" Subqueries GROUP BY / HAVING related
Submitted: 7 Jun 2004 15:54 Modified: 8 Jun 2004 15:37
Reporter: Andrew Spencer Email Updates:
Status: Not a Bug Impact on me:
Category:MySQL Server Severity:S2 (Serious)
Version:4.1.1a-alpha-nt OS:Microsoft Windows (Windows 2003 Server)
Assigned to: Oleksandr Byelkin CPU Architecture:Any

[7 Jun 2004 15:54] Andrew Spencer
Subqueries are unusably slow.  Identical query in MSSQL runs in .04 seconds.  In MySQl it takes 82 seconds on the same machine.
(Appears to have something to do with the GROUP BY / HAVING Statement).  

Query is simply using a Subquery to populate a WHERE statement and pull separate details from another table.  

*Performance improves (.06 seconds instead of 80 seconds) when HAVING statement in subquery removed.

*Performance also improves separately if INNER JOIN in in subquery removed (7.61 seconds instead of 80 seconds)

How to repeat:
1:  "Patients"   (1200 records)  Single record per Patient
Key_ID     (primary Key: autoincrement)   
Center_ID  (int)

2:  "Evals"      (3600 records)  Multiple records per Patient
Eval_ID   (primary Key: autoincrement)
Location_id  (int)
Patient_ID   (int)

Query Syntax:

Select count(key_id) 
from patients
where key_id in 
Patients ON Patients.Key_ID = Evals.Patient_ID
WHERE (Evals.Location_ID = 3) AND (Patients.Center_ID = 3) and (Patients.key_id < 820)
        GROUP BY Evals.Patient_ID
        HAVING      (COUNT(Evals.Patient_ID) > 0)
[8 Jun 2004 11:55] Alexander Keremidarski
Thank you for taking the time to report a problem.  Unfortunately
you are not using a current version of the product your reported a
problem with -- the problem might already be fixed. Please download
a new version from http://www.mysql.com/downloads/

If you are able to reproduce the bug with one of the latest versions,
please change the version on this bug report to the version you
tested and change the status back to "Open".  Again, thank you for
your continued support of MySQL.

Additional info:

Subqueries are still under development.  Some of them are optimized in 4.1.2, but not all.
[8 Jun 2004 15:37] Oleksandr Byelkin
This query is slow because we rewrite IN via EXISTS like subquery and it 
become dependent (if you issue EXPLAIN EXTENDEND <your select> and SHOW 
WARNING then you will see internal structure of query after optimisation). 
You can make it a bit faster by removing DISTINCT. 
in 5.0/5.1 we plan make hash/temporary table optimization for such queries and 
it will become fast enough.
[3 Jun 2005 0:24] ypirc ypirc
I experience this same exact problem on 5.0.2-alpha.
[12 May 2008 14:23] Jeremy Pointer
Anyone else coming across this problem there is a way around it by using a named subquery as below:

Select count(key_id)
from patients
where key_id in 
SELECT evsub.Patient_ID
SELECT COUNT(Evals.Patient_ID) as cnt, Evals.Patient_ID
 FROM Evals INNER JOIN	Patients ON Patients.Key_ID = Evals.Patient_ID
	(Evals.Location_ID = 1) 
	AND (Patients.Center_ID = 1) 
	AND (Patients.key_id< 820)
        GROUP BY Evals.Patient_ID
) as evsub
where evsub.cnt>1