Bug #4460 Query having subqueries (having group by clause) take too much of time
Submitted: 8 Jul 2004 13:41 Modified: 14 Jan 2020 20:31
Reporter: NOT_FOUND NOT_FOUND Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S4 (Feature request)
Version:4.1.3 Beta OS:Windows (Win2k/linux/AIX)
Assigned to: CPU Architecture:Any
Tags: performance, subquery

[8 Jul 2004 13:41] NOT_FOUND NOT_FOUND
Description:
Hi

We have our product on AIX that uses DB2 as database server. At present we want to migrate the product to windows environment with different database software. Obviously MYSQL was the choice to make.

We wanted mysql to support subqueries. And finally it is now available in 4.1.3beta.

I am evaluating the database server using few complex queries from our product.

Myquery looks like
===========================================
select T1.c1,sum(T2.c2),T3.c3, T3.c4,sum(T2.c3), 
(sum(T2.c3)-sum(T2.c2)),T1.c2, T1.c3, T1.c4, 
T1.c5,T1.c6, 'std' , T1.c7  
from T1, T2 LEFT JOIN T3 on T3.c1=T2.c1 and T3.c5=T2.c4 
where T2.c4 = '2002-03-28'   and ((T2.c3 >0 or T2.c2 > 0) or (T2.c3 = 0 and T2.c2 = 0 
and c7 = T1.c1)) 
and T1.c1 = T2.c1 
and (c5 in (select max(c5) from T1 , T2 where T1.c1 = T2.c1 and T2.c4 = '2002-03-28' group by T1.c1))
and T1.c3 not in ('R1','R2','AA', 'CO', 'AR', 'R3', 'R4', 'PA','SP', 'IN', 'ID', 'WH', 'CN', 'CM') 
group by T1.c1, T1.c2, T1.c3, T1.c4, T1.c5,T1.c6, 'std' , 
T1.c7 ,c3, c4 order by T1.c7 DESC,T1.c1
===========================================

Above query is not much complex. When I ran same query on DB2 it takes 7-10 seconds to fetch result. But in case of MYSQL 4.1.3beta, it takes more than 3 hours.

The bottleneck is at the 

=================================================
and (c5 in (select max(c5) from T1 , T2 where T1.c1 = T2.c1 and T2.c4 = '2002-03-28' group by T1.c1))
=================================================
statement in the "whereclause".

To make this query fast, I tested 2 scenarios as mentioned below.

1. 
Run SUBQUERY in question separately and paste the output of it in the main query. i.e. I replaced the subquery with outputof the subquery.

The observation was that the MYSQL performance was at par with DB2. It fetched data in 7-10 seconds.

2.
Replace the subquery with the DERIVED table in the main query.
================================================
...........from T1, (select max(c5) from T1 , T2 where T1.c1 = T2.c1 and T2.c4 = '2002-03-28' group by T1.c1)) T4, T2 LEFT JOIN T3 ..........
================================================

The observation was same as that of point 1. 

Based on these 2 scenarios, it can be easily concluded that we need to change queries if we want to continue 4.1.3xxxx.

But this suggestion will be turned down without any discussion as it would impose question the integrity of the product and , correctness of results. It will certainly increase span of the development & QA cycle, which we want to avoid for some inevitable reasons.

There are many such queries in our product. Can mysql guys come to our help and provide solution/fix to this problem?

How to repeat:
Follow the instructions mentioned in the above section. Or feel free to contact me. my email id is

pgirgaonkar@ideas.com
[8 Jul 2004 13:42] NOT_FOUND NOT_FOUND
Please read the main comment.
[14 Jul 2004 4:28] MySQL Verification Team
Could you please provide a test case. The zip file you can
upload at:

ftp://support.mysql.com:/pub/mysql/upload

making reference to this bug report.

Thanks
[14 Jul 2004 11:20] NOT_FOUND NOT_FOUND
I have copied the required information at ftp://support.mysql.com/pub/mysql/upload/.

Also copied is the text document which will provide information about the test-system.
[14 Jul 2004 16:51] MySQL Verification Team
Thank you for the test case.
[3 Aug 2004 5:28] NOT_FOUND NOT_FOUND
Any update on this....?
[3 Aug 2004 5:53] MySQL Verification Team
Yes, sorry I was on vacation last week and after the preliminar test I will ask
for my colleagues help regarding the analyze of your query.

Sorry for the inconvenience.
[20 Aug 2004 17:42] Oleksandr Byelkin
Thank you for bugreport! 
We plan to make optimisation for such queries (value IN (<non-correlated select which 
return small amount of rows)).
[31 Aug 2004 13:33] NOT_FOUND NOT_FOUND
What is the timeline to fix this?
[1 Dec 2004 11:51] NOT_FOUND NOT_FOUND
Any update on this?
[3 Dec 2004 16:55] Sergei Golubchik
no.
we have not implemented a special optimizations for these subqueries yet.
[14 Jan 2020 20:31] Roy Lyseng
Posted by developer:
 
This feature was implemented in 5.6 with subquery materialization.