Bug #34364 LEFT JOIN with subquery takes very long time
Submitted: 6 Feb 2008 21:17 Modified: 30 Jul 2012 17:14
Reporter: Ratan Mohla Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S1 (Critical)
Version:5.1.22 OS:Solaris (SunOS clapton.nms.fnc.fujitsu.com 5.10 Generic_118833-36 sun4u sparc SUNW,Ultra-4)
Assigned to: CPU Architecture:Any
Triage: Triaged: D2 (Serious) / R4 (High) / E4 (High)

[6 Feb 2008 21:17] Ratan Mohla
Description:

Below query that involves 2 way left join and a subquery takes a very long time to complete. When 10 way left join is used, the query seems to hang and does not complete after many hours. Same query in oracle finishes within minutes.

Query performance involving LEFT JOIN with subquery is not acceptable.

SELECT objectmeta.name AS objname 
       , IFNULL(NV1.value,'NULL')  AS "projectName"
       , objectmeta.objId , IFNULL(NV2.value,'NULL')  AS "NEType" 
       , objectmeta.objectState
       , objectmeta.lastModifiedBy 
       , objectmeta.lastModificationTime , objectmeta.description
       , objectmeta.creationTime , objectmeta.objectType 
       , objectmeta.createdBy
FROM objectmeta
LEFT JOIN
(select  oid,name,value from objectmeta_userattributes where name = 'projectName') NV1
ON objectmeta.oid=NV1.oid
LEFT JOIN (select  oid,name,value from objectmeta_userattributes where name = 'NEType') NV2
ON objectmeta.oid=NV2.oid
WHERE objectmeta.oid IN (select versionmeta_objectmeta from versionmeta
                         where versionmeta.createdby = 'netsmart' and versionState = '1')
ORDER by 7 desc;

====
When above query is replaced by the below one (that replaces subquery with ON clause), the query completes within few minutes for 2 way join with 700000 records. Even though the results of the two queries are exactly the same always, the above query takes very very long time to complete than it should be.
Using 10 way join is even much worse.

SELECT objectmeta.name AS objname , IFNULL(NV1.value,'NULL')  AS "projectName"
       , objectmeta.objId , IFNULL(NV2.value,'NULL')  AS "NEType" , objectmeta.objectState
       , objectmeta.lastModifiedBy , objectmeta.lastModificationTime , objectmeta.description
       , objectmeta.creationTime , objectmeta.objectType , objectmeta.createdBy
FROM objectmeta
LEFT JOIN objectmeta_userattributes AS NV1 ON objectmeta.oid=NV1.oid and NV1.name = 'projectName'
LEFT JOIN objectmeta_userattributes AS NV2 ON objectmeta.oid=NV2.oid and NV2.name = 'NEType'
,
versionmeta
WHERE objectmeta.oid = versionmeta.versionmeta_objectmeta and versionmeta.createdby = 'netsmart'
                                                          and versionmeta.versionState = '1'
ORDER by 7 desc;
=====
explain output:
mysql> explain SELECT objectmeta.name AS objname , IFNULL(NV1.value,'NULL')  AS "projectName"
ctState
    ->        , objectmeta.objId , IFNULL(NV2.value,'NULL')  AS "NEType" , objectmeta.objectState
reated    ->        , objectmeta.lastModifiedBy , objectmeta.lastModificationTime , objectmeta.description
    ->        , objectmeta.creationTime , objectmeta.objectType , objectmeta.createdBy
eatedb    -> FROM objectmeta
          -> LEFT JOIN objectmeta_userattributes AS NV1 ON objectmeta.oid=NV1.oid and NV1.name = 'projectName'
    -> LEFT JOIN objectmeta_userattributes AS NV2 ON objectmeta.oid=NV2.oid and NV2.name = 'NEType'
    -> ,
    -> versionmeta
    -> WHERE objectmeta.oid = versionmeta.versionmeta_objectmeta and versionmeta.createdby = 'netsmart'
    ->                                                           and versionmeta.versionState = '1'
    -> ORDER by 7 desc;
+----+-------------+-------------+--------+--------------------------------------------+-------------------------------+---------+---------------------------------------+--------+----------------------------------------------+
| id | select_type | table       | type   | possible_keys                              | key                           | key_len | ref                                   | rows   | Extra                                        |
+----+-------------+-------------+--------+--------------------------------------------+-------------------------------+---------+---------------------------------------+--------+----------------------------------------------+
|  1 | SIMPLE      | versionmeta | ref    | sk_versionmeta_1,sk_versionmeta_2          | sk_versionmeta_1              | 3       | const                                 | 374456 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | objectmeta  | eq_ref | PRIMARY                                    | PRIMARY                       | 18      | gf.versionmeta.VersionMeta_ObjectMeta |      1 |                                              |
|  1 | SIMPLE      | NV1         | ref    | SK_ObjectMeta_userAttributes1,SK_oidname_1 | SK_ObjectMeta_userAttributes1 | 19      | gf.versionmeta.VersionMeta_ObjectMeta |      1 |                                              |
|  1 | SIMPLE      | NV2         | ref    | SK_ObjectMeta_userAttributes1,SK_oidname_1 | SK_ObjectMeta_userAttributes1 | 19      | gf.versionmeta.VersionMeta_ObjectMeta |      1 |                                              |
+----+-------------+-------------+--------+--------------------------------------------+-------------------------------+---------+---------------------------------------+--------+----------------------------------------------+
4 rows in set (0.00 sec)

=====

We have been using the query involving LEFT JOIN and subqueries many times in our product software code and are planning migrating from oracle to mysql. 

Replacing the subquery with ON clause will be a huge effort on our part to change the code at several places involving a lot of testing, coding and performance test issues. So we ask you to please fix this problem in a future version of MySQL 5.1. 

I understand there have been some subquery optimizations being performed in 
6.0 but we need at least the above optimization in 5.1 in a couple of months from now.
====

How to repeat:

See above Description and just run the query.

Suggested fix:

Optimize subquery execution when LEFT join is done with a subquery output
[7 Feb 2008 4:35] Valeriy Kravchuk
Thank you for a problem report. Please, send a complete test case (with some data for tables involved to demostrate the problem) or at least SHOW CREATE TABLE and SHOW TABLE STATUS results for the tables involved.
[7 Feb 2008 19:47] Ratan Mohla
test schema and some data populated

Attachment: omvmomua_small.dmp.gz (application/gzip, text), 398.09 KiB.

[7 Feb 2008 19:47] Ratan Mohla
perl script to generate large data

Attachment: db_mysql_griphon_10wayjoin.pl (application/octet-stream, text), 9.60 KiB.

[24 Feb 2008 13:45] Valeriy Kravchuk
I see 5.0.18-debug-log as a server version in the dump you uploaded. So, what version do you have this bug in? Have you tried to check with 5.1.23-rc or 5.0.54?
[25 Feb 2008 11:56] Valeriy Kravchuk
I had loaded your test case. Actually, this is a know limitation of MySQL's optimizer:

mysql> explain SELECT objectmeta.name AS objname
    ->        , IFNULL(NV1.value,'NULL')  AS "projectName"
    ->        , objectmeta.objId , IFNULL(NV2.value,'NULL')  AS "NEType"
    ->        , objectmeta.objectState
     ->        , objectmeta.lastModifiedBy
    ->        , objectmeta.lastModificationTime , objectmeta.description
    ->        , objectmeta.creationTime , objectmeta.objectType
    ->        , objectmeta.createdBy
    -> FROM objectmeta
    -> LEFT JOIN
    -> (select  oid,name,value from objectmeta_userattributes where name = 'pro
jectName') NV1
    -> ON objectmeta.oid=NV1.oid
    -> LEFT JOIN (select  oid,name,value from objectmeta_userattributes where n
ame = 'NEType')
    -> NV2
    -> ON objectmeta.oid=NV2.oid
    -> WHERE objectmeta.oid IN (select versionmeta_objectmeta from versionmeta
    ->                          where versionmeta.createdby = 'netsmart' and ve
rsionState = '1')
    -> ORDER by 7 desc;
+----+--------------------+---------------------------+----------------+--------
---------------------------+------------------+---------+------+--------+-------
---------------------------------------+
| id | select_type        | table                     | type           | possibl
e_keys                     | key              | key_len | ref  | rows   | Extra
                                       |
+----+--------------------+---------------------------+----------------+--------
---------------------------+------------------+---------+------+--------+-------
---------------------------------------+
|  1 | PRIMARY            | <derived3>                | system         | NULL
                           | NULL             | NULL    | NULL |      0 | const
row not found                          |
|  1 | PRIMARY            | objectmeta                | ALL            | NULL
                           | NULL             | NULL    | NULL | 759423 | Using
where; Using temporary; Using filesort |
|  1 | PRIMARY            | <derived2>                | ALL            | NULL
                           | NULL             | NULL    | NULL |  61360 |
                                       |
|  4 | DEPENDENT SUBQUERY | versionmeta               | index_subquery | sk_vers
ionmeta_1,sk_versionmeta_2 | sk_versionmeta_2 | 19      | func |      1 | Using
where                                  |
|  3 | DERIVED            | objectmeta_userattributes | ALL            | NULL
                           | NULL             | NULL    | NULL | 317131 | Using
where                                  |
|  2 | DERIVED            | objectmeta_userattributes | ALL            | NULL
                           | NULL             | NULL    | NULL | 317131 | Using
where                                  |
+----+--------------------+---------------------------+----------------+--------
---------------------------+------------------+---------+------+--------+-------
---------------------------------------+
6 rows in set (3.07 sec)

Subquery to versionmeta table is considered DEPENDENT, while it is NOT. And using JOIN instead of such a subquery is a known good workaround.

This problem should be fixed in 6.0.x. I'll prove it in my next comment.
[25 Feb 2008 12:57] Valeriy Kravchuk
Now, let's check the plan in 6.0.x:

openxs@linux:~/dbs/6.0> bin/mysql -uroot
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 4
Server version: 6.0.5-alpha Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> use test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> explain SELECT objectmeta.name AS objname
    ->        , IFNULL(NV1.value,'NULL')  AS "projectName"
    ->        , objectmeta.objId , IFNULL(NV2.value,'NULL')  AS "NEType"
    ->        , objectmeta.objectState
    ->        , objectmeta.lastModifiedBy
    ->        , objectmeta.lastModificationTime , objectmeta.description
    ->        , objectmeta.creationTime , objectmeta.objectType
    ->        , objectmeta.createdBy
    -> FROM objectmeta
    -> LEFT JOIN
    -> (select  oid,name,value from objectmeta_userattributes where name = 'pro
jectName') NV1
    -> ON objectmeta.oid=NV1.oid
    -> LEFT JOIN (select  oid,name,value from objectmeta_userattributes where n
ame = 'NEType')
    -> NV2
O    -> ON objectmeta.oid=NV2.oid
    -> WHERE objectmeta.oid IN (select versionmeta_objectmeta from versionmeta
    ->                          where versionmeta.createdby = 'netsmart' and ve
rsionState = '1')
    -> ORDER by 7 desc;
+----+-------------+---------------------------+--------+-----------------------
------------+------------------+---------+--------------------------------------
---+--------+---------------------------------------------------------------+
| id | select_type | table                     | type   | possible_keys
            | key              | key_len | ref
   | rows   | Extra                                                         |
+----+-------------+---------------------------+--------+-----------------------
------------+------------------+---------+--------------------------------------
---+--------+---------------------------------------------------------------+
|  1 | PRIMARY     | <derived3>                | system | NULL
            | NULL             | NULL    | NULL
   |      0 | const row not found                                           |
|  1 | PRIMARY     | versionmeta               | ref    | sk_versionmeta_1,sk_ve
rsionmeta_2 | sk_versionmeta_1 | 3       | const
   |     10 | Using where; Using temporary; Using filesort; Start temporary |
|  1 | PRIMARY     | objectmeta                | eq_ref | PRIMARY
            | PRIMARY          | 18      | test.versionmeta.VersionMeta_ObjectMe
ta |      1 | End temporary                                                 |
|  1 | PRIMARY     | <derived2>                | ALL    | NULL
            | NULL             | NULL    | NULL
   |  61360 |                                                               |
|  3 | DERIVED     | objectmeta_userattributes | ALL    | NULL
            | NULL             | NULL    | NULL
   | 336170 | Using where                                                   |
|  2 | DERIVED     | objectmeta_userattributes | ALL    | NULL
            | NULL             | NULL    | NULL
   | 336170 | Using where                                                   |
+----+-------------+---------------------------+--------+-----------------------
------------+------------------+---------+--------------------------------------
---+--------+---------------------------------------------------------------+
6 rows in set (5.38 sec)

So, as you can see, no DEPENDENT rows in the plan. This problem should be already fixed in 6.0.3-alpha, but it is unlikely for this fix to appear in 5.0 or 5.1 any time soon (as too many changes in code will be introduced). Please, check if execution time for this query in 6.0.3 is acceptable in your environment.
[25 Feb 2008 19:01] Ratan Mohla
Thank you very much for the analysis and investigation on the query optimization issue. To answer your question, we did test this query in 5.1.22 version earlier and the performance is same as that in 5.0.54 or 5.0.18. and the explain output is the same also. So, the problem still exists in 5.1.22. 

From our perspective, since 6.0.3 is in alpha stage, and we cannot use 6.0 in our product until it is GA release, and since it is still very long I guess it is possible that the GA will be as much as a year and a half which will be a very long time to wait. Since we are migrating from oracle to mysql, this is a very big issue for us, and since we know that this bug will be fixed in 6.0, it does not make sense for us to do the code changes as the code changes are not minor, it is going to take huge amount of code changes at may places since we are using this query pattern very extensively (from 2 way joins upto 10 way joins) and will require us to change a lot of XML and JAVA files along with unit, system and performance testing which will be too much time and effort. It is good to know that this issue is addressed in 6.0 and as I understand it solves a lot of other subquery optimization issues and not just the bug reported by me. But it is going to be a lot of help to us if we can at least somehow just translate the subquery reported in this bug to the join (since we use only this subquery pattern) and let join be optimized since we know that join performance is very good in 5.1. We do not need all the subquery cases where the subquery optimization can be done and we will be ok if they are addressed in 6.0.
[26 Feb 2008 18:26] Timothy Smith
Hi.  I am sorry to bring bad news, but the extensive changes needed to improve the subquery optimization are not likely to be backported to MySQL 5.1.

With apologies,

Timothy
[30 Jul 2012 17:14] Paul Dubois
Noted in 5.6.6 changelog.

LEFT JOIN on derived tables was very slow. This is now addressed
through the use of subquery materialization.