Bug #20182 LEFT JOIN optimizations
Submitted: 1 Jun 2006 0:00 Modified: 11 Jul 2006 6:06
Reporter: Ruslan Zakirov Email Updates:
Status: Won't fix Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:4.1.19 OS:Linux (gentoo linux)
Assigned to: Igor Babaev CPU Architecture:Any

[1 Jun 2006 0:00] Ruslan Zakirov
Description:
When a query has a LEFT join and WHERE condition doesn't allow NULLs on the right side mysql doesn't convert the join to the normal type. I think it's possible to optimize this situation on the server side. I call it 'optimize' as LEFT JOINs restrict order of a plan and leave less freedom to the plan optmizer.

How to repeat:
Below is a query that select tickets by requestor's email address:

SELECT DISTINCT main.*
FROM
    Tickets main
    JOIN Groups Groups_1  ON ( Groups_1.Instance = main.id )
    LEFT JOIN CachedGroupMembers CachedGroupMembers_2 ON ( CachedGroupMembers_2.GroupId = Groups_1.id )
    LEFT JOIN Users Users_3 ON ( Users_3.id = CachedGroupMembers_2.MemberId )
WHERE
    main.EffectiveId = main.id AND main.Status != 'deleted' AND main.Type = 'ticket'
    AND Groups_1.Domain = 'RT::Ticket-Role' AND Groups_1.Type = 'Requestor'
    AND Users_3.EmailAddress = 'foo.bar@test.com'
LIMIT 10;

This query is very slow as mysql have to do full scan on the Tickets table.

We couldn't convert LEFT joins to NORMAL in the application as we use the same FROM template to do next queries:
1) sort tickets by users' fields
2) find tickets by users' fields
3) find tickets that has [no] requestors
4) random combinations of the above.

Suggested fix:
1) simplify WHERE condition
2) for each LEFT JOIN check if WHERE condition allow NULLs on the right side, otherwise change it NORMAL join
3) converted JOINs may influence other LEFT JOINs (in the example this algorithm converts LEFT JOIN Users on the step 2, what makes LEFT JOIN CachedGroupMembers also optimisable)
4) continue with other steps
[1 Jun 2006 18:09] Jorge del Conde
Hi!

Can you please send me the CREATE TABLE statement as well as some data for the tables used in your query so that we can reproduce this bug ?

Thanks!
[1 Jun 2006 20:00] Ruslan Zakirov
mysql dump

Attachment: test.dump (application/octet-stream, text), 7.88 KiB.

[1 Jun 2006 20:02] Ruslan Zakirov
Sure. I've simplified the query a little:
EXPLAIN
SELECT g.*
FROM
    Groups g
    LEFT JOIN GroupMembers gm ON ( gm.GroupId = g.id )
    LEFT JOIN Users u ON ( u.id = gm.MemberId )
WHERE
    u.EmailAddress = 'root@localhost'
LIMIT 10;
+----+-------------+-------+------+----------------+---------------+---------+-----------+------+--------------------------+
| id | select_type | table | type | possible_keys  | key           | key_len | ref       | rows | Extra                    |
+----+-------------+-------+------+----------------+---------------+---------+-----------+------+--------------------------+
|  1 | SIMPLE      | u     | ref  | PRIMARY,Users4 | Users4        |     361 | const     |    1 | Using where; Using index |
|  1 | SIMPLE      | g     | ALL  | NULL           | NULL          |    NULL | NULL      |   56 |                          |
|  1 | SIMPLE      | gm    | ref  | GroupMembers1  | GroupMembers1 |       4 | test.g.id |    7 | Using where; Using index |
+----+-------------+-------+------+----------------+---------------+---------+-----------+------+--------------------------+

This query should work the DB dump I've uploaded and reproduce a problem.
[14 Jun 2006 15:48] Valeriy Kravchuk
I was able to repeat the EXPLAIN results on your test data:

mysql> EXPLAIN SELECT g.*
    -> FROM
    ->     Groups g
    ->     LEFT JOIN GroupMembers gm ON ( gm.GroupId = g.id )
    ->     LEFT JOIN Users u ON ( u.id = gm.MemberId )
    -> WHERE
    ->     u.EmailAddress = 'root@localhost'
    -> LIMIT 10;
+----+-------------+-------+------+----------------+---------------+---------+-----------+------+--------------------------+
| id | select_type | table | type | possible_keys  | key           | key_len | ref       | rows | Extra                    |
+----+-------------+-------+------+----------------+---------------+---------+-----------+------+--------------------------+
|  1 | SIMPLE      | u     | ref  | PRIMARY,Users4 | Users4        |     361 | const     |    1 | Using where; Using index |
|  1 | SIMPLE      | g     | ALL  | NULL           | NULL          |    NULL | NULL      |   56 |                          |
|  1 | SIMPLE      | gm    | ref  | GroupMembers1  | GroupMembers1 |       4 | test.g.id |    7 | Using where; Using index |
+----+-------------+-------+------+----------------+---------------+---------+-----------+------+--------------------------+
3 rows in set (0.00 sec)

But, please, explain what do you want from optimizer? Plan like the following:

mysql> EXPLAIN SELECT g.* FROM     Groups g JOIN GroupMembers gm ON (gm.GroupId = g.id ) JOIN Users u ON ( u.id = gm.MemberId ) WHERE     u.EmailAddress = 'root@localhost' LIMIT 10;
+----+-------------+-------+--------+-----------------------------+---------------+---------+-----------------+------+--------------------------+
| id | select_type | table | type   | possible_keys               | key  | key_len | ref             | rows | Extra                    |
+----+-------------+-------+--------+-----------------------------+---------------+---------+-----------------+------+--------------------------+
|  1 | SIMPLE      | u     | ref    | PRIMARY,Users4              | Users4  |     361 | const           |    1 | Using where; Using index |
|  1 | SIMPLE      | gm    | ref    | GroupMembers1,GroupMembers2 | GroupMembers
2 |       4 | test.u.id       |    7 | Using index              |
|  1 | SIMPLE      | g     | eq_ref | PRIMARY                     | PRIMARY  |       4 | test.gm.GroupId |    1 |                          |
+----+-------------+-------+--------+-----------------------------+---------------+---------+-----------------+------+--------------------------+
3 rows in set (0.01 sec)

Explain, what prevents rows in GroupMembers table to NOT have corresponding rows in Groups table?
[14 Jun 2006 17:53] Ruslan Zakirov
Ok, I'll describe my ideas step by step.
1) Query:
SELECT g.id, gm.id, u.id
FROM Groups g LEFT JOIN GroupMembers gm ON (gm.GroupId = g.id )
  LEFT JOIN Users u ON ( u.id = gm.MemberId)
WHERE u.EmailAddress = 'root@localhost';

This query finds all groups with member 'root@localhost'.

2) query has condition (u.EmailAddress = 'root@localhost') that drops NULLs in Users table from result set. Isn't it right? I think here mysql can convert 'LEFT JOIN Users' into 'JOIN Users'.

3) Rewriten query is:
SELECT g.id, gm.id, u.id
FROM
  Groups g LEFT JOIN GroupMembers gm ON (gm.GroupId = g.id ),
  Users u
WHERE gm.MemberId = u.id AND u.EmailAddress = 'root@localhost';

As you can see the same idea applies to the GroupMembers table.

Also, consider next generic query:
SELECT g.id, gm.id, u.id
FROM
  Groups g
  LEFT JOIN GroupMembers gm ON (gm.GroupId = g.id)
  LEFT JOIN Users u ON (u.id = gm.MemberId);

Result set may contain:
gid | NULL | NULL
gid | gmid | NULL
gid | gmid | uid

but never:
gid | NULL | uid

So condition 'u.id IS NOT NULL' gives you only one possible type:
gid | gmid | uid

Also, as I understand mysql does use optmisation like 2) as planner on some systems choose Users table as first one but then always scans ALL groups.

I hope this discription clarify things.
[5 Jul 2006 19:37] Valeriy Kravchuk
Your requirements and ideas looks reasonable for me.
[11 Jul 2006 6:06] Igor Babaev
This is implemented in 5.0. 
See MySQL 5.0 Reference Manual
7.2.11. Outer Join Simplification.

This optimization may change some plans for existing applications. That's why we are not going to add it for 4.1.