Bug #20182 | LEFT JOIN optimizations | ||
---|---|---|---|
Submitted: | 1 Jun 2006 0:00 | Modified: | 11 Jul 2006 6:06 |
Reporter: | Ruslan Zakirov | Email Updates: | |
Status: | Won't fix | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S5 (Performance) |
Version: | 4.1.19 | OS: | Linux (gentoo linux) |
Assigned to: | Igor Babaev | CPU Architecture: | Any |
[1 Jun 2006 0:00]
Ruslan Zakirov
[1 Jun 2006 18:09]
Jorge del Conde
Hi! Can you please send me the CREATE TABLE statement as well as some data for the tables used in your query so that we can reproduce this bug ? Thanks!
[1 Jun 2006 20:00]
Ruslan Zakirov
mysql dump
Attachment: test.dump (application/octet-stream, text), 7.88 KiB.
[1 Jun 2006 20:02]
Ruslan Zakirov
Sure. I've simplified the query a little: EXPLAIN SELECT g.* FROM Groups g LEFT JOIN GroupMembers gm ON ( gm.GroupId = g.id ) LEFT JOIN Users u ON ( u.id = gm.MemberId ) WHERE u.EmailAddress = 'root@localhost' LIMIT 10; +----+-------------+-------+------+----------------+---------------+---------+-----------+------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+----------------+---------------+---------+-----------+------+--------------------------+ | 1 | SIMPLE | u | ref | PRIMARY,Users4 | Users4 | 361 | const | 1 | Using where; Using index | | 1 | SIMPLE | g | ALL | NULL | NULL | NULL | NULL | 56 | | | 1 | SIMPLE | gm | ref | GroupMembers1 | GroupMembers1 | 4 | test.g.id | 7 | Using where; Using index | +----+-------------+-------+------+----------------+---------------+---------+-----------+------+--------------------------+ This query should work the DB dump I've uploaded and reproduce a problem.
[14 Jun 2006 15:48]
Valeriy Kravchuk
I was able to repeat the EXPLAIN results on your test data: mysql> EXPLAIN SELECT g.* -> FROM -> Groups g -> LEFT JOIN GroupMembers gm ON ( gm.GroupId = g.id ) -> LEFT JOIN Users u ON ( u.id = gm.MemberId ) -> WHERE -> u.EmailAddress = 'root@localhost' -> LIMIT 10; +----+-------------+-------+------+----------------+---------------+---------+-----------+------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+----------------+---------------+---------+-----------+------+--------------------------+ | 1 | SIMPLE | u | ref | PRIMARY,Users4 | Users4 | 361 | const | 1 | Using where; Using index | | 1 | SIMPLE | g | ALL | NULL | NULL | NULL | NULL | 56 | | | 1 | SIMPLE | gm | ref | GroupMembers1 | GroupMembers1 | 4 | test.g.id | 7 | Using where; Using index | +----+-------------+-------+------+----------------+---------------+---------+-----------+------+--------------------------+ 3 rows in set (0.00 sec) But, please, explain what do you want from optimizer? Plan like the following: mysql> EXPLAIN SELECT g.* FROM Groups g JOIN GroupMembers gm ON (gm.GroupId = g.id ) JOIN Users u ON ( u.id = gm.MemberId ) WHERE u.EmailAddress = 'root@localhost' LIMIT 10; +----+-------------+-------+--------+-----------------------------+---------------+---------+-----------------+------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+--------+-----------------------------+---------------+---------+-----------------+------+--------------------------+ | 1 | SIMPLE | u | ref | PRIMARY,Users4 | Users4 | 361 | const | 1 | Using where; Using index | | 1 | SIMPLE | gm | ref | GroupMembers1,GroupMembers2 | GroupMembers 2 | 4 | test.u.id | 7 | Using index | | 1 | SIMPLE | g | eq_ref | PRIMARY | PRIMARY | 4 | test.gm.GroupId | 1 | | +----+-------------+-------+--------+-----------------------------+---------------+---------+-----------------+------+--------------------------+ 3 rows in set (0.01 sec) Explain, what prevents rows in GroupMembers table to NOT have corresponding rows in Groups table?
[14 Jun 2006 17:53]
Ruslan Zakirov
Ok, I'll describe my ideas step by step. 1) Query: SELECT g.id, gm.id, u.id FROM Groups g LEFT JOIN GroupMembers gm ON (gm.GroupId = g.id ) LEFT JOIN Users u ON ( u.id = gm.MemberId) WHERE u.EmailAddress = 'root@localhost'; This query finds all groups with member 'root@localhost'. 2) query has condition (u.EmailAddress = 'root@localhost') that drops NULLs in Users table from result set. Isn't it right? I think here mysql can convert 'LEFT JOIN Users' into 'JOIN Users'. 3) Rewriten query is: SELECT g.id, gm.id, u.id FROM Groups g LEFT JOIN GroupMembers gm ON (gm.GroupId = g.id ), Users u WHERE gm.MemberId = u.id AND u.EmailAddress = 'root@localhost'; As you can see the same idea applies to the GroupMembers table. Also, consider next generic query: SELECT g.id, gm.id, u.id FROM Groups g LEFT JOIN GroupMembers gm ON (gm.GroupId = g.id) LEFT JOIN Users u ON (u.id = gm.MemberId); Result set may contain: gid | NULL | NULL gid | gmid | NULL gid | gmid | uid but never: gid | NULL | uid So condition 'u.id IS NOT NULL' gives you only one possible type: gid | gmid | uid Also, as I understand mysql does use optmisation like 2) as planner on some systems choose Users table as first one but then always scans ALL groups. I hope this discription clarify things.
[5 Jul 2006 19:37]
Valeriy Kravchuk
Your requirements and ideas looks reasonable for me.
[11 Jul 2006 6:06]
Igor Babaev
This is implemented in 5.0. See MySQL 5.0 Reference Manual 7.2.11. Outer Join Simplification. This optimization may change some plans for existing applications. That's why we are not going to add it for 4.1.