Bug #8202 left join optimizer
Submitted: 30 Jan 2005 13:34 Modified: 14 Jan 2020 21:50
Reporter: Martin Friebe (Gold Quality Contributor) (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S4 (Feature request)
Version:4.1.9, 5.x OS:Any
Assigned to: CPU Architecture:Any
Tags: qc

[30 Jan 2005 13:34] Martin Friebe
Description:
I believe, I have found a left join condition, that could be optimized far better. I have no prove that it does not allready happen, as all I could to do find the current behaviour where some timing tests on really large data.

if you set up 2 large enough tables:

select count(*)   from large1 l1    left join large2 l2  on  l1.some_key = l2.some_key

compared against:

select count(*)   from large1 l1    left join large2 l2  on  l1.some_key = l2.some_key  AND 1=2 # impossible condition in on

the time used to execute the query is the same.

if you run or explain
select count(*)   from large1 l1    left join large2 l2  on  l1.some_key = l2.some_key  AND 1=2 # impossible condition in on

it is obvious, that mysql evalutes the condition for each possible row, and most likely, AFTER fetching the row.

mysql could optimize those parts of the oncondition, that do not refer to the "to be joined in" table, and check them, before reading the rows. If this part of the condition fails, then no row will match, the 2nd table does not need to be read, and a set of null values can be created.

A mor practical example of the above

select  u.name, e.address
from user u  left join email e on e.user_id=u.id  AND u.mailfalg=1;

should return ALL users (therefore the mailflag condition can not go into the where clause, but the email is only needed, if the flag is on.
The fewer user have the falg, the more impact would the above optimization have

How to repeat:
see description above

Suggested fix:
check for conditions (in an "and" list of conditions) in the on clause, that could be evaluated before the rows of the 2nd table a read. only read the 2nd table, if thaty part of the condition is true.
[30 Jan 2005 23:19] Martin Friebe
just seen my third example is mystyped

select count(*)   from large1 l1    left join large2 l2  on   1=2 # impossible condition in on 

it takes forever, it checks every row against every in the other table
[27 May 2006 11:59] Valeriy Kravchuk
Thank you for a resonable feature request. This kind of optimization is not performed even in latest 5.0.23-BK:

mysql> explain select count(*) from large1 left join large2 on 1=2;
+----+-------------+--------+-------+---------------+---------+---------+------+
------+-------------+
| id | select_type | table  | type  | possible_keys | key     | key_len | ref  |
 rows | Extra       |
+----+-------------+--------+-------+---------------+---------+---------+------+
------+-------------+
|  1 | SIMPLE      | large1 | index | NULL          | PRIMARY | 4       | NULL |
  384 | Using index |
|  1 | SIMPLE      | large2 | ALL   | NULL          | NULL    | NULL    | NULL |
  384 |             |
+----+-------------+--------+-------+---------------+---------+---------+------+
------+-------------+
2 rows in set (0.00 sec)
[18 Aug 2007 2:00] Igor Babaev
- This is a request for an optimization that can be applied in rare cases.
- An implementation of this optimization will require ~ 2-3 man-weeks.  

By the above reasons I move the bug to 'To be fixed later'.
Product management will decide in what version a fix for this problem appears.
[14 Jan 2020 21:50] Jon Stephens
Fixed in MySQL 8.0.20 together with BUG#97552, see same for docs info.

Closed.