Bug #45880 OUTER JOINed tables with no condition in WHERE must be at the end of explain
Submitted: 1 Jul 2009 14:21
Reporter: Andrii Nikitin Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.0, 5.1 OS:Any
Assigned to: Sergei Glukhov CPU Architecture:Any
Triage: Triaged: D3 (Medium) / R3 (Medium) / E2 (Low)

[1 Jul 2009 14:21] Andrii Nikitin
Description:
If query has:
- table in OUTER JOIN 
- there is no filter condition for that table

Then this table must be last in join order: otherwise it is possible that some rows retrieved from this table will be discarded in further filtering, i.e. we fetched more rows than was needed. 
At the same time retrieving unnecessary rows doesn't give any advantage.

How to repeat:
Problem observed in both 5.0 and 5.1 but in 5.0 performance difference not so big. This is simplified example while in real life difference may be 4min vs 2sec (120x times).

1. Load sakila database 
http://downloads.mysql.com/docs/sakila-db.zip

2. See explain plan for query with LEFT JOIN and no condition on tables f, fc, c:
explain SELECT SQL_NO_CACHE -- STRAIGHT_JOIN
c.name AS category
, SUM(p.amount) AS total_sales
FROM rental AS r
JOIN payment AS p ON p.rental_id = r.rental_id
JOIN inventory AS i ON r.inventory_id = i.inventory_id
LEFT JOIN film AS f ON i.film_id = f.film_id
LEFT JOIN film_category AS fc ON f.film_id = fc.film_id
LEFT JOIN category AS c ON fc.category_id = c.category_id
WHERE p.amount > 10
and r.rental_date < "2005-06-22"
GROUP BY c.name;

See that some of tables f, fc, c are joined before p and r

3. Execute queries:
flush status;

SELECT SQL_NO_CACHE -- STRAIGHT_JOIN
c.name AS category
, SUM(p.amount) AS total_sales
FROM rental AS r
JOIN payment AS p ON p.rental_id = r.rental_id
JOIN inventory AS i ON r.inventory_id = i.inventory_id
LEFT JOIN film AS f ON i.film_id = f.film_id
LEFT JOIN film_category AS fc ON f.film_id = fc.film_id
LEFT JOIN category AS c ON fc.category_id = c.category_id
WHERE p.amount > 10
and r.rental_date < "2005-06-22"
GROUP BY c.name;

show status like "ha%";

flush status;

SELECT SQL_NO_CACHE STRAIGHT_JOIN
c.name AS category
, SUM(p.amount) AS total_sales
FROM rental AS r
JOIN payment AS p ON p.rental_id = r.rental_id
JOIN inventory AS i ON r.inventory_id = i.inventory_id
LEFT JOIN film AS f ON i.film_id = f.film_id
LEFT JOIN film_category AS fc ON f.film_id = fc.film_id
LEFT JOIN category AS c ON fc.category_id = c.category_id
WHERE p.amount > 10
and r.rental_date < "2005-06-22"
GROUP BY c.name;

show status like "ha%";

4. See that query with STRAIGHT_JOIN executes in 0.04 sec while optimizer chooses plan which executed in 0.16 sec (4x times difference).

ha% counters:
Handler_read_key	29042
Handler_read_next	57346
vs:
Handler_read_key	7146
Handler_read_next	13908

Suggested fix:
Check if OUTER joined table has no condition in WHERE and place it to the end of JOIN order.