Description:
If query has:
- table in OUTER JOIN
- there is no filter condition for that table
Then this table must be last in join order: otherwise it is possible that some rows retrieved from this table will be discarded in further filtering, i.e. we fetched more rows than was needed.
At the same time retrieving unnecessary rows doesn't give any advantage.
How to repeat:
Problem observed in both 5.0 and 5.1 but in 5.0 performance difference not so big. This is simplified example while in real life difference may be 4min vs 2sec (120x times).
1. Load sakila database
http://downloads.mysql.com/docs/sakila-db.zip
2. See explain plan for query with LEFT JOIN and no condition on tables f, fc, c:
explain SELECT SQL_NO_CACHE -- STRAIGHT_JOIN
c.name AS category
, SUM(p.amount) AS total_sales
FROM rental AS r
JOIN payment AS p ON p.rental_id = r.rental_id
JOIN inventory AS i ON r.inventory_id = i.inventory_id
LEFT JOIN film AS f ON i.film_id = f.film_id
LEFT JOIN film_category AS fc ON f.film_id = fc.film_id
LEFT JOIN category AS c ON fc.category_id = c.category_id
WHERE p.amount > 10
and r.rental_date < "2005-06-22"
GROUP BY c.name;
See that some of tables f, fc, c are joined before p and r
3. Execute queries:
flush status;
SELECT SQL_NO_CACHE -- STRAIGHT_JOIN
c.name AS category
, SUM(p.amount) AS total_sales
FROM rental AS r
JOIN payment AS p ON p.rental_id = r.rental_id
JOIN inventory AS i ON r.inventory_id = i.inventory_id
LEFT JOIN film AS f ON i.film_id = f.film_id
LEFT JOIN film_category AS fc ON f.film_id = fc.film_id
LEFT JOIN category AS c ON fc.category_id = c.category_id
WHERE p.amount > 10
and r.rental_date < "2005-06-22"
GROUP BY c.name;
show status like "ha%";
flush status;
SELECT SQL_NO_CACHE STRAIGHT_JOIN
c.name AS category
, SUM(p.amount) AS total_sales
FROM rental AS r
JOIN payment AS p ON p.rental_id = r.rental_id
JOIN inventory AS i ON r.inventory_id = i.inventory_id
LEFT JOIN film AS f ON i.film_id = f.film_id
LEFT JOIN film_category AS fc ON f.film_id = fc.film_id
LEFT JOIN category AS c ON fc.category_id = c.category_id
WHERE p.amount > 10
and r.rental_date < "2005-06-22"
GROUP BY c.name;
show status like "ha%";
4. See that query with STRAIGHT_JOIN executes in 0.04 sec while optimizer chooses plan which executed in 0.16 sec (4x times difference).
ha% counters:
Handler_read_key 29042
Handler_read_next 57346
vs:
Handler_read_key 7146
Handler_read_next 13908
Suggested fix:
Check if OUTER joined table has no condition in WHERE and place it to the end of JOIN order.