Description:
Hello,
For certain queries using DERIVED tables, the execution is very inefficient, clearly lacking the opportunity for plan optimization.
For example:
mysql > EXPLAIN SELECT * FROM customers c JOIN ( SELECT customer_id, SUM(amount) AS total_amt FROM orders WHERE order_date BETWEEN '2017-10-01' AND '2017-10-31' GROUP BY customer_id ) totals ON c.customer_id = totals.customer_id WHERE c.customer_name IN ('Customer#1', 'Customer#2');
+----+-------------+------------+------------+-------+----------------------+-------------+---------+--------------------+--------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+-------+----------------------+-------------+---------+--------------------+--------+----------+--------------------------+
| 1 | PRIMARY | c | NULL | range | PRIMARY,idx_name | idx_name | 103 | NULL | 2 | 100.00 | Using where; Using index |
| 1 | PRIMARY | <derived2> | NULL | ref | <auto_key0> | <auto_key0> | 5 | db10.c.customer_id | 50 | 100.00 | NULL |
| 2 | DERIVED | orders | NULL | index | idx_cust_id,idx_date | idx_cust_id | 5 | NULL | 100345 | 50.00 | Using where |
+----+-------------+------------+------------+-------+----------------------+-------------+---------+--------------------+--------+----------+--------------------------+
3 rows in set, 1 warning (0.00 sec)
mysql > show status like 'ha%';
+----------------------------+--------+
| Variable_name | Value |
+----------------------------+--------+
| Handler_commit | 1 |
| Handler_delete | 0 |
| Handler_discover | 0 |
| Handler_external_lock | 4 |
| Handler_mrr_init | 0 |
| Handler_prepare | 0 |
| Handler_read_first | 1 |
| Handler_read_key | 5 |
| Handler_read_last | 0 |
| Handler_read_next | 100006 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 0 |
| Handler_read_rnd_next | 0 |
| Handler_rollback | 0 |
| Handler_savepoint | 0 |
| Handler_savepoint_rollback | 0 |
| Handler_update | 0 |
| Handler_write | 1000 |
+----------------------------+--------+
18 rows in set (0.05 sec)
With Lateral Derived Optimization, the alternative database variant shows a much more efficient plan (same data and tables):
+------+-----------------+------------+-------+----------------------+-------------+---------+-------------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-----------------+------------+-------+----------------------+-------------+---------+-------------------+------+--------------------------+
| 1 | PRIMARY | c | range | PRIMARY,idx_name | idx_name | 103 | NULL | 2 | Using where; Using index |
| 1 | PRIMARY | <derived2> | ref | key0 | key0 | 5 | db2.c.customer_id | 2 | |
| 2 | LATERAL DERIVED | orders | ref | idx_cust_id,idx_date | idx_cust_id | 5 | db2.c.customer_id | 1 | Using where |
+------+-----------------+------------+-------+----------------------+-------------+---------+-------------------+------+--------------------------+
mysql > show status like 'ha%';
+----------------------------+-------+
| Variable_name | Value |
+----------------------------+-------+
| Handler_commit | 1 |
| Handler_delete | 0 |
| Handler_discover | 0 |
| Handler_external_lock | 0 |
| Handler_icp_attempts | 0 |
| Handler_icp_match | 0 |
| Handler_mrr_init | 0 |
| Handler_mrr_key_refills | 0 |
| Handler_mrr_rowid_refills | 0 |
| Handler_prepare | 0 |
| Handler_read_first | 0 |
| Handler_read_key | 6 |
| Handler_read_last | 0 |
| Handler_read_next | 216 |
| Handler_read_prev | 0 |
| Handler_read_retry | 0 |
| Handler_read_rnd | 0 |
| Handler_read_rnd_deleted | 0 |
| Handler_read_rnd_next | 0 |
| Handler_rollback | 0 |
| Handler_savepoint | 0 |
| Handler_savepoint_rollback | 0 |
| Handler_tmp_delete | 0 |
| Handler_tmp_update | 0 |
| Handler_tmp_write | 2 |
| Handler_update | 0 |
| Handler_write | 0 |
+----------------------------+-------+
How to repeat:
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
customer_name VARCHAR(100),
INDEX idx_name (customer_name)
);
-- Insert 1000 customers
CREATE TABLE orders (
order_id INT PRIMARY KEY AUTO_INCREMENT,
customer_id INT,
order_date DATE,
amount DECIMAL(10,2),
INDEX idx_cust_id (customer_id),
INDEX idx_date (order_date)
);
-- Insert 100,000 orders for October 2017, distributed across customers
EXPLAIN SELECT *
FROM customers c
JOIN (
SELECT customer_id, SUM(amount) AS total_amt
FROM orders
WHERE order_date BETWEEN '2017-10-01' AND '2017-10-31'
GROUP BY customer_id
) totals
ON c.customer_id = totals.customer_id
WHERE c.customer_name IN ('Customer#1', 'Customer#2');
I will upload the example data dump.
Suggested fix:
Can we have a similar lateral optimization implemented in MySQL Community?