Bug #118778 Implement Lateral Derived Optimization
Submitted: 4 Aug 14:58 Modified: 5 Aug 5:17
Reporter: Przemyslaw Malkowski Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S4 (Feature request)
Version:8.0, 8.4 OS:Any
Assigned to: CPU Architecture:Any
Tags: derived, LATERAL, Optimizer bug

[4 Aug 14:58] Przemyslaw Malkowski
Description:
Hello,

For certain queries using DERIVED tables, the execution is very inefficient, clearly lacking the opportunity for plan optimization. 
For example:

mysql > EXPLAIN SELECT * FROM customers c JOIN ( SELECT customer_id, SUM(amount) AS total_amt FROM orders   WHERE order_date BETWEEN '2017-10-01' AND '2017-10-31' GROUP BY customer_id ) totals ON c.customer_id = totals.customer_id WHERE c.customer_name IN ('Customer#1', 'Customer#2');
+----+-------------+------------+------------+-------+----------------------+-------------+---------+--------------------+--------+----------+--------------------------+
| id | select_type | table      | partitions | type  | possible_keys        | key         | key_len | ref                | rows   | filtered | Extra                    |
+----+-------------+------------+------------+-------+----------------------+-------------+---------+--------------------+--------+----------+--------------------------+
|  1 | PRIMARY     | c          | NULL       | range | PRIMARY,idx_name     | idx_name    | 103     | NULL               |      2 |   100.00 | Using where; Using index |
|  1 | PRIMARY     | <derived2> | NULL       | ref   | <auto_key0>          | <auto_key0> | 5       | db10.c.customer_id |     50 |   100.00 | NULL                     |
|  2 | DERIVED     | orders     | NULL       | index | idx_cust_id,idx_date | idx_cust_id | 5       | NULL               | 100345 |    50.00 | Using where              |
+----+-------------+------------+------------+-------+----------------------+-------------+---------+--------------------+--------+----------+--------------------------+
3 rows in set, 1 warning (0.00 sec)

mysql > show status like 'ha%';
+----------------------------+--------+
| Variable_name              | Value  |
+----------------------------+--------+
| Handler_commit             | 1      |
| Handler_delete             | 0      |
| Handler_discover           | 0      |
| Handler_external_lock      | 4      |
| Handler_mrr_init           | 0      |
| Handler_prepare            | 0      |
| Handler_read_first         | 1      |
| Handler_read_key           | 5      |
| Handler_read_last          | 0      |
| Handler_read_next          | 100006 |
| Handler_read_prev          | 0      |
| Handler_read_rnd           | 0      |
| Handler_read_rnd_next      | 0      |
| Handler_rollback           | 0      |
| Handler_savepoint          | 0      |
| Handler_savepoint_rollback | 0      |
| Handler_update             | 0      |
| Handler_write              | 1000   |
+----------------------------+--------+
18 rows in set (0.05 sec)

With Lateral Derived Optimization, the alternative database variant shows a much more efficient plan (same data and tables):
+------+-----------------+------------+-------+----------------------+-------------+---------+-------------------+------+--------------------------+
| id   | select_type     | table      | type  | possible_keys        | key         | key_len | ref               | rows | Extra                    |
+------+-----------------+------------+-------+----------------------+-------------+---------+-------------------+------+--------------------------+
|    1 | PRIMARY         | c          | range | PRIMARY,idx_name     | idx_name    | 103     | NULL              | 2    | Using where; Using index |
|    1 | PRIMARY         | <derived2> | ref   | key0                 | key0        | 5       | db2.c.customer_id | 2    |                          |
|    2 | LATERAL DERIVED | orders     | ref   | idx_cust_id,idx_date | idx_cust_id | 5       | db2.c.customer_id | 1    | Using where              |
+------+-----------------+------------+-------+----------------------+-------------+---------+-------------------+------+--------------------------+

mysql > show status like 'ha%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 0     |
| Handler_icp_attempts       | 0     |
| Handler_icp_match          | 0     |
| Handler_mrr_init           | 0     |
| Handler_mrr_key_refills    | 0     |
| Handler_mrr_rowid_refills  | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 6     |
| Handler_read_last          | 0     |
| Handler_read_next          | 216   |
| Handler_read_prev          | 0     |
| Handler_read_retry         | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_deleted   | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_tmp_delete         | 0     |
| Handler_tmp_update         | 0     |
| Handler_tmp_write          | 2     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+

How to repeat:
CREATE TABLE customers (
  customer_id INT PRIMARY KEY,
  customer_name VARCHAR(100),
  INDEX idx_name (customer_name)
);
-- Insert 1000 customers

CREATE TABLE orders (
  order_id INT PRIMARY KEY AUTO_INCREMENT,
  customer_id INT,
  order_date DATE,
  amount DECIMAL(10,2),
  INDEX idx_cust_id (customer_id),
  INDEX idx_date (order_date)
);
-- Insert 100,000 orders for October 2017, distributed across customers

EXPLAIN SELECT *
FROM customers c
JOIN (
  SELECT customer_id, SUM(amount) AS total_amt
  FROM orders
  WHERE order_date BETWEEN '2017-10-01' AND '2017-10-31'
  GROUP BY customer_id
) totals
ON c.customer_id = totals.customer_id
WHERE c.customer_name IN ('Customer#1', 'Customer#2');

I will upload the example data dump.

Suggested fix:
Can we have a similar lateral optimization implemented in MySQL Community?
[4 Aug 15:02] Przemyslaw Malkowski
Example random data

Attachment: dump_orders_customers_FR.sql.gz (application/gzip, text), 912.70 KiB.

[5 Aug 5:17] MySQL Verification Team
Hello Przemyslaw,

Thank you for the reasonable feature request!

Thanks,
Umesh