| Bug #22765 | subselect uses all partitions instead of the actual needed ones. | ||
|---|---|---|---|
| Submitted: | 27 Sep 2006 23:01 | Modified: | 22 Jan 2008 19:23 |
| Reporter: | Ted van Diepen | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: Partitions | Severity: | S5 (Performance) |
| Version: | 5.1.11-beta-log | OS: | Linux (Linux, Fedora core 5) |
| Assigned to: | Sergey Petrunya | CPU Architecture: | Any |
| Tags: | partitioning, subselect | ||
[28 Sep 2006 11:22]
Valeriy Kravchuk
Thank you for a problem report. The real, well known reason for this problem is not related to partitioning. It is: DEPENDENT SUBQUERY in EXPLAIN results. MySQL still incorrectly treats most subqueries as correlated (dependent), and re-executres them for each row. I hope, this will be fixed eventually.
[22 Jan 2008 19:23]
Sergey Petrunya
The situation is improved by new semi-join subquery optimizations (see http://forge.mysql.com/worklog/task.php?id=2980 and its sub-entries) in MySQL 6.0. The provided example query will be converted into join, which will allow to do constant table detection and then use that for partition pruning: mysql> EXPLAIN PARTITIONS SELECT * FROM `OrderLine` WHERE OrderID IN (SELECT ID FROM `Order` WHERE ID = 3); *************************** 1. row *************************** id: 1 select_type: PRIMARY table: Order partitions: NULL type: const possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: const rows: 1 Extra: Using index *************************** 2. row *************************** id: 1 select_type: PRIMARY table: OrderLine partitions: p3 type: ref possible_keys: OrderID key: OrderID key_len: 5 ref: const rows: 5 Extra: 2 rows in set (0.00 sec) If the subquery doesn't select primary key, it will be converted into a semi-join. In that case, static partition pruning will not be possible: mysql> EXPLAIN PARTITIONS SELECT * FROM `Order` WHERE ID IN (SELECT OrderID FROM `OrderLine` WHERE ID = 3)\G *************************** 1. row *************************** id: 1 select_type: PRIMARY table: Order partitions: NULL type: const possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: const rows: 1 Extra: *************************** 2. row *************************** id: 1 select_type: PRIMARY table: OrderLine partitions: p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11,p12,p13,p14,p15,p16,p17,p18,p19,p20,p21,p22,p23,p24,p25,p26,p27,p28,p29,p30,p31,p32,p33,p34,p35,p36,p37,p38,p39 type: ref possible_keys: OrderID key: OrderID key_len: 5 ref: const rows: 2 Extra: Using index; FirstMatch(Order) 2 rows in set (0.01 sec) but partition selection (see http://forge.mysql.com/wiki/MySQL_Internals_Optimizer#Partition_selection) will cause only one partition to be accessed. The EXPLAIN support for this is underway, see http://forge.mysql.com/worklog/task.php?id=4128. Changing status to Closed. Ted, feel free to re-open if you think the above is not a complete resolution. And thanks for taking time to report this.
[27 Aug 2014 13:36]
Marc Vettorel
Okay, where is Version 6.0 now?
[27 Aug 2014 15:05]
Sergey Petrunya
MySQL 6.0 was never released as stable, but semi-join optimizations were ported into MySQL 5.6. MySQL 5.6 should execute the query from this bug report efficiently.

Description: It seems that the optimizer isn't smart enough to discover which partitions are needed when using a subselect. How to repeat: 1. Create 2 tables: ******************************************************* CREATE TABLE `Order` ( `ID` INT, `OrderDate` TIMESTAMP, PRIMARY KEY ( `ID` ) ); CREATE TABLE `OrderLine` ( `OrderID` INT, `ProductID` INT, `Quantity` INT, INDEX ( `OrderID` ) ) PARTITION BY HASH (`OrderID`) PARTITIONS 40; 2. Insert some records in both tables ******************************************************* INSERT INTO `Order` (ID) VALUES (1), (2), (3), (4); INSERT INTO `OrderLine` (OrderID, ProductID, Quantity) VALUES (1, 1, 100), (1, 2, 10), (1, 3, 50); INSERT INTO `OrderLine` (OrderID, ProductID, Quantity) VALUES (2, 1, 100), (2, 2, 10), (2, 3, 50); INSERT INTO `OrderLine` (OrderID, ProductID, Quantity) VALUES (3, 1, 100), (3, 2, 10), (3, 3, 50); INSERT INTO `OrderLine` (OrderID, ProductID, Quantity) VALUES (4, 1, 100), (4, 2, 10), (4, 3, 50); 3. EXPLAIN PARTITIONS SELECT ******************************************************* The next explains shows that the optimizer isn't working right when using a subselect query. Good query: EXPLAIN PARTITIONS SELECT * FROM `OrderLine` WHERE OrderID IN (3); output: +----+-------------+-----------+------------+------+---------------+---------+---------+-------+------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-----------+------------+------+---------------+---------+---------+-------+------+-------------+ | 1 | SIMPLE | OrderLine | p3 | ref | OrderID | OrderID | 5 | const | 2 | Using where | +----+-------------+-----------+------------+------+---------------+---------+---------+-------+------+-------------+ 1 row in set (0.00 sec) Bad query: EXPLAIN PARTITIONS SELECT * FROM `OrderLine` WHERE OrderID IN (SELECT ID FROM `Order` WHERE ID = 3); output: +----+--------------------+-----------+-------------------------------------------------------------------------------------------------------------------------------------------------------+-------+---------------+---------+---------+-------+------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra | +----+--------------------+-----------+-------------------------------------------------------------------------------------------------------------------------------------------------------+-------+---------------+---------+---------+-------+------+-------------+ | 1 | PRIMARY | OrderLine | 9$h7 ?9$h7 5,p6,p7,p8,p9,p10,p11,p12,p13,p14,p15,p16,p17,p18,p19,p20,p21,p22,p23,p24,p25,p26,p27,p28,p29,p30,p31,p32,p33,p34,p35,p36,p37,p3? | ALL | NULL | NULL | NULL | NULL | 12 | Using where | | 2 | DEPENDENT SUBQUERY | Order | NULL | const | PRIMARY | PRIMARY | 4 | const | 1 | Using index | +----+--------------------+-----------+-------------------------------------------------------------------------------------------------------------------------------------------------------+-------+---------------+---------+---------+-------+------+-------------+ 2 rows in set (0.00 sec) It seems that the optimizer doesn't notice the first select as a constant value (or multipe values) for the second select. Suggested fix: optimizer fix.