Bug #22765 subselect uses all partitions instead of the actual needed ones.
Submitted: 27 Sep 2006 23:01 Modified: 22 Jan 2008 19:23
Reporter: Ted van Diepen Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Partitions Severity:S5 (Performance)
Version:5.1.11-beta-log OS:Linux (Linux, Fedora core 5)
Assigned to: Sergey Petrunya CPU Architecture:Any
Tags: partitioning, subselect

[27 Sep 2006 23:01] Ted van Diepen
Description:
It seems that the optimizer isn't smart enough to discover which partitions are needed when using a subselect. 

How to repeat:
1. Create 2 tables: 
*******************************************************

CREATE TABLE `Order` (
      `ID`              INT,
      `OrderDate`      TIMESTAMP,
      PRIMARY KEY ( `ID` )
);

CREATE TABLE `OrderLine` (
      `OrderID`         INT,
      `ProductID`       INT,
      `Quantity`        INT,
      INDEX ( `OrderID` )
)
PARTITION BY HASH (`OrderID`)
PARTITIONS 40;

2. Insert some records in both tables
*******************************************************

INSERT INTO `Order` (ID) VALUES (1), (2), (3), (4);

INSERT INTO `OrderLine` (OrderID, ProductID, Quantity) VALUES (1, 1, 100), (1, 2, 10), (1, 3, 50);
INSERT INTO `OrderLine` (OrderID, ProductID, Quantity) VALUES (2, 1, 100), (2, 2, 10), (2, 3, 50);
INSERT INTO `OrderLine` (OrderID, ProductID, Quantity) VALUES (3, 1, 100), (3, 2, 10), (3, 3, 50);
INSERT INTO `OrderLine` (OrderID, ProductID, Quantity) VALUES (4, 1, 100), (4, 2, 10), (4, 3, 50);

3. EXPLAIN PARTITIONS SELECT
*******************************************************
The next explains shows that the optimizer isn't working right when using a subselect query. 

Good query: 
EXPLAIN PARTITIONS SELECT * FROM `OrderLine` WHERE OrderID IN (3); 

output: 
+----+-------------+-----------+------------+------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table     | partitions | type | possible_keys | key     | key_len | ref   | rows | Extra       |
+----+-------------+-----------+------------+------+---------------+---------+---------+-------+------+-------------+
|  1 | SIMPLE      | OrderLine | p3         | ref  | OrderID       | OrderID | 5       | const |    2 | Using where |
+----+-------------+-----------+------------+------+---------------+---------+---------+-------+------+-------------+
1 row in set (0.00 sec)

Bad query:
EXPLAIN PARTITIONS SELECT * FROM `OrderLine` WHERE OrderID IN (SELECT ID FROM `Order` WHERE ID = 3);

output:
+----+--------------------+-----------+-------------------------------------------------------------------------------------------------------------------------------------------------------+-------+---------------+---------+---------+-------+------+-------------+
| id | select_type        | table     | partitions                                                                                                                                            | type  | possible_keys | key     | key_len | ref   | rows | Extra       |
+----+--------------------+-----------+-------------------------------------------------------------------------------------------------------------------------------------------------------+-------+---------------+---------+---------+-------+------+-------------+
|  1 | PRIMARY            | OrderLine |  9$h7   ?9$h7   5,p6,p7,p8,p9,p10,p11,p12,p13,p14,p15,p16,p17,p18,p19,p20,p21,p22,p23,p24,p25,p26,p27,p28,p29,p30,p31,p32,p33,p34,p35,p36,p37,p3?     | ALL   | NULL          | NULL    | NULL    | NULL  |   12 | Using where |
|  2 | DEPENDENT SUBQUERY | Order     | NULL                                                                                                                                                  | const | PRIMARY       | PRIMARY | 4       | const |    1 | Using index |
+----+--------------------+-----------+-------------------------------------------------------------------------------------------------------------------------------------------------------+-------+---------------+---------+---------+-------+------+-------------+
2 rows in set (0.00 sec)

It seems that the optimizer doesn't notice the first select as a constant value (or multipe values) for the second select.

Suggested fix:
optimizer fix.
[28 Sep 2006 11:22] Valeriy Kravchuk
Thank you for a problem report. The real, well known reason for this problem is not related to partitioning. It is:

DEPENDENT SUBQUERY

in EXPLAIN results. MySQL still incorrectly treats most subqueries as correlated (dependent), and re-executres them for each row. I hope, this will be fixed eventually.
[22 Jan 2008 19:23] Sergey Petrunya
The situation is improved by new semi-join subquery optimizations (see
http://forge.mysql.com/worklog/task.php?id=2980 and its sub-entries) in MySQL 6.0.

The provided example query will be converted into join, which will allow to do constant table detection and then use that for partition pruning:

mysql> EXPLAIN PARTITIONS SELECT * FROM `OrderLine` WHERE OrderID IN (SELECT ID FROM `Order` WHERE ID = 3);
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: Order
   partitions: NULL
         type: const
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: const
         rows: 1
        Extra: Using index
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: OrderLine
   partitions: p3
         type: ref
possible_keys: OrderID
          key: OrderID
      key_len: 5
          ref: const
         rows: 5
        Extra: 
2 rows in set (0.00 sec)

If the subquery doesn't select primary key, it will be converted into a semi-join. In that case, static partition pruning will not be possible:

mysql> EXPLAIN PARTITIONS SELECT * FROM `Order` WHERE ID IN (SELECT OrderID FROM `OrderLine` WHERE ID = 3)\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: Order
   partitions: NULL
         type: const
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: const
         rows: 1
        Extra: 
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: OrderLine
   partitions: p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11,p12,p13,p14,p15,p16,p17,p18,p19,p20,p21,p22,p23,p24,p25,p26,p27,p28,p29,p30,p31,p32,p33,p34,p35,p36,p37,p38,p39
         type: ref
possible_keys: OrderID
          key: OrderID
      key_len: 5
          ref: const
         rows: 2
        Extra: Using index; FirstMatch(Order)
2 rows in set (0.01 sec)

but partition selection (see http://forge.mysql.com/wiki/MySQL_Internals_Optimizer#Partition_selection) will cause only one partition to be accessed.  The EXPLAIN support for this is underway, see http://forge.mysql.com/worklog/task.php?id=4128.

Changing status to Closed. Ted, feel free to re-open if you think the above is not a complete resolution. And thanks for taking time to report this.
[27 Aug 2014 13:36] Marc Vettorel
Okay, where is Version 6.0 now?
[27 Aug 2014 15:05] Sergey Petrunya
MySQL 6.0 was never released as stable, but semi-join optimizations were ported into MySQL 5.6. MySQL 5.6 should execute the query from this bug report efficiently.