Bug #26630 Partition pruning not working against joins
Submitted: 26 Feb 2007 11:21 Modified: 16 Oct 2008 17:59
Reporter: Lim tienaik Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S4 (Feature request)
Version:5.1.18-BK, 5.1.14-beta OS:Linux (linux)
Assigned to: CPU Architecture:Any
Tags: Partition pruning
Triage: Needs Triage: D5 (Feature request)

[26 Feb 2007 11:21] Lim tienaik
Description:
All partitions are use although its not required, the optimizer should choose the partition that contains the data instead of everything.

How to repeat:
create table tbl_test
(
num int,
val int
)
partition by range(num)
(
partition p0 values less than (5),
partition p1 values less than (10),
partition p2 values less than (15),
partition p3 values less than (20)
);
create table tbl_co
(
num int,
reg int
);
insert into tbl_test values
(1,12),
(2,11),
(3,123),
(4,342),
(5,23412),
(6,343),
(7,65),
(8,67),
(9,87),
(10,132),
(11,1234),
(12,1245),
(13,12468),
(14,12342),
(15,15),
(16,1),
(17,1223);
insert into tbl_co values
(1,2),
(2,3),
(3,5),
(4,2),
(5,6),
(6,9),
(7,6),
(8,4),
(9,7),
(10,7),
(11,10),
(12,2),
(13,1),
(14,1),
(15,8),
(16,8),
(17,1);
explain partitions
select * from tbl_test  t
inner join tbl_co c on t.num = c.num
where reg = 8;

Suggested fix:
add check for partition pruning on joins
[7 Mar 2007 18:26] Valeriy Kravchuk
Thank you for a problem report. I was able to repeat the behaviour described with latest 5.1.17-BK on Linux:

openxs@suse:~/dbs/5.1> bin/mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.1.17-beta Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> create table tbl_test
    -> (
    -> num int,
    -> val int
    -> )
    -> partition by range(num)
    -> (
    -> partition p0 values less than (5),
    -> partition p1 values less than (10),
    -> partition p2 values less than (15),
    -> partition p3 values less than (20)
    -> );
Query OK, 0 rows affected (0.02 sec)

mysql> create table tbl_co
    -> (
    -> num int,
    -> reg int
    -> );
Query OK, 0 rows affected (0.03 sec)

mysql> insert into tbl_test values
    -> (1,12),
    -> (2,11),
    -> (3,123),
    -> (4,342),
    -> (5,23412),
    -> (6,343),
    -> (7,65),
    -> (8,67),
    -> (9,87),
    -> (10,132),
    -> (11,1234),
    -> (12,1245),
    -> (13,12468),
    -> (14,12342),
    -> (15,15),
    -> (16,1),
    -> (17,1223);
Query OK, 17 rows affected (0.01 sec)
Records: 17  Duplicates: 0  Warnings: 0

mysql> insert into tbl_co values
    -> (1,2),
    -> (2,3),
    -> (3,5),
    -> (4,2),
    -> (5,6),
    -> (6,9),
    -> (7,6),
    -> (8,4),
    -> (9,7),
    -> (10,7),
    -> (11,10),
    -> (12,2),
    -> (13,1),
    -> (14,1),
    -> (15,8),
    -> (16,8),
    -> (17,1);
Query OK, 17 rows affected (0.01 sec)
Records: 17  Duplicates: 0  Warnings: 0

mysql> explain partitions
    -> select * from tbl_test  t
    -> inner join tbl_co c on t.num = c.num
    -> where reg = 8;
+----+-------------+-------+-------------+------+---------------+------+--------
-+------+------+-------------+
| id | select_type | table | partitions  | type | possible_keys | key  | key_len
 | ref  | rows | Extra       |
+----+-------------+-------+-------------+------+---------------+------+--------
-+------+------+-------------+
|  1 | SIMPLE      | t     | p0,p1,p2,p3 | ALL  | NULL          | NULL | NULL
 | NULL |   17 |             |
|  1 | SIMPLE      | c     | NULL        | ALL  | NULL          | NULL | NULL
 | NULL |   17 | Using where |
+----+-------------+-------+-------------+------+---------------+------+--------
-+------+------+-------------+
2 rows in set (0.00 sec)

As tbl_test is selected as outer (and why not, with 17 rows in both tables?), partitions prunning is not needed: we just select all rows from partitioned table, then select corresponding one from tbl_co and test condition on it.

Theoreticall prunning is possible if we select rows with reg=8 from tbl_co, and then, for each row we might be able to access one and only one partition. Let's try:

mysql> explain partitions select * from tbl_co c straight_join tbl_test t on t.
num = c.num where reg = 8;
+----+-------------+-------+-------------+------+---------------+------+--------
-+------+------+-------------+
| id | select_type | table | partitions  | type | possible_keys | key  | key_len
 | ref  | rows | Extra       |
+----+-------------+-------+-------------+------+---------------+------+--------
-+------+------+-------------+
|  1 | SIMPLE      | c     | NULL        | ALL  | NULL          | NULL | NULL
 | NULL |   17 | Using where |
|  1 | SIMPLE      | t     | p0,p1,p2,p3 | ALL  | NULL          | NULL | NULL
 | NULL |   17 | Using where |
+----+-------------+-------+-------------+------+---------------+------+--------
-+------+------+-------------+
2 rows in set (0.00 sec)

I suspect pruning was used, but how one can show that in EXPLAIN? For each row from tbl_co you will need some partition, and you do not know what exact one!

If we do know, then we can see the pruning:

mysql> explain partitions select * from tbl_co c straight_join tbl_test t on t.
num = c.num where reg = 8 and t.num < 5;
+----+-------------+-------+------------+------+---------------+------+---------
+------+------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len
| ref  | rows | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------
+------+------+-------------+
|  1 | SIMPLE      | c     | NULL       | ALL  | NULL          | NULL | NULL
| NULL |   17 | Using where |
|  1 | SIMPLE      | t     | p0         | ALL  | NULL          | NULL | NULL
| NULL |    4 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------
+------+------+-------------+
2 rows in set (0.00 sec)

So, I think, it is not a bug (I mean, you can not tell that this is a bug based on EXPLAIN PARTITIONS results). If you can prove that no pruning is used to find individual rows from tbl_test in my example with straight_join then I agree, it is something to consider.

Your comments are welcomed.
[7 Apr 2007 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[13 Apr 2007 13:49] Valeriy Kravchuk
I tried to check again, with latest 5.1.18-BK, and based on the following results (continue my previous tests):

mysql> flush status;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from tbl_co c straight_join tbl_test t on t.num = c.num where r
eg = 8;
+------+------+------+------+
| num  | reg  | num  | val  |
+------+------+------+------+
|   15 |    8 |   15 |   15 |
|   16 |    8 |   16 |    1 |
+------+------+------+------+
2 rows in set (0.01 sec)

mysql> show status like 'Handler%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 0     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 0     |
| Handler_read_next          | 0     |
| Handler_read_prev          | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_next      | 39    |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+
15 rows in set (0.00 sec)

mysql> flush status;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from tbl_co c straight_join tbl_test t on t.num = c.num where r
eg = 8 and t.num < 5;
Empty set (0.00 sec)

mysql> show status like 'Handler%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 0     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 0     |
| Handler_read_next          | 0     |
| Handler_read_prev          | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_next      | 18    |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+
15 rows in set (0.00 sec)

I can see a clear difference in Handler_read_rnd_next value (18 when partition prunning was surely used vs. 39 when it was NOT likely used in joining individual rows from outer table with partitioned one). So, I think, there is a problem to consider.
[2 May 2007 6:25] Igor Babaev
This is a serious problem, not easy to resolve.
Marked as to be fixed later (after 5.2)
[16 Dec 2008 19:46] Omer Barnir
triage: forgot to set e/r values with last comment - updating now
[3 Jan 2009 19:59] Sergey Petrunya
First note: If you re-run the EXPLAINs from the example on a recent 5.1 you'll see that "Using join buffer" is used for the second table. It was there before, too, it was just not shown in EXPLAIN. This means that the query plan actually is 

1. Read records from the first table into join buffer until it is full;
2. Read records from the second table and join them with what is in the buffer. 

That is, tables are read almost independently and we hardly can prune anything. The only option is  to do this:

1. Read records from the first table into join buffer until it is full;
2A. Walk through the buffer and see which partitions we'll need to scan in the second table
2. Scan the needed partitions in the second table and join them with what is in the buffer. 

... which could be not worthwhile, and also I think there are more urgently needed things in partitioning.
[3 Jan 2009 20:15] Sergey Petrunya
Please note that for ref-type access it will actually work (can be checked with counters or tracing), although EXPLAIN will look like it doesn't:

mysql> explain partitions select * from tbl_co c straight_join tbl_test t on t. num = c.num where reg = 8\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: c
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 17
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: t
   partitions: p0,p1,p2,p3
         type: ref
possible_keys: num
          key: num
      key_len: 5
          ref: sjm3.c.num
         rows: 2
        Extra: 
2 rows in set (0.01 sec)
[10 Apr 2009 14:56] shashi reddy
i am not sure if i understand this right but...

oracle seem to have a feature called partition-wise join which significantly improves the performance of joins on partitioned tables(if both tables are partitioned by the same key field)it performs the join  by spitting the join into smaller joins(partitions) thus processing them in parallel.

i tried this in mysql by doing a join on partitioned tables(with equal number of partitions) put i see no performance improvement...

thoughts?

shashi
[11 Apr 2009 12:17] Sergey Petrunya
Hi Shashi, 

Agree that scanning partitions in parallel could speed up certain queries, and joins in particular. This is not supported by MySQL at the moment, though.  There is understanding that this would be a nice feature to implement, but so far we don't commit to releasing it in any particular version.
[30 Apr 2010 22:11] Andres Chaves
Hi,
Is this bug planned to be solved in any release?

As a datawarehouse developer the partitioning feature is useful but only if it can  work with star schemas which need the join heavely.

Thanks,
[2 Aug 2010 12:03] Rajesh Sharma
Hi,

I am also facing this similar problem, is there any fixes for this issue in latest version of mysql.

If yes then please let me know the mysql version.

Thanks,

Rajesh Sharma
[21 Feb 2011 15:59] Roberto Spadim
any news?
[5 Jun 2014 9:43] AmirBehzad Eslami
We're using MySQL 5.6.17, and we have the same issue. Partition pruning does not occur under JOINs, or when the values in the WHERE clause are not provided explicitly (e.g. sub-queries).  Is there any plan to fix this?
[24 Jul 2014 19:44] Jonathan DeLanders
ditto on this feature,    parallelism is neat but not as important as getting the pruning to work even if its single threaded.

This feature would be great for datawarehouse.
[17 Feb 2017 0:09] Joe Walker
This bug still seems present in MySQL 5.7.16.
[30 May 2018 9:06] Petr Skrivanek
Do you have any news with solving this bug?
I'm using version 5.7.22 and this bug still seems present there.
[30 May 2018 12:22] Roy Lyseng
Please note that this is a feature request and not a bug: MySQL works as it is supposed to do.

Also note that some pruning may take effect, even though it is not reflected in EXPLAIN. Check this quote from 2009 (yes it is old): "Please note that for ref-type access it will actually work (can be checked with counters or tracing), although EXPLAIN will look like it doesn't".

In the future we may consider the following enhancements for A JOIN B:
- join on partition by partition (requires that both tables have equal partitioning).
- late partition pruning: Calculate active partitions for table B first after having read table A, then
  read only the necessary partitions from table B. However, this is difficult to combine with a
  nested-loops join strategy.

None of these issues are worked on currently, though.