MySQL Bugs: #31870: index range search in partition

Bug #31870	index range search in partition
Submitted:	26 Oct 2007 8:54	Modified:	23 Nov 2007 17:10
Reporter:	Carrie Chen	Email Updates:
Status:	Not a Bug	Impact on me:	None
Category:	MySQL Server: Partitions	Severity:	S3 (Non-critical)
Version:	5.1.22-rc	OS:	Any
Assigned to:	Mikael Ronström	CPU Architecture:	Any
Tags:	bfsm_2007_11_15

Description:
I had a test of partition. It seems that optimizer doesn't do range search on the index that is used for partition.

Is it a bug?

How to repeat:
CREATE TABLE part_date
    (  c1 int default NULL,
     c2 varchar(10) default NULL,
     c3 date default NULL) engine=myisam
      partition by range (to_days(c3))
    (PARTITION p0 VALUES LESS THAN (to_days('2007-07-01')),
    PARTITION p1 VALUES LESS THAN (to_days('2007-08-01')) ,
    PARTITION p2 VALUES LESS THAN (to_days('2007-09-01')) ,
    PARTITION p3 VALUES LESS THAN MAXVALUE);

insert into index_date values(1,'chen','2007-07-03'),
(2,'aaa','2007-08-04'),
(43,'fdd','2007-08-09'),
(2,'aaa','2007-09-04'),
(2,'aaa','2007-09-08'),
(2,'aaa','2007-06-04'),
(21,'aaa','2007-06-04'),
(22,'aaad','2007-08-04'),
(23,'aadfa','2007-08-04'),
(24,'aaa','2007-08-04'),
(25,'aafda','2007-08-04'),
(26,'aaa','2007-10-04'),
(23,'aaa','2007-10-02'),
(12,'aaa','2007-11-04'),
(32,'aaa','2007-12-04'),
(112,'aaa','2007-08-22'),
(29,'aaa','2007-08-13'),
(31,'aaa','2007-08-14'),
(122,'aaa','2007-07-24');    

explain select count(*) from part_date where c3 between '2007-08-01' and '2007-10-01'\G;

alter table part_date add index (c3);

explain select count(*) from part_date where c3 between '2007-08-01' and '2007-10-01'\G;

/* The two results above should be the same*/

create table date2 select * from part_date;
alter table date2 add index (c3);
explain select count(*) from date2 where c3 between '2007-08-01' and '2007-10-01'\G;

Thank you for the report.

Verified as described.

Hi,

1> A partitioned table without index.
CREATE TABLE part_date
    (  c1 int default NULL,
       c2 varchar(10) default NULL,
       c3 date default NULL) engine=myisam
           partition by range (to_days(c3))
         (PARTITION p0 VALUES LESS THAN (to_days('2007-07-01')),
         PARTITION p1 VALUES LESS THAN (to_days('2007-08-01')) ,
        PARTITION p2 VALUES LESS THAN (to_days('2007-09-01')) ,
         PARTITION p3 VALUES LESS THAN MAXVALUE);

mysql> explain select count(*) from part_date where c3 between '2007-08-01' and '2007-10-01'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: part_date
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 15
        Extra: Using where
1 row in set (0.01 sec)

It's reasonable.

2> Add index on column that is used to partition.
mysql> alter table part_date add index (c3);

mysql> explain select count(*) from part_date where c3 between '2007-08-01' and '2007-10-01'\G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: part_date
         type: index
possible_keys: c3
          key: c3
      key_len: 4
          ref: NULL
         rows: 15
        Extra: Using where; Using index
1 row in set (0.01 sec)

We can see that MySQL uses index SCAN rather than index RANGE.  I suppose MySQL would use index range to search a partitioned table as to a non-partitioned table.

3> a non-partitioned table with index
mysql> create table index_date select * from part_date;
mysql> alter table index_date add index (c3);

mysql> explain select count(*) from index_date where c3 between '2007-08-01' and '2007-10-01'\G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: index_date
         type: range
possible_keys: c3
          key: c3
      key_len: 4
          ref: NULL
         rows: 10
        Extra: Using where; Using index
1 row in set (0.00 sec)

Sorry, but both "range" and "index" access methods are not very different when you select 11 rows out of 19 total in table. Optimizer can choose "range" as well, as soon as estimated number of rows will be smaller. Compare:

mysql> explain select count(*) from part_date where c3 between '2007-08-01' and
'2007-10-01'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: part_date
         type: index
possible_keys: c3
          key: c3
      key_len: 4
          ref: NULL
         rows: 15
        Extra: Using where; Using index
1 row in set (0.02 sec)

mysql> explain select count(*) from part_date where c3 between '2007-08-01' and
'2007-08-03'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: part_date
         type: range
possible_keys: c3
          key: c3
      key_len: 4
          ref: NULL
         rows: 1
        Extra: Using where; Using index
1 row in set (0.00 sec)

Current optimizer just can not estimate number of rows more accurately. So, this is NOT a bug in optimizer, I think.

From the other side, plan IS different ("range" is used) for similar bug not partitioned table:

mysql> create table np_date as select * from part_date;
Query OK, 19 rows affected (0.59 sec)
Records: 19  Duplicates: 0  Warnings: 0

mysql> alter table np_date add index (c3);
Query OK, 19 rows affected (0.42 sec)
Records: 19  Duplicates: 0  Warnings: 0

mysql> explain select count(*) from np_date where c3 between '2007-08-01' and '2
007-10-01'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: np_date
         type: range
possible_keys: c3
          key: c3
      key_len: 4
          ref: NULL
         rows: 9
        Extra: Using where; Using index
1 row in set (0.01 sec)

This, the fact that optimizer may choose different access path for partitioned table with the same data (because of different algorythm for calculating number of rows or for whatever reason) should be at least documented. So, if not optimizer bug, this is at least a reasonable documentation request.

There is nothing to document here since the two queries are not seeing the same
data due to partition pruning.

The query only uses partition p2 and p3 and thus the original query selected 11 out of
19 and the partitioned query selects 11 out of 11 records in p2 and p3.

Closing as !Bug per Mikael's comments above.