MySQL Bugs: #70265: Optimizer under-estimates rows when there are many unmatched partitions

Bug #70265	Optimizer under-estimates rows when there are many unmatched partitions
Submitted:	6 Sep 2013 18:10	Modified:	9 Sep 2013 10:24
Reporter:	Yoshinori Matsunobu (OCA)	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Partitions	Severity:	S5 (Performance)
Version:	5.6.13	OS:	Any
Assigned to:	Assigned Account	CPU Architecture:	Any

Description:
In below cases, MySQL optimizer under-estimates rows for query plan, and may choose an inefficient index.

* Table is range partitioned (i.e p1 .. p15, partitioned by part_key)
* There is an index that does not cover partitioned key (i.e. index(c2))
* SELECT sets WHERE conditions on both a partitioned key and a non-partitioned indexed column
 (SELECT .. WHERE part_key < x AND c2 < y)
* Rows where "c2 < y" exist on only a few partitions (i.e. p1..p3)
* If x is very large (accessing many partitions), estimated rows for scan drops significantly

How to repeat:
1. Create a range partitioned table

create table tp (c1 int, c2 int, c3 int, c4 int, part_key int,
primary key (c1, part_key), index(c2)
) engine=innodb
PARTITION BY RANGE(part_key)(
PARTITION p1 VALUES LESS THAN (1000000),
PARTITION p2 VALUES LESS THAN (2000000),
PARTITION p3 VALUES LESS THAN (3000000),
PARTITION p4 VALUES LESS THAN (4000000),
PARTITION p5 VALUES LESS THAN (5000000),
PARTITION p6 VALUES LESS THAN (6000000),
PARTITION p7 VALUES LESS THAN (7000000),
PARTITION p8 VALUES LESS THAN (8000000),
PARTITION p9 VALUES LESS THAN (9000000),
PARTITION p10 VALUES LESS THAN (10000000),
PARTITION p11 VALUES LESS THAN (11000000),
PARTITION p12 VALUES LESS THAN (12000000),
PARTITION p13 VALUES LESS THAN (13000000),
PARTITION p14 VALUES LESS THAN (14000000)
);

2. Generate sample rows: 1 to 10million, and load the data into table
-----
#!/usr/bin/perl

for($i=1; $i <= 10000000; $i++) {
  print "$i,$i,$i,$i,$i\n";
}
-----

3. Run the following two queries and check query execution plans.
a. select count(*) from tp where c2 < 3000000 and part_key > 1000000 and part_key < 15000000;
b. select count(*) from tp where c2 < 3000000 and part_key > 1000000 and part_key < 2000000;

"c2 < 3000000" exists on partition p1..p3 only. Query b should be faster, but 5.6 optimizer says no.

Query plan in 5.6.13:
mysql> explain partitions select count(*) from tp where c2 < 3000000 and part_key > 1000000 and part_key < 15000000;
+----+-------------+-------+---------------------------------------------+-------+---------------+------+---------+------+------+--------------------------+
| id | select_type | table | partitions                                  | type  | possible_keys | key  | key_len | ref  | rows | Extra                    |
+----+-------------+-------+---------------------------------------------+-------+---------------+------+---------+------+------+--------------------------+
|  1 | SIMPLE      | tp    | p2,p3,p4,p5,p6,p7,p8,p9,p10,p11,p12,p13,p14 | range | c2            | c2   | 5       | NULL |   17 | Using where; Using index |
+----+-------------+-------+---------------------------------------------+-------+---------------+------+---------+------+------+--------------------------+
1 row in set (0.00 sec)

mysql> explain partitions select count(*) from tp where c2 < 3000000 and part_key > 1000000 and part_key < 2000000;
+----+-------------+-------+------------+-------+---------------+------+---------+------+--------+--------------------------+
| id | select_type | table | partitions | type  | possible_keys | key  | key_len | ref  | rows   | Extra                    |
+----+-------------+-------+------------+-------+---------------+------+---------+------+--------+--------------------------+
|  1 | SIMPLE      | tp    | p2         | range | c2            | c2   | 5       | NULL | 498816 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------+------+---------+------+--------+--------------------------+
1 row in set (0.00 sec)

"17" is apparently much smaller than actual number of scanned rows.

Query plan in 5.1.69:
mysql> explain partitions select count(*) from tp where c2 < 3000000 and part_key > 1000000 and part_key < 15000000;
+----+-------------+-------+---------------------------------------------+-------+---------------+------+---------+------+--------+--------------------------+
| id | select_type | table | partitions                                  | type  | possible_keys | key  | key_len | ref  | rows   | Extra                    |
+----+-------------+-------+---------------------------------------------+-------+---------------+------+---------+------+--------+--------------------------+
|  1 | SIMPLE      | tp    | p2,p3,p4,p5,p6,p7,p8,p9,p10,p11,p12,p13,p14 | range | c2            | c2   | 5       | NULL | 908668 | Using where; Using index |
+----+-------------+-------+---------------------------------------------+-------+---------------+------+---------+------+--------+--------------------------+
1 row in set (0.00 sec)

mysql>  explain partitions select count(*) from tp where c2 < 3000000 and part_key > 1000000 and part_key < 2000000;
+----+-------------+-------+------------+-------+---------------+------+---------+------+--------+--------------------------+
| id | select_type | table | partitions | type  | possible_keys | key  | key_len | ref  | rows   | Extra                    |
+----+-------------+-------+------------+-------+---------------+------+---------+------+--------+--------------------------+
|  1 | SIMPLE      | tp    | p2         | range | c2            | c2   | 5       | NULL | 349448 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------+------+---------+------+--------+--------------------------+
1 row in set (0.00 sec)

The incorrect estimate was coming from ha_partition::records_in_range().
------
min_rows_to_check= min_rows_for_estimate();
...
while ((part_id= get_biggest_used_partition(&partition_index))
!= NO_CURRENT_PART_ID)
{
  rows= m_file[part_id]->records_in_range(inx, min_key, max_key);
...
  estimated_rows+= rows;
  checked_rows+= m_file[part_id]->stats.records;
  if (estimated_rows && checked_rows &&
    checked_rows >= min_rows_to_check)
  {
    DBUG_PRINT("info",
    ("records_in_range(inx %u): %lu (%lu * %lu / %lu)",inx,
    (ulong) (estimated_rows * stats.records / checked_rows),
    (ulong) estimated_rows,
    (ulong) stats.records,
    (ulong) checked_rows));
    DBUG_RETURN(estimated_rows * stats.records / checked_rows);
  }
------

When I debugged, I noticed that
- min_rows_to_check was small .. i.e. 3 million from 15 mil rows table
- checked_rows was added by number of rows per partition .. i.e. + 1 million per partition

If "if .. checked_rows >= min_rows_to_check" condition was met by just scanning a few partitions, optimizer stops reading from other partitions.
This is not a problem if keys are equally distributed across partitions, but in many cases this is not true (consider an index on timestamp column).

5.1 seems to scan all necessary partitions, which can give much better estimates.

This is not an optimizer bug, and it is being analysed by the developer working on partitioning.

Hello Yoshinori,

Thank you for the bug report and the test case. 
Verified as described.

Thanks,
Umesh

Preliminary patch for mysql-5.6

Attachment: b70265.diff (application/octet-stream, text), 28.07 KiB.

Both InnoDB and MyISAM will return at least 1 for records_in_range, which caused the partitioning handler to return a too small estimate, since it did not call records_in_range for all partitions (only the biggest ones).

The uploaded preliminary patch changes from:
Only call records_in_range() until the sum of the estimates is at least one and the number of rows in the called partitions is greater than total number of rows divided by log2(number of partitions).

to:
Call records_in_range() and until the number of partitions that return > 1 rows is larger than log2(number of partitions).

I.e.
If no matching rows exists in the partitions with the most rows, we will now continue calling records_in_range() until enough partitions returning higher estimates, indicating that the partition will contain the range.

Notice that there are several queries that has changed execution plans in the tests!

Can you test the patch and see if it works as expected?