Bug #70265 | Optimizer under-estimates rows when there are many unmatched partitions | ||
---|---|---|---|
Submitted: | 6 Sep 2013 18:10 | Modified: | 9 Sep 2013 10:24 |
Reporter: | Yoshinori Matsunobu (OCA) | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: Partitions | Severity: | S5 (Performance) |
Version: | 5.6.13 | OS: | Any |
Assigned to: | Assigned Account | CPU Architecture: | Any |
[6 Sep 2013 18:10]
Yoshinori Matsunobu
[6 Sep 2013 20:44]
Yoshinori Matsunobu
The incorrect estimate was coming from ha_partition::records_in_range(). ------ min_rows_to_check= min_rows_for_estimate(); ... while ((part_id= get_biggest_used_partition(&partition_index)) != NO_CURRENT_PART_ID) { rows= m_file[part_id]->records_in_range(inx, min_key, max_key); ... estimated_rows+= rows; checked_rows+= m_file[part_id]->stats.records; if (estimated_rows && checked_rows && checked_rows >= min_rows_to_check) { DBUG_PRINT("info", ("records_in_range(inx %u): %lu (%lu * %lu / %lu)",inx, (ulong) (estimated_rows * stats.records / checked_rows), (ulong) estimated_rows, (ulong) stats.records, (ulong) checked_rows)); DBUG_RETURN(estimated_rows * stats.records / checked_rows); } ------ When I debugged, I noticed that - min_rows_to_check was small .. i.e. 3 million from 15 mil rows table - checked_rows was added by number of rows per partition .. i.e. + 1 million per partition If "if .. checked_rows >= min_rows_to_check" condition was met by just scanning a few partitions, optimizer stops reading from other partitions. This is not a problem if keys are equally distributed across partitions, but in many cases this is not true (consider an index on timestamp column). 5.1 seems to scan all necessary partitions, which can give much better estimates.
[9 Sep 2013 7:49]
Manyi Lu
This is not an optimizer bug, and it is being analysed by the developer working on partitioning.
[9 Sep 2013 10:24]
MySQL Verification Team
Hello Yoshinori, Thank you for the bug report and the test case. Verified as described. Thanks, Umesh
[10 Sep 2013 15:25]
Mattias Jonsson
Preliminary patch for mysql-5.6
Attachment: b70265.diff (application/octet-stream, text), 28.07 KiB.
[10 Sep 2013 15:51]
Mattias Jonsson
Both InnoDB and MyISAM will return at least 1 for records_in_range, which caused the partitioning handler to return a too small estimate, since it did not call records_in_range for all partitions (only the biggest ones). The uploaded preliminary patch changes from: Only call records_in_range() until the sum of the estimates is at least one and the number of rows in the called partitions is greater than total number of rows divided by log2(number of partitions). to: Call records_in_range() and until the number of partitions that return > 1 rows is larger than log2(number of partitions). I.e. If no matching rows exists in the partitions with the most rows, we will now continue calling records_in_range() until enough partitions returning higher estimates, indicating that the partition will contain the range. Notice that there are several queries that has changed execution plans in the tests! Can you test the patch and see if it works as expected?