Bug #117826 Add histogram selectivity for key ranges
Submitted: 28 Mar 17:29 Modified: 30 Mar 8:12
Reporter: Kaiwang CHen (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S4 (Feature request)
Version:9.2 OS:Any
Assigned to: CPU Architecture:Any
Tags: Contribution

[28 Mar 17:29] Kaiwang CHen
Description:
Previously histogram selectivity supported only item expressions
with filtering effect, that is, post-scan filtering conditions.
In some cases, a key range (KEY_MULTI_RANGE), which a kind of
filter during scan, could be estimated by index statistics.

However, a key range could not be estimated by histogram, even if
the histogram would produce smaller error. Note that histogram
assumes uniformity in a single bucket while index statistics
assumes uniformity in the whole value domain.

Besides, when a key range is very big, it tends to be estimated
dramatically: the InnoDB algorithm simply multiplies the original
estimation by 2 and caps to records divided by 2. In this case,
estimating by histogram should also be better.

In short, histogram provides universal estimation, although with
some maintenance overhead. It is expected to be used in many cases.

How to repeat:
CREATE TABLE tbl_int (col1 INT, key(col1));
INSERT INTO tbl_int VALUES (1), (2), (3), (4), (4), (4), (4), (8), (NULL), (NULL);
ANALYZE TABLE tbl_int;
ANALYZE TABLE tbl_int UPDATE HISTOGRAM ON col1 WITH 10 BUCKETS;

set eq_range_index_dive_limit = 2;

-- estimation: 3 rows
-- same effect: SET range_estimation = COMPATIBLE;
EXPLAIN SELECT * FROM tbl_int WHERE col1 IN (1, 3, 4);

-- improved estimation: 6 rows
SET range_estimation = USE_STATISTICS_EQ_LIMIT;
EXPLAIN SELECT * FROM tbl_int WHERE col1 IN (1, 3, 4);

Suggested fix:
Add range estimation by statistics, specifically histograms.

See enclosed in the next comments.

With a few preliminary bug fixes (1-3 of 8) of previous efforts
which tried to reduce estimation overhead, the range estimator is
extracted, histogram selectivity for a single field range is added
on top of item-based estimation as a template, then histogram
selectivity for a key range is added. Adjusting for big ranges is
also provided.

Changset (patches in sequence, reviewed by tianfengli):

range_estimator_1_of_8_bug_117750.patch
range_estimator_2_of_8_bug_117776.patch
range_estimator_3_of_8_bug_117791.patch
range_estimator_4_of_8_extract.patch
range_estimator_5_of_8_templatize.patch
range_estimator_6_of_8_histogram_field.patch
range_estimator_7_of_8_histogram_range.patch
range_estimator_8_of_8_histogram_adjust.patch
[28 Mar 17:31] Kaiwang CHen
Bug #117750 FORCE INDEX does not skip records_in_range() for 'x IS NOT NULL'

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: range_estimator_1_of_8_bug_117750.patch (application/octet-stream, text), 2.43 KiB.

[28 Mar 17:32] Kaiwang CHen
Bug #117776: Index statistics is unexpectedly used to estimate eq ranges with null part

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: range_estimator_2_of_8_bug_117776.patch (application/octet-stream, text), 14.17 KiB.

[28 Mar 17:32] Kaiwang CHen
Bug #117791 FORCE INDEX does not skip records_in_range() for 'x IS NULL'

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: range_estimator_3_of_8_bug_117791.patch (application/octet-stream, text), 6.02 KiB.

[28 Mar 17:33] Kaiwang CHen
Extract single range estimator to range_estimator.cc

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: range_estimator_4_of_8_extract.patch (application/octet-stream, text), 20.20 KiB.

[28 Mar 17:33] Kaiwang CHen
Templatize get_selectivity_dispatcher()

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: range_estimator_5_of_8_templatize.patch (application/octet-stream, text), 6.29 KiB.

[28 Mar 17:34] Kaiwang CHen
Add histogram selectivity against field range.

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: range_estimator_6_of_8_histogram_field.patch (application/octet-stream, text), 18.91 KiB.

[28 Mar 17:35] Kaiwang CHen
Add histogram selectivity for key range, and sys vars.

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: range_estimator_7_of_8_histogram_range.patch (application/octet-stream, text), 76.03 KiB.

[28 Mar 17:35] Kaiwang CHen
Add histogram adjust.

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: range_estimator_8_of_8_histogram_adjust.patch (application/octet-stream, text), 9.31 KiB.

[30 Mar 8:12] MySQL Verification Team
Hello Kaiwang,

Thank you for the report and contribution.

regards,
Umesh
[1 Apr 10:59] Kaiwang CHen
postfixes: keyno and range flag with next keypart

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: range_estimator_postfix_1_keyno_flag.patch (application/octet-stream, text), 9.36 KiB.

[8 Apr 3:58] Kaiwang CHen
Refine comment on the field range selectivity formula

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: range_estimator_postfix_2_comment.patch (application/octet-stream, text), 835 bytes.

[8 Apr 4:06] Kaiwang CHen
Here's the feature description in worklog format. Notice the XXX parts, which could be improved in successive WLs.

Attachment: worklog_117826.txt (text/plain), 26.79 KiB.

[8 Apr 4:39] MySQL Verification Team
Thank you, for your contribution.

Sincerely,
Umesh
[14 May 13:12] Kaiwang CHen
Postfix: Uninitialized range estimator for internal temporary tables

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: range_estimator_postfix_3_init.patch (application/octet-stream, text), 2.82 KiB.

[14 May 13:12] Kaiwang CHen
Postfix: Uninitialized range estimator for internal temporary tables

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: range_estimator_postfix_3_init.patch (application/octet-stream, text), 2.82 KiB.