Bug #117776 | Index statistics is unexpectedly used to estimate eq ranges with null part | ||
---|---|---|---|
Submitted: | 24 Mar 5:01 | Modified: | 24 Mar 7:46 |
Reporter: | Kaiwang CHen (OCA) | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S3 (Non-critical) |
Version: | 8.0, 8.4, 9.2 | OS: | Any |
Assigned to: | CPU Architecture: | Any | |
Tags: | Contribution |
[24 Mar 5:01]
Kaiwang CHen
[24 Mar 5:07]
Kaiwang CHen
Reinterprete NULL_RANGE for range estimation as the range has at least one null part "kp_i IS NULL" (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: bug_117776.patch (application/octet-stream, text), 14.17 KiB.
[24 Mar 5:18]
MySQL Verification Team
Hello Kaiwang, Thank you for the report and contribution. regards, Umesh
[24 Mar 7:46]
Kaiwang CHen
Besides, I found the comment key_range_flags are not consistent in their usage sites. I guess a range can be classified by three factors (1) all parts are equality, or not (2) one or more null parts ("x IS NULL"), or no null part (3) all the parts are equality and contains a full unique key, or not I would suggest (1) EQ_RANGE, (2) NULL_RANGE, (3) UNIQUE_RANGE. I guess these flags could be redefined as stated above. EQ_RANGE and NULL_RANGE is primary properties, while UNIQUE_RANGE is a combined property (EQ_RANGE | all parts are over a full unique key). The full unique key property can also be a primary property. Then, 1. an equality range over a full unique index is ((flag & UNIQUE_RANGE) == UNIQUE_RANGE), or in long form ((flag & (EQ_RANGE | UNIQUE_RANGE) == (EQ_RANGE | UNIQUE_RANGE)). Here UNIQUE_RANGE is a sub flag to EQ_RANGE. 2. a real unique range is ((flag & (UNIQUE_RANGE | NULL_RANGE)) == UNIQUE_RANGE) 3. an equality range is ((flag & EQ_RANGE) == EQ_RANGE) 4. an equality range without null part is ((flag & (EQ_RANGE | NULL_RANGE)) == EQ_RANGE ) 5. an equality range with some null part is ((flag & (EQ_RANGE | NULL_RANGE)) == (EQ_RANGE | NULL_RANGE)) 6. a non-equality range is ((flag & EQ_RANGE) == 0) 7. a non-equality range without null part is ((flag & (EQ_RANGE | NULL_RANGE)) == 0) 8. a non-equality range with some null part is ((flag & (EQ_RANGE | NULL_RANGE)) == NULL_RANGE) Then there could be a refactor to introduce some inline functions with clear names to unify the interpretations is needed. Currently they are 1. UNIQUE_RANGE: a real unique range. With new def, _RANGE removed, it is ((flag & (UNIQUE | NULL)) == UNIQUE) 2. EQ_RANGE: an equality range over a full key, with zero or more null parts. Is it really useful? 3. NULL_RANGE: an equality range over a full unique key, with one or more null part. With new def, it is ((flag & (UNIQUE | NULL)) == (UNIQUE | NULL)). /* This flag means that index is a unique index, and the interval is equivalent to "AND(keypart_i = const_i)", where all of const_i are not NULLs. */ UNIQUE_RANGE = 1 << 4, /* This flag means that the interval is equivalent to "AND(keypart_i = const_i)", where not all key parts may be used but all of const_i are not NULLs. */ EQ_RANGE = 1 << 5, /* This flag has the same meaning as UNIQUE_RANGE, except that for at least one keypart the condition is "keypart IS NULL". */ NULL_RANGE = 1 << 6,