Bug #118009 | SELECT query on primary key column deteriorates to full table scan when filter involves too long string | ||
---|---|---|---|
Submitted: | 18 Apr 7:13 | Modified: | 18 Apr 11:31 |
Reporter: | Pawel Kudzia | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S5 (Performance) |
Version: | 8.4.5, 9.3.0 | OS: | Debian (12) |
Assigned to: | CPU Architecture: | x86 (amd64) | |
Tags: | regression |
[18 Apr 7:13]
Pawel Kudzia
[18 Apr 11:31]
MySQL Verification Team
Hello Pawel Kudzia, Thank you for the report and test case. Verified as described. regards, Umesh
[20 May 12:12]
Knut Anders Hatlen
This is probably a consequence of the fix for Bug#35169384. The problem fixed by that bug is that you cannot really assume, in general, that a matching string is not found in the table just because the string literal is longer than the declared column type. Because collations can make strings of different lengths match. So the code that made that assumption was removed from the range optimizer, and this made it stop simplifying c IN ('012345678901234567', 'a') to c IN ('a'), as this simplification could lead to wrong results. For example, with the table in the bug description, you can see this if you add a new row like this: INSERT INTO t VALUES ('0⑫34567890⑫34567'); This value has only 16 characters, but it matches the 18 character long string literal '012345678901234567' in the default collation, because '⑫'='12'. With this row in the table, the query in the bug description, SELECT * FROM t WHERE c IN ( '012345678901234567','a' ), returned wrong results before MySQL 8.4 and correct results after. Before 8.4, it returns wrong results when the index is used and correct results when the index is not used: mysql> SELECT * FROM t WHERE c IN ( '012345678901234567','a' ); +---+ | c | +---+ | a | +---+ 1 row in set mysql> SELECT /*+ NO_INDEX(t) */ * FROM t WHERE c IN ( '012345678901234567','a' ); +----------------------+ | c | +----------------------+ | 0⑫34567890⑫34567 | | a | +----------------------+ 2 rows in set Starting with 8.4, it returns correct results even without the NO_INDEX hint: mysql> SELECT * FROM t WHERE c IN ( '012345678901234567','a' ); +----------------------+ | c | +----------------------+ | 0⑫34567890⑫34567 | | a | +----------------------+ 2 rows in set The bug fix only fixed the issue in the range optimizer, so when looking for a single value, the result is wrong both before and after the bug fix when an index lookup is used: mysql> SELECT * FROM t WHERE c IN ( '012345678901234567' ); Empty set mysql> SELECT /*+ NO_INDEX(t) */ * FROM t WHERE c IN ( '012345678901234567' ); +----------------------+ | c | +----------------------+ | 0⑫34567890⑫34567 | +----------------------+ 1 row in set So I think there are two issues here: 1) Index lookups perform an optimization which is not correct and can give wrong results. 2) Even though '012345678901234567' cannot safely be removed from the IN list, it's unfortunate that the range optimizer has to fall back to a full scan when it doesn't remove it. It should be possible to perform index lookups or index range scans for a value that is longer than the column type.