Bug #118009 SELECT query on primary key column deteriorates to full table scan when filter involves too long string
Submitted: 18 Apr 7:13 Modified: 18 Apr 11:31
Reporter: Pawel Kudzia Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:8.4.5, 9.3.0 OS:Debian (12)
Assigned to: CPU Architecture:x86 (amd64)
Tags: regression

[18 Apr 7:13] Pawel Kudzia
Description:
SELECT query, that should use primary index, deteriorates to a full table scan when filtering part of WHERE refers in IN to 2 or more values, where at at least one of them is longer than defined length of the column.

How to repeat:
MySQL 8.4.5 with default settings, installed from Oracle's MySQL APT repository.

CREATE TABLE t ( c varchar(16) NOT NULL , PRIMARY KEY (c) );

some test data:

INSERT INTO t(c) VALUES('a'),('b'),('c'),('d'),('e'),('f'),('g'),('h'),('i'),('j'),('k'),('l'),('m'),('n'),('o'),('p'),('q'),('r'),('s'),('t'),('u');

INSERT IGNORE INTO t(c) SELECT SUBSTR(MD5(CONCAT(t1.c, ROW_NUMBER() OVER ( PARTITION BY 'x')) ),1,16) FROM t t1 JOIN t t2 ORDER BY t1.c;
INSERT IGNORE INTO t(c) SELECT SUBSTR(MD5(CONCAT(t1.c, ROW_NUMBER() OVER ( PARTITION BY 'x')) ),1,16) FROM t t1 JOIN t t2 ORDER BY t1.c;

this query works as expected, there's no full table scan:

mysql> EXPLAIN SELECT * FROM t WHERE c IN ( '012345678901234567');
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                          |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------+
|  1 | SIMPLE      | NULL  | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | no matching row in const table |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------+

but adding one more value into the IN list, even short one - leads to full table scan:

mysql> explain SELECT * FROM t WHERE c IN ( '012345678901234567','a' );
+----+-------------+-------+------------+-------+---------------+---------+---------+------+--------+----------+--------------------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref  | rows   | filtered | Extra                    |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+--------+----------+--------------------------+
|  1 | SIMPLE      | t     | NULL       | index | PRIMARY       | PRIMARY | 66      | NULL | 223855 |    20.00 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+--------+----------+--------------------------+

Suggested fix:
in the ideal world - query planner should make use of the index even if some of the values in IN list have length longer than maximum allowed column length; for them there'll be no matching row.

thank you!
[18 Apr 11:31] MySQL Verification Team
Hello Pawel Kudzia,

Thank you for the report and test case.
Verified as described.

regards,
Umesh
[20 May 12:12] Knut Anders Hatlen
This is probably a consequence of the fix for Bug#35169384. The problem fixed by that bug is that you cannot really assume, in general, that a matching string is not found in the table just because the string literal is longer than the declared column type. Because collations can make strings of different lengths match. So the code that made that assumption was removed from the range optimizer, and this made it stop simplifying c IN ('012345678901234567', 'a') to c IN ('a'), as this simplification could lead to wrong results.

For example, with the table in the bug description, you can see this if you add a new row like this:

INSERT INTO t VALUES ('0⑫34567890⑫34567');

This value has only 16 characters, but it matches the 18 character long string literal '012345678901234567' in the default collation, because '⑫'='12'.

With this row in the table, the query in the bug description, SELECT * FROM t WHERE c IN ( '012345678901234567','a' ), returned wrong results before MySQL 8.4 and correct results after.

Before 8.4, it returns wrong results when the index is used and correct results when the index is not used:

mysql> SELECT * FROM t WHERE c IN ( '012345678901234567','a' );
+---+
| c |
+---+
| a |
+---+
1 row in set

mysql> SELECT /*+ NO_INDEX(t) */ * FROM t WHERE c IN ( '012345678901234567','a' );
+----------------------+
| c                    |
+----------------------+
| 0⑫34567890⑫34567     |
| a                    |
+----------------------+
2 rows in set

Starting with 8.4, it returns correct results even without the NO_INDEX hint:

mysql> SELECT * FROM t WHERE c IN ( '012345678901234567','a' );
+----------------------+
| c                    |
+----------------------+
| 0⑫34567890⑫34567     |
| a                    |
+----------------------+
2 rows in set

The bug fix only fixed the issue in the range optimizer, so when looking for a single value, the result is wrong both before and after the bug fix when an index lookup is used:

mysql> SELECT * FROM t WHERE c IN ( '012345678901234567' );
Empty set

mysql> SELECT /*+ NO_INDEX(t) */ * FROM t WHERE c IN ( '012345678901234567' );
+----------------------+
| c                    |
+----------------------+
| 0⑫34567890⑫34567     |
+----------------------+
1 row in set

So I think there are two issues here:

1) Index lookups perform an optimization which is not correct and can give wrong results.

2) Even though '012345678901234567' cannot safely be removed from the IN list, it's unfortunate that the range optimizer has to fall back to a full scan when it doesn't remove it. It should be possible to perform index lookups or index range scans for a value that is longer than the column type.