MySQL Bugs: #118009: SELECT query on primary key column deteriorates to full table scan when filter involves too long string

Bug #118009	SELECT query on primary key column deteriorates to full table scan when filter involves too long string
Submitted:	18 Apr 2025 7:13	Modified:	18 Apr 2025 11:31
Reporter:	Pawel Kudzia	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S5 (Performance)
Version:	8.4.5, 9.3.0	OS:	Debian (12)
Assigned to:		CPU Architecture:	x86 (amd64)
Tags:	regression

Description:
SELECT query, that should use primary index, deteriorates to a full table scan when filtering part of WHERE refers in IN to 2 or more values, where at at least one of them is longer than defined length of the column.

How to repeat:
MySQL 8.4.5 with default settings, installed from Oracle's MySQL APT repository.

CREATE TABLE t ( c varchar(16) NOT NULL , PRIMARY KEY (c) );

some test data:

INSERT INTO t(c) VALUES('a'),('b'),('c'),('d'),('e'),('f'),('g'),('h'),('i'),('j'),('k'),('l'),('m'),('n'),('o'),('p'),('q'),('r'),('s'),('t'),('u');

INSERT IGNORE INTO t(c) SELECT SUBSTR(MD5(CONCAT(t1.c, ROW_NUMBER() OVER ( PARTITION BY 'x')) ),1,16) FROM t t1 JOIN t t2 ORDER BY t1.c;
INSERT IGNORE INTO t(c) SELECT SUBSTR(MD5(CONCAT(t1.c, ROW_NUMBER() OVER ( PARTITION BY 'x')) ),1,16) FROM t t1 JOIN t t2 ORDER BY t1.c;

this query works as expected, there's no full table scan:

mysql> EXPLAIN SELECT * FROM t WHERE c IN ( '012345678901234567');
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                          |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------+
|  1 | SIMPLE      | NULL  | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | no matching row in const table |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------+

but adding one more value into the IN list, even short one - leads to full table scan:

mysql> explain SELECT * FROM t WHERE c IN ( '012345678901234567','a' );
+----+-------------+-------+------------+-------+---------------+---------+---------+------+--------+----------+--------------------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref  | rows   | filtered | Extra                    |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+--------+----------+--------------------------+
|  1 | SIMPLE      | t     | NULL       | index | PRIMARY       | PRIMARY | 66      | NULL | 223855 |    20.00 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+--------+----------+--------------------------+

Suggested fix:
in the ideal world - query planner should make use of the index even if some of the values in IN list have length longer than maximum allowed column length; for them there'll be no matching row.

thank you!

Hello Pawel Kudzia,

Thank you for the report and test case.
Verified as described.

regards,
Umesh

This is probably a consequence of the fix for Bug#35169384. The problem fixed by that bug is that you cannot really assume, in general, that a matching string is not found in the table just because the string literal is longer than the declared column type. Because collations can make strings of different lengths match. So the code that made that assumption was removed from the range optimizer, and this made it stop simplifying c IN ('012345678901234567', 'a') to c IN ('a'), as this simplification could lead to wrong results.

For example, with the table in the bug description, you can see this if you add a new row like this:

INSERT INTO t VALUES ('0⑫34567890⑫34567');

This value has only 16 characters, but it matches the 18 character long string literal '012345678901234567' in the default collation, because '⑫'='12'.

With this row in the table, the query in the bug description, SELECT * FROM t WHERE c IN ( '012345678901234567','a' ), returned wrong results before MySQL 8.4 and correct results after.

Before 8.4, it returns wrong results when the index is used and correct results when the index is not used:

mysql> SELECT * FROM t WHERE c IN ( '012345678901234567','a' );
+---+
| c |
+---+
| a |
+---+
1 row in set

mysql> SELECT /*+ NO_INDEX(t) */ * FROM t WHERE c IN ( '012345678901234567','a' );
+----------------------+
| c                    |
+----------------------+
| 0⑫34567890⑫34567     |
| a                    |
+----------------------+
2 rows in set

Starting with 8.4, it returns correct results even without the NO_INDEX hint:

mysql> SELECT * FROM t WHERE c IN ( '012345678901234567','a' );
+----------------------+
| c                    |
+----------------------+
| 0⑫34567890⑫34567     |
| a                    |
+----------------------+
2 rows in set

The bug fix only fixed the issue in the range optimizer, so when looking for a single value, the result is wrong both before and after the bug fix when an index lookup is used:

mysql> SELECT * FROM t WHERE c IN ( '012345678901234567' );
Empty set

mysql> SELECT /*+ NO_INDEX(t) */ * FROM t WHERE c IN ( '012345678901234567' );
+----------------------+
| c                    |
+----------------------+
| 0⑫34567890⑫34567     |
+----------------------+
1 row in set

So I think there are two issues here:

1) Index lookups perform an optimization which is not correct and can give wrong results.

2) Even though '012345678901234567' cannot safely be removed from the IN list, it's unfortunate that the range optimizer has to fall back to a full scan when it doesn't remove it. It should be possible to perform index lookups or index range scans for a value that is longer than the column type.

Bug#118486 and bug#119770 have been closed as duplicates of this bug. Bug#119770 shows that the problem also affects prefix indexes.

Patch for Bug#118009, Bug#118486, Bug#119770

Attachment: bug118009.patch (application/octet-stream, text), 24.07 KiB.

Patch attached: bug118009.patch (against mysql-server 8.4 trunk)

This patch fixes Bug#118009, Bug#118486, and Bug#119770 — all share the same root cause.

Root cause: Commit 6ae1d0216c4 (Bug#35169384) made
save_value_and_handle_conversion() return "always true" for
TYPE_WARN_TRUNCATED on non-binary strnxfrm collations (e.g.
utf8mb4_0900_ai_ci). This is correct for range predicates (<=, >=,
BETWEEN) but breaks IN() lists: tree_or(valid_tree, nullptr) returns
nullptr, so a single truncated value destroys all valid ranges and
forces a full table scan.

Affected: All utf8mb4/utf8mb3 non-binary collations (e.g. _ai_ci,
_as_cs). Binary collations and latin1 are not affected.

The patch has two fixes:

Fix 1 — get_mm_leaf (Bug#119770): For EQ_FUNC on a prefix key
(HA_PART_KEY_SEG), the truncated value is a valid prefix lookup key.
Don't bail out — create a point range with inexact=true so the WHERE
filter rechecks full equality.

Fix 2 — get_func_mm_tree_from_in_predicate (Bug#118009/118486): When
get_mm_parts() returns nullptr for an IN() value that is a string
exceeding the column's character capacity, skip it and continue. No row
can ever match such a value. The WHERE filter rechecks the full IN()
condition.

Example — Bug#118009 (value exceeds column length):

  CREATE TABLE t1 (pk VARCHAR(16) PRIMARY KEY)
    CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
  INSERT INTO t1 VALUES ('abc'), ('def');

  -- 'this_is_way_too_long...' exceeds VARCHAR(16)
  EXPLAIN SELECT * FROM t1
    WHERE pk IN ('abc', 'this_is_way_too_long_for_varchar16', 'def');
  -- Before fix: type=ALL  (full table scan)
  -- After fix:  type=range (index range scan)

Example — Bug#119770 (value exceeds prefix index length):

  CREATE TABLE t2 (
    id BIGINT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(64) NOT NULL,
    KEY idx_prefix (name(10))
  ) CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
  INSERT INTO t2 (name) VALUES ('short'), ('another'),
    ('abcdefghijklmnopqrstuvwxyz0123456789');

  -- Values fit VARCHAR(64) but exceed prefix(10)
  EXPLAIN SELECT * FROM t2
    WHERE name IN ('short', 'abcdefghijklmnopqrstuvwxyz0123456789', 'another');
  -- Before fix: type=ALL  (full table scan)
  -- After fix:  type=range (index range scan on idx_prefix)

Test cases included covering both bugs, binary collation non-regression,
nullable columns, prefix indexes, and Bug#35169384 non-regression.

Branch with the fix:
https://github.com/yakirgb/percona-server/tree/bug118009-range-in-oversized-value

Thank you,
Yakir Gibraltar

Patch for Bug#118009, Bug#118486, Bug#119770, Bug#119867 (combined)

Attachment: bug118009_combined.patch (application/octet-stream, text), 29.56 KiB.

clang-format fix

Attachment: bug118009_combined.patch (application/octet-stream, text), 29.56 KiB.

More fixes

Attachment: bug118009_combined.patch (application/octet-stream, text), 28.05 KiB.

Bug#119867 is marked as a duplicate of this one.