MySQL Bugs: #120350: Prefix index on TEXT column causes range scan to miss qualifying rows in MySQL

Bug #120350	Prefix index on TEXT column causes range scan to miss qualifying rows in MySQL
Submitted:	27 Apr 8:35	Modified:	28 Apr 4:31
Reporter:	Li Zeyan	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S2 (Serious)
Version:	9.6.0	OS:	Linux
Assigned to:		CPU Architecture:	Any

Description:
Description

A prefix index created on a TEXT column produces incorrect results when used in range queries. Specifically, when forcing the use of a prefix index in a range scan with a comparison operator (>), the query returns fewer rows than a full table scan with the same WHERE clause.

Reproduction Steps:

Create a table with a TEXT column
Insert test data including a row with value 0xD6B22F69EFBFBF2761 (Hebrew accent ZINOR + forward slash + additional characters)
Create a prefix index on the first 2 characters: CREATE INDEX prefix_idx ON t0(c0(2))
Execute range query: SELECT COUNT(*) FROM t0 WHERE c0 > '/&'
Execute same query forcing prefix index: SELECT COUNT(*) FROM t0 FORCE INDEX(prefix_idx) WHERE c0 > '/&'

Problematic Row:

Hex value: D6B22F69EFBFBF2761
Display value: ֲ/i'a
First 2 bytes: D6B22F (Hebrew accent ZINOR U+05B2 + forward slash /)
This row satisfies the condition c0 > '/&' but is incorrectly filtered out when the prefix index is used.

Expected Result
Both queries should return the same row count:
   Full table scan:        18 rows
   Force prefix index:     18 rows
The row with hex value D6B22F69EFBFBF2761 should be included in both result sets, as it satisfies the condition c0 > '/&'.

Actual Result
The queries return different row counts:
  Full table scan:        18 rows  ✓ (correct)
  Force prefix index:     17 rows  ✗ (incorrect - missing 1 row)

Missing row:
c0: ֲ/i'a
HEX: D6B22F69EFBFBF2761

The EXPLAIN output shows the optimizer uses an "Index range scan" with the condition (unprintable_blob_value <= c0), suggesting the prefix index range boundaries are incorrectly calculated for multi-byte character sequences, causing the row to be excluded from the scan range.

How to repeat:
Reproduction Steps

-- Minimal reproduction of MySQL 9.6.0 prefix index range scan bug
-- Bug: prefix index on TEXT column causes range scan to miss qualifying rows
-- Affected: MySQL 9.6.0
-- Symptom: SELECT with FORCE INDEX on prefix index returns fewer rows than full table scan

DROP DATABASE IF EXISTS min_bug_repro;
CREATE DATABASE min_bug_repro;
USE min_bug_repro;

CREATE TABLE t0(c0 TEXT);

-- The problematic row value: ֲ/i'a (HEX: D6B22F69EFBFBF2761)
-- First 2 chars: ֲ/ (HEX: D6B22F) - Hebrew accent ZINOR + forward slash
-- This is the row that gets MISSED by the prefix index range scan
INSERT INTO t0(c0) VALUES (0xD6B22F69EFBFBF2761);

-- Other rows that are correctly returned by both scans
INSERT INTO t0(c0) VALUES ('0'), ('1'), ('A'), ('B'), ('Z'), ('a'), ('b'), ('z');
INSERT INTO t0(c0) VALUES ('/0'), ('/1'), ('/A'), ('/Z'), ('/a'), ('/z');
INSERT INTO t0(c0) VALUES ('0.5'), ('hello'), ('world');

-- Create a prefix index on first 2 characters of c0
CREATE INDEX prefix_idx ON t0(c0(2));
ANALYZE TABLE t0;

-- ========================================
-- Query test: c0 > '/&'
-- ========================================

-- Query 1: Full table scan (correct result)
SELECT '=== Full table scan (correct) ===' AS test;
SELECT COUNT(*) AS full_scan_count FROM t0 WHERE c0 > '/&';

-- Query 2: Force prefix index (BUG - returns fewer rows!)
SELECT '=== Force prefix index (BUG!) ===' AS test;
SELECT COUNT(*) AS force_index_count FROM t0 FORCE INDEX(prefix_idx) WHERE c0 > '/&';

-- Show the missing row
SELECT '=== Missing row when using prefix index ===' AS test;
(SELECT c0, HEX(c0) FROM t0 WHERE c0 > '/&')
EXCEPT
(SELECT c0, HEX(c0) FROM t0 FORCE INDEX(prefix_idx) WHERE c0 > '/&');

-- Show execution plans
SELECT '=== EXPLAIN: full scan ===' AS test;
EXPLAIN SELECT * FROM t0 WHERE c0 > '/&';

SELECT '=== EXPLAIN: force index ===' AS test;
EXPLAIN SELECT * FROM t0 FORCE INDEX(prefix_idx) WHERE c0 > '/&';

-- Cleanup
-- DROP DATABASE IF EXISTS min_bug_repro;

Suggested fix:
Root Cause Hypothesis: 

The prefix index appears to incorrectly handle range boundaries when the indexed prefix contains multi-byte UTF-8 characters (specifically the Hebrew accent character U+05B2). The range scan logic may be using byte-level comparison on the prefix rather than proper character-level comparison, causing rows with certain multi-byte character combinations to fall outside the computed scan range.

Hi Li Zeyan,

Thank you for the test case. Verified as described.