Description:
Description
A prefix index created on a TEXT column produces incorrect results when used in range queries. Specifically, when forcing the use of a prefix index in a range scan with a comparison operator (>), the query returns fewer rows than a full table scan with the same WHERE clause.
Reproduction Steps:
Create a table with a TEXT column
Insert test data including a row with value 0xD6B22F69EFBFBF2761 (Hebrew accent ZINOR + forward slash + additional characters)
Create a prefix index on the first 2 characters: CREATE INDEX prefix_idx ON t0(c0(2))
Execute range query: SELECT COUNT(*) FROM t0 WHERE c0 > '/&'
Execute same query forcing prefix index: SELECT COUNT(*) FROM t0 FORCE INDEX(prefix_idx) WHERE c0 > '/&'
Problematic Row:
Hex value: D6B22F69EFBFBF2761
Display value: ֲ/i'a
First 2 bytes: D6B22F (Hebrew accent ZINOR U+05B2 + forward slash /)
This row satisfies the condition c0 > '/&' but is incorrectly filtered out when the prefix index is used.
Expected Result
Both queries should return the same row count:
Full table scan: 18 rows
Force prefix index: 18 rows
The row with hex value D6B22F69EFBFBF2761 should be included in both result sets, as it satisfies the condition c0 > '/&'.
Actual Result
The queries return different row counts:
Full table scan: 18 rows ✓ (correct)
Force prefix index: 17 rows ✗ (incorrect - missing 1 row)
Missing row:
c0: ֲ/i'a
HEX: D6B22F69EFBFBF2761
The EXPLAIN output shows the optimizer uses an "Index range scan" with the condition (unprintable_blob_value <= c0), suggesting the prefix index range boundaries are incorrectly calculated for multi-byte character sequences, causing the row to be excluded from the scan range.
How to repeat:
Reproduction Steps
-- Minimal reproduction of MySQL 9.6.0 prefix index range scan bug
-- Bug: prefix index on TEXT column causes range scan to miss qualifying rows
-- Affected: MySQL 9.6.0
-- Symptom: SELECT with FORCE INDEX on prefix index returns fewer rows than full table scan
DROP DATABASE IF EXISTS min_bug_repro;
CREATE DATABASE min_bug_repro;
USE min_bug_repro;
CREATE TABLE t0(c0 TEXT);
-- The problematic row value: ֲ/i'a (HEX: D6B22F69EFBFBF2761)
-- First 2 chars: ֲ/ (HEX: D6B22F) - Hebrew accent ZINOR + forward slash
-- This is the row that gets MISSED by the prefix index range scan
INSERT INTO t0(c0) VALUES (0xD6B22F69EFBFBF2761);
-- Other rows that are correctly returned by both scans
INSERT INTO t0(c0) VALUES ('0'), ('1'), ('A'), ('B'), ('Z'), ('a'), ('b'), ('z');
INSERT INTO t0(c0) VALUES ('/0'), ('/1'), ('/A'), ('/Z'), ('/a'), ('/z');
INSERT INTO t0(c0) VALUES ('0.5'), ('hello'), ('world');
-- Create a prefix index on first 2 characters of c0
CREATE INDEX prefix_idx ON t0(c0(2));
ANALYZE TABLE t0;
-- ========================================
-- Query test: c0 > '/&'
-- ========================================
-- Query 1: Full table scan (correct result)
SELECT '=== Full table scan (correct) ===' AS test;
SELECT COUNT(*) AS full_scan_count FROM t0 WHERE c0 > '/&';
-- Query 2: Force prefix index (BUG - returns fewer rows!)
SELECT '=== Force prefix index (BUG!) ===' AS test;
SELECT COUNT(*) AS force_index_count FROM t0 FORCE INDEX(prefix_idx) WHERE c0 > '/&';
-- Show the missing row
SELECT '=== Missing row when using prefix index ===' AS test;
(SELECT c0, HEX(c0) FROM t0 WHERE c0 > '/&')
EXCEPT
(SELECT c0, HEX(c0) FROM t0 FORCE INDEX(prefix_idx) WHERE c0 > '/&');
-- Show execution plans
SELECT '=== EXPLAIN: full scan ===' AS test;
EXPLAIN SELECT * FROM t0 WHERE c0 > '/&';
SELECT '=== EXPLAIN: force index ===' AS test;
EXPLAIN SELECT * FROM t0 FORCE INDEX(prefix_idx) WHERE c0 > '/&';
-- Cleanup
-- DROP DATABASE IF EXISTS min_bug_repro;
Suggested fix:
Root Cause Hypothesis:
The prefix index appears to incorrectly handle range boundaries when the indexed prefix contains multi-byte UTF-8 characters (specifically the Hebrew accent character U+05B2). The range scan logic may be using byte-level comparison on the prefix rather than proper character-level comparison, causing rows with certain multi-byte character combinations to fall outside the computed scan range.
Description: Description A prefix index created on a TEXT column produces incorrect results when used in range queries. Specifically, when forcing the use of a prefix index in a range scan with a comparison operator (>), the query returns fewer rows than a full table scan with the same WHERE clause. Reproduction Steps: Create a table with a TEXT column Insert test data including a row with value 0xD6B22F69EFBFBF2761 (Hebrew accent ZINOR + forward slash + additional characters) Create a prefix index on the first 2 characters: CREATE INDEX prefix_idx ON t0(c0(2)) Execute range query: SELECT COUNT(*) FROM t0 WHERE c0 > '/&' Execute same query forcing prefix index: SELECT COUNT(*) FROM t0 FORCE INDEX(prefix_idx) WHERE c0 > '/&' Problematic Row: Hex value: D6B22F69EFBFBF2761 Display value: ֲ/i'a First 2 bytes: D6B22F (Hebrew accent ZINOR U+05B2 + forward slash /) This row satisfies the condition c0 > '/&' but is incorrectly filtered out when the prefix index is used. Expected Result Both queries should return the same row count: Full table scan: 18 rows Force prefix index: 18 rows The row with hex value D6B22F69EFBFBF2761 should be included in both result sets, as it satisfies the condition c0 > '/&'. Actual Result The queries return different row counts: Full table scan: 18 rows ✓ (correct) Force prefix index: 17 rows ✗ (incorrect - missing 1 row) Missing row: c0: ֲ/i'a HEX: D6B22F69EFBFBF2761 The EXPLAIN output shows the optimizer uses an "Index range scan" with the condition (unprintable_blob_value <= c0), suggesting the prefix index range boundaries are incorrectly calculated for multi-byte character sequences, causing the row to be excluded from the scan range. How to repeat: Reproduction Steps -- Minimal reproduction of MySQL 9.6.0 prefix index range scan bug -- Bug: prefix index on TEXT column causes range scan to miss qualifying rows -- Affected: MySQL 9.6.0 -- Symptom: SELECT with FORCE INDEX on prefix index returns fewer rows than full table scan DROP DATABASE IF EXISTS min_bug_repro; CREATE DATABASE min_bug_repro; USE min_bug_repro; CREATE TABLE t0(c0 TEXT); -- The problematic row value: ֲ/i'a (HEX: D6B22F69EFBFBF2761) -- First 2 chars: ֲ/ (HEX: D6B22F) - Hebrew accent ZINOR + forward slash -- This is the row that gets MISSED by the prefix index range scan INSERT INTO t0(c0) VALUES (0xD6B22F69EFBFBF2761); -- Other rows that are correctly returned by both scans INSERT INTO t0(c0) VALUES ('0'), ('1'), ('A'), ('B'), ('Z'), ('a'), ('b'), ('z'); INSERT INTO t0(c0) VALUES ('/0'), ('/1'), ('/A'), ('/Z'), ('/a'), ('/z'); INSERT INTO t0(c0) VALUES ('0.5'), ('hello'), ('world'); -- Create a prefix index on first 2 characters of c0 CREATE INDEX prefix_idx ON t0(c0(2)); ANALYZE TABLE t0; -- ======================================== -- Query test: c0 > '/&' -- ======================================== -- Query 1: Full table scan (correct result) SELECT '=== Full table scan (correct) ===' AS test; SELECT COUNT(*) AS full_scan_count FROM t0 WHERE c0 > '/&'; -- Query 2: Force prefix index (BUG - returns fewer rows!) SELECT '=== Force prefix index (BUG!) ===' AS test; SELECT COUNT(*) AS force_index_count FROM t0 FORCE INDEX(prefix_idx) WHERE c0 > '/&'; -- Show the missing row SELECT '=== Missing row when using prefix index ===' AS test; (SELECT c0, HEX(c0) FROM t0 WHERE c0 > '/&') EXCEPT (SELECT c0, HEX(c0) FROM t0 FORCE INDEX(prefix_idx) WHERE c0 > '/&'); -- Show execution plans SELECT '=== EXPLAIN: full scan ===' AS test; EXPLAIN SELECT * FROM t0 WHERE c0 > '/&'; SELECT '=== EXPLAIN: force index ===' AS test; EXPLAIN SELECT * FROM t0 FORCE INDEX(prefix_idx) WHERE c0 > '/&'; -- Cleanup -- DROP DATABASE IF EXISTS min_bug_repro; Suggested fix: Root Cause Hypothesis: The prefix index appears to incorrectly handle range boundaries when the indexed prefix contains multi-byte UTF-8 characters (specifically the Hebrew accent character U+05B2). The range scan logic may be using byte-level comparison on the prefix rather than proper character-level comparison, causing rows with certain multi-byte character combinations to fall outside the computed scan range.