Bug #120352 Prefix index with length 1 causes incorrect "Zero rows" optimization with float overflow literal in range condition
Submitted: 27 Apr 9:34 Modified: 28 Apr 4:26
Reporter: Li Zeyan Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:9.6.0 OS:Linux
Assigned to: CPU Architecture:Any

[27 Apr 9:34] Li Zeyan
Description:
When a MEDIUMTEXT column has a prefix index of length 1 (e.g., c0(1)), and a range query uses a float overflow literal like '-1e500' in the WHERE clause (e.g., c0 >= '-1e500' AND c0 <= '᪾JwQ ['), the optimizer incorrectly concludes "Zero rows (no matching row in const table)" and returns 0 results, even though matching rows exist.

The bug appears to be related to:
1. Prefix index length (length 1 triggers the bug, length 2+ may work correctly)
2. Float overflow literal '-1e500' in string comparison context
3. Optimizer's range analysis on prefix indexes

The issue does NOT occur when:
- Using IGNORE INDEX to force full table scan
- Using prefix index with length >= 2 (in some cases)
- Removing the prefix index entirely

Expected Result

Query: SELECT COUNT(*) FROM t0 WHERE c0 >= '-1e500' AND c0 <= '᪾JwQ [';

Expected count: 12
The query should return all rows where the string value c0 falls within the specified 
range according to string collation rules. The ground truth query using CASE WHEN 
(which bypasses index optimization) correctly returns 12 matching rows.

EXPLAIN should show either:
- Index range scan with correct row estimate, OR
- Full table scan if index cannot be used effectively
It should NOT show "Zero rows (no matching row in const table)" when matching rows exist.

Actual Result

Query: SELECT COUNT(*) FROM t0 WHERE c0 >= '-1e500' AND c0 <= '᪾JwQ [';

Actual count: 0 (INCORRECT)
EXPLAIN output:
+--------------------------------------------------------------------+
| EXPLAIN                                                            |
+--------------------------------------------------------------------+
| -> Zero rows (no matching row in const table)  (cost=0..0 rows=0) |
+--------------------------------------------------------------------+

The optimizer incorrectly determines that no rows match the condition and returns 
an empty result set, despite 12 rows actually satisfying the WHERE clause.

Workaround: Using IGNORE INDEX(prefix_idx) returns the correct count of 12 rows.
Pattern observed:
- Prefix index c0(1): Returns 0 rows (BUG)
- Prefix index c0(2): Returns 12 rows (correct in this test)
- Prefix index c0(3): Returns 12 rows (correct)
- No index / IGNORE INDEX: Returns 12 rows (correct)

How to repeat:
DROP DATABASE IF EXISTS min_overflow_bug;
CREATE DATABASE min_overflow_bug;
USE min_overflow_bug;

CREATE TABLE t0(c0 MEDIUMTEXT);

-- Insert data: numeric-like strings that satisfy c0 >= '-1e500' (string comparison)
INSERT INTO t0(c0) VALUES ('0.5'), ('123'), ('abc'), ('-2117222092'), ('hello'), ('Cv');
INSERT INTO t0(c0) VALUES ('*#'), ('!b'), ('0I'), ('1*tE'), ('4Ti}k'), ('5(2*07');
-- Insert data that does NOT satisfy the condition
INSERT INTO t0(c0) VALUES ('wp'), ('x&'), ('jx+');

-- Create prefix index with length 1 (triggers the bug)
CREATE INDEX prefix_idx ON t0(c0(1));
ANALYZE TABLE t0;

-- ========================================
-- Query test: c0 >= '-1e500' AND c0 <= '᪾JwQ ['
-- ========================================

-- Ground truth: CASE WHEN (no index influence)
SELECT '=== Ground truth (CASE WHEN) ===' AS test;
SELECT
  SUM(CASE WHEN c0 >= '-1e500' AND c0 <= '᪾JwQ [' THEN 1 ELSE 0 END) AS truth_matches,
  SUM(CASE WHEN NOT (c0 >= '-1e500' AND c0 <= '᪾JwQ [') THEN 1 ELSE 0 END) AS truth_non_matches
FROM t0;

-- Default plan (optimizer picks prefix_idx, returns 0 rows = BUG!)
SELECT '=== Default plan (BUG: returns 0) ===' AS test;
SELECT COUNT(*) AS default_count FROM t0 WHERE c0 >= '-1e500' AND c0 <= '᪾JwQ [' ORDER BY c0 ASC;
EXPLAIN SELECT c0 FROM t0 WHERE c0 >= '-1e500' AND c0 <= '᪾JwQ [' ORDER BY c0 ASC;

-- Ignore index (full table scan, correct result)
SELECT '=== Ignore index (correct) ===' AS test;
SELECT COUNT(*) AS ignore_index_count FROM t0 IGNORE INDEX(prefix_idx) WHERE c0 >= '-1e500' AND c0 <= '᪾JwQ [' ORDER BY c0 ASC;

-- Force different prefix lengths to show the pattern
SELECT '=== Force prefix_idx c0(1) = BUG ===' AS test;
SELECT COUNT(*) AS force_c0_1 FROM t0 FORCE INDEX(prefix_idx) WHERE c0 >= '-1e500' AND c0 <= '᪾JwQ [';

DROP INDEX prefix_idx ON t0;
CREATE INDEX prefix_idx2 ON t0(c0(2));
ANALYZE TABLE t0;
SELECT '=== Force prefix_idx c0(2) = still wrong ===' AS test;
SELECT COUNT(*) AS force_c0_2 FROM t0 FORCE INDEX(prefix_idx2) WHERE c0 >= '-1e500' AND c0 <= '᪾JwQ [';

DROP INDEX prefix_idx2 ON t0;
CREATE INDEX prefix_idx3 ON t0(c0(3));
ANALYZE TABLE t0;
SELECT '=== Force prefix_idx c0(3) = correct ===' AS test;
SELECT COUNT(*) AS force_c0_3 FROM t0 FORCE INDEX(prefix_idx3) WHERE c0 >= '-1e500' AND c0 <= '᪾JwQ [';

Suggested fix:
The issue appears to be in the range optimizer's handling of numeric overflow literals (e.g., '-1e500') when used in string comparison contexts with 1-character prefix indexes.

Recommended approach:
1. In sql/opt_range.cc, modify the range analysis logic to avoid treating overflow 
   literals as "impossible" values when the column type is string-based (TEXT/VARCHAR)
2. Ensure that for string-typed columns, range bounds are evaluated using string 
   collation semantics, not numeric overflow detection
3. Add special handling for prefix_length == 1 case to prevent premature "zero rows" 
   optimization when overflow literals are present
[28 Apr 4:26] Chaithra Marsur Gopala Reddy
Hi Li Zeyan,

Thank you for the test case. Verified as described.