Description:
1. FEATURE SUMMARY
------------------
When an IN-list exceeds eq_range_index_dive_limit (default 200), MySQL falls
back to static index statistics for row estimation (EXPLAIN shows type:range).
Benchmarks on identical hardware show this statistics path is consistently
SLOWER than the full-dive path (type:index), and the overhead grows as the
IN-list length increases.
I propose two new SESSION-scoped system variables:
eq_range_index_dive_sample_count (uint, default 10, range 0-200)
eq_range_index_dive_sample_method (ENUM: RANDOM|UNIFORM|FIRST, default RANDOM)
When the IN-list exceeds eq_range_index_dive_limit and sample_count > 0,
MySQL performs actual index dives on only K sampled values, extrapolates the
result to the full list, and returns that estimate — replacing the statistics
fallback entirely. Benchmarks on a patched build confirm up to a 5× improvement in
query execution time vs. the statistics path on the same hardware.
2. PROBLEM DESCRIPTION
----------------------
MySQL's range optimizer uses eq_range_index_dive_limit (default 200) to decide
whether to perform exact index dives for each IN-list value. When the IN-list
length N exceeds this limit, the optimizer falls back to index statistics
(handler::index_statistics), which sets use_index_statistics = true and
produces a type:range plan.
The problem is that the statistics fallback is not a performance optimization —
it is actually slower than the full-dive path. The reasons are:
a) Statistics-based estimation triggers a different execution plan
(type:range vs type:index) that incurs more random I/O per row.
b) The statistics path reads metadata from the index but still executes
a full index scan over the matched ranges.
c) The penalty grows with N: the larger the IN-list, the worse the
statistics path performs relative to full dives.
This behavior means that users who increase eq_range_index_dive_limit beyond
its default, or who encounter IN-lists that barely exceed the limit, receive
worse performance — not better.
Example query:
SELECT COUNT(*) FROM t1 WHERE val IN (1, 2, 3, ..., N);
With eq_range_index_dive_limit = N-1 (forces statistics): type:range, SLOW.
With eq_range_index_dive_limit = N+1 (forces full dive): type:index, FAST.
3. BENCHMARK RESULTS
--------------------
--- Scenario D: Statistics vs. Full Dive (Vanilla EC2 only — same hardware) ---
This is the strongest evidence. On the SAME machine, forcing statistics
(limit = N-1) is always slower than full dives (limit = N+1), and the gap
widens as N grows.
SQL (full dive): SET SESSION eq_range_index_dive_limit = <N+1>;
SELECT COUNT(*) FROM t1 WHERE val IN (1, 2, ..., N);
SQL (statistics): SET SESSION eq_range_index_dive_limit = <N-1>;
SELECT COUNT(*) FROM t1 WHERE val IN (1, 2, ..., N);
+------+----------------+------------------+------------------+
| N | Full dive (ms) | Statistics (ms) | Stats overhead |
+------+----------------+------------------+------------------+
| 10 | 4,960.6 | 5,736.7 | +15.7% |
| 20 | 4,761.4 | 5,567.5 | +16.9% |
| 50 | 4,515.7 | 5,430.6 | +20.3% |
| 100 | 4,487.4 | 5,653.6 | +26.0% |
+------+----------------+------------------+------------------+
Interpretation: The statistics overhead is not constant — it increases with N.
At N=100 the statistics path is 26% slower than full dives on the same machine.
--- Scenario A: sample_count Effect (Patch EC2 only — same hardware) ---
With eq_range_index_dive_limit = 5 and N = 10 (so N > limit → heuristic
fires), we vary K (sample_count). count=0 is the baseline statistics path.
SQL:
SET SESSION eq_range_index_dive_limit = 5;
SET SESSION eq_range_index_dive_sample_count = <K>;
SET SESSION eq_range_index_dive_sample_method = 'RANDOM';
SELECT COUNT(*) FROM t1 WHERE val IN (1,2,3,4,5,6,7,8,9,10);
+-----+----------+--------------------+
| K | avg (ms) | vs count=0 (stats) |
+-----+----------+--------------------+
| 0 | 1,678.0 | baseline (stats) |
| 1 | 1,489.1 | -11.3% |
| 3 | 1,499.4 | -10.7% |
| 5 | 1,493.7 | -10.9% |
| 7 | 1,520.2 | -9.4% |
| 10 | 1,487.1 | -11.4% |
| 20 | 1,503.3 | -10.4% |
| 50 | 1,494.6 | -10.9% |
| 100 | 1,505.1 | -10.3% |
| 200 | 1,490.6 | -11.2% |
+-----+----------+--------------------+
Key finding: Even K=1 (sampling just 1 out of 10 values) eliminates the
statistics penalty. K=5 gives ~10.9% improvement. The gain plateaus past K=5,
suggesting K=5 is a practical sweet spot.
OPTIMIZER_TRACE confirms: count=0 → index_dives_for_range_access: 0 (stats);
count≥1 → index_dives_for_range_access: K (heuristic).
--- Scenario B: sample_method Effect (Patch EC2 only — same hardware) ---
With N=10, limit=5, count=5, we compare the three sampling strategies.
SQL:
SET SESSION eq_range_index_dive_sample_method = '<METHOD>';
(all other settings same as Scenario A)
+----------+----------+
| method | avg (ms) |
+----------+----------+
| RANDOM | 1,506.3 |
| UNIFORM | 1,495.2 |
| FIRST | 1,508.6 |
+----------+----------+
All three methods outperform the statistics fallback (1,678 ms) by ~10-11%.
UNIFORM is marginally best (deterministic, evenly-spaced sample positions).
--- Scenario C: Vanilla vs. Patch Summary (different hardware — for reference) ---
NOTE: These EC2 instances have different hardware. This table is for context
only; do not use it to compare absolute values between Vanilla and Patch.
+-------------------------------+--------+----------+---------------------------+
| Case | EC2 | avg (ms) | Notes |
+-------------------------------+--------+----------+---------------------------+
| Vanilla full dive (limit=200) | Vanilla| 5,038.5 | N=10 < limit → full dive |
| Vanilla stats (limit=5) | Vanilla| 5,981.1 | N=10 > limit → statistics |
| Patch K=1 (count=1, limit=5) | Patch | 1,494.9 | heuristic, 1 sample |
| Patch K=3 (count=3, limit=5) | Patch | 1,502.3 | heuristic, 3 samples |
| Patch K=5 (count=5, limit=5) | Patch | 1,492.6 | heuristic, 5 samples |
| Patch K=7 (count=7, limit=5) | Patch | 1,525.7 | heuristic, 7 samples |
| Patch K=10 (count=10,limit=5) | Patch | 1,496.5 | full dive (K >= N) |
+-------------------------------+--------+----------+---------------------------+
On Vanilla, statistics (5,981 ms) is 18.7% slower than full dive (5,039 ms),
consistent with Scenario D findings.
How to repeat:
TEST SETUP
-------------
All benchmarks use the following table and data distribution:
CREATE TABLE t1 (
id INT NOT NULL AUTO_INCREMENT,
val INT NOT NULL,
col2 VARCHAR(100),
PRIMARY KEY (id),
KEY idx_val (val)
) ENGINE=InnoDB;
Rows : ~10,000,000
val 1-10 : ~800,000 rows each (total ~80% of table, high cardinality)
val 11-500 : ~4,000 rows each (total ~20% of table)
Two EC2 instances were used:
- Vanilla EC2 : stock MySQL 8.4, unmodified
- Patch EC2 : MySQL 8.4 rebuilt RelWithDebInfo with patch
NOTE: The Vanilla and Patch EC2s are different machines,
both running on AWS EC2 t2.xlarge instances (4 vCPUs, 16 GiB memory).
Cross-machine comparisons (Scenario C) are provided for reference only.
Intra-machine comparisons (Scenario A, B, D) are the primary evidence.
Timing method: 3-run average.
SET @s = NOW(6);
SELECT COUNT(*) FROM t1 WHERE val IN (...);
SELECT ROUND(TIMESTAMPDIFF(MICROSECOND, @s, NOW(6)) / 1000, 1); -- ms
Description: 1. FEATURE SUMMARY ------------------ When an IN-list exceeds eq_range_index_dive_limit (default 200), MySQL falls back to static index statistics for row estimation (EXPLAIN shows type:range). Benchmarks on identical hardware show this statistics path is consistently SLOWER than the full-dive path (type:index), and the overhead grows as the IN-list length increases. I propose two new SESSION-scoped system variables: eq_range_index_dive_sample_count (uint, default 10, range 0-200) eq_range_index_dive_sample_method (ENUM: RANDOM|UNIFORM|FIRST, default RANDOM) When the IN-list exceeds eq_range_index_dive_limit and sample_count > 0, MySQL performs actual index dives on only K sampled values, extrapolates the result to the full list, and returns that estimate — replacing the statistics fallback entirely. Benchmarks on a patched build confirm up to a 5× improvement in query execution time vs. the statistics path on the same hardware. 2. PROBLEM DESCRIPTION ---------------------- MySQL's range optimizer uses eq_range_index_dive_limit (default 200) to decide whether to perform exact index dives for each IN-list value. When the IN-list length N exceeds this limit, the optimizer falls back to index statistics (handler::index_statistics), which sets use_index_statistics = true and produces a type:range plan. The problem is that the statistics fallback is not a performance optimization — it is actually slower than the full-dive path. The reasons are: a) Statistics-based estimation triggers a different execution plan (type:range vs type:index) that incurs more random I/O per row. b) The statistics path reads metadata from the index but still executes a full index scan over the matched ranges. c) The penalty grows with N: the larger the IN-list, the worse the statistics path performs relative to full dives. This behavior means that users who increase eq_range_index_dive_limit beyond its default, or who encounter IN-lists that barely exceed the limit, receive worse performance — not better. Example query: SELECT COUNT(*) FROM t1 WHERE val IN (1, 2, 3, ..., N); With eq_range_index_dive_limit = N-1 (forces statistics): type:range, SLOW. With eq_range_index_dive_limit = N+1 (forces full dive): type:index, FAST. 3. BENCHMARK RESULTS -------------------- --- Scenario D: Statistics vs. Full Dive (Vanilla EC2 only — same hardware) --- This is the strongest evidence. On the SAME machine, forcing statistics (limit = N-1) is always slower than full dives (limit = N+1), and the gap widens as N grows. SQL (full dive): SET SESSION eq_range_index_dive_limit = <N+1>; SELECT COUNT(*) FROM t1 WHERE val IN (1, 2, ..., N); SQL (statistics): SET SESSION eq_range_index_dive_limit = <N-1>; SELECT COUNT(*) FROM t1 WHERE val IN (1, 2, ..., N); +------+----------------+------------------+------------------+ | N | Full dive (ms) | Statistics (ms) | Stats overhead | +------+----------------+------------------+------------------+ | 10 | 4,960.6 | 5,736.7 | +15.7% | | 20 | 4,761.4 | 5,567.5 | +16.9% | | 50 | 4,515.7 | 5,430.6 | +20.3% | | 100 | 4,487.4 | 5,653.6 | +26.0% | +------+----------------+------------------+------------------+ Interpretation: The statistics overhead is not constant — it increases with N. At N=100 the statistics path is 26% slower than full dives on the same machine. --- Scenario A: sample_count Effect (Patch EC2 only — same hardware) --- With eq_range_index_dive_limit = 5 and N = 10 (so N > limit → heuristic fires), we vary K (sample_count). count=0 is the baseline statistics path. SQL: SET SESSION eq_range_index_dive_limit = 5; SET SESSION eq_range_index_dive_sample_count = <K>; SET SESSION eq_range_index_dive_sample_method = 'RANDOM'; SELECT COUNT(*) FROM t1 WHERE val IN (1,2,3,4,5,6,7,8,9,10); +-----+----------+--------------------+ | K | avg (ms) | vs count=0 (stats) | +-----+----------+--------------------+ | 0 | 1,678.0 | baseline (stats) | | 1 | 1,489.1 | -11.3% | | 3 | 1,499.4 | -10.7% | | 5 | 1,493.7 | -10.9% | | 7 | 1,520.2 | -9.4% | | 10 | 1,487.1 | -11.4% | | 20 | 1,503.3 | -10.4% | | 50 | 1,494.6 | -10.9% | | 100 | 1,505.1 | -10.3% | | 200 | 1,490.6 | -11.2% | +-----+----------+--------------------+ Key finding: Even K=1 (sampling just 1 out of 10 values) eliminates the statistics penalty. K=5 gives ~10.9% improvement. The gain plateaus past K=5, suggesting K=5 is a practical sweet spot. OPTIMIZER_TRACE confirms: count=0 → index_dives_for_range_access: 0 (stats); count≥1 → index_dives_for_range_access: K (heuristic). --- Scenario B: sample_method Effect (Patch EC2 only — same hardware) --- With N=10, limit=5, count=5, we compare the three sampling strategies. SQL: SET SESSION eq_range_index_dive_sample_method = '<METHOD>'; (all other settings same as Scenario A) +----------+----------+ | method | avg (ms) | +----------+----------+ | RANDOM | 1,506.3 | | UNIFORM | 1,495.2 | | FIRST | 1,508.6 | +----------+----------+ All three methods outperform the statistics fallback (1,678 ms) by ~10-11%. UNIFORM is marginally best (deterministic, evenly-spaced sample positions). --- Scenario C: Vanilla vs. Patch Summary (different hardware — for reference) --- NOTE: These EC2 instances have different hardware. This table is for context only; do not use it to compare absolute values between Vanilla and Patch. +-------------------------------+--------+----------+---------------------------+ | Case | EC2 | avg (ms) | Notes | +-------------------------------+--------+----------+---------------------------+ | Vanilla full dive (limit=200) | Vanilla| 5,038.5 | N=10 < limit → full dive | | Vanilla stats (limit=5) | Vanilla| 5,981.1 | N=10 > limit → statistics | | Patch K=1 (count=1, limit=5) | Patch | 1,494.9 | heuristic, 1 sample | | Patch K=3 (count=3, limit=5) | Patch | 1,502.3 | heuristic, 3 samples | | Patch K=5 (count=5, limit=5) | Patch | 1,492.6 | heuristic, 5 samples | | Patch K=7 (count=7, limit=5) | Patch | 1,525.7 | heuristic, 7 samples | | Patch K=10 (count=10,limit=5) | Patch | 1,496.5 | full dive (K >= N) | +-------------------------------+--------+----------+---------------------------+ On Vanilla, statistics (5,981 ms) is 18.7% slower than full dive (5,039 ms), consistent with Scenario D findings. How to repeat: TEST SETUP ------------- All benchmarks use the following table and data distribution: CREATE TABLE t1 ( id INT NOT NULL AUTO_INCREMENT, val INT NOT NULL, col2 VARCHAR(100), PRIMARY KEY (id), KEY idx_val (val) ) ENGINE=InnoDB; Rows : ~10,000,000 val 1-10 : ~800,000 rows each (total ~80% of table, high cardinality) val 11-500 : ~4,000 rows each (total ~20% of table) Two EC2 instances were used: - Vanilla EC2 : stock MySQL 8.4, unmodified - Patch EC2 : MySQL 8.4 rebuilt RelWithDebInfo with patch NOTE: The Vanilla and Patch EC2s are different machines, both running on AWS EC2 t2.xlarge instances (4 vCPUs, 16 GiB memory). Cross-machine comparisons (Scenario C) are provided for reference only. Intra-machine comparisons (Scenario A, B, D) are the primary evidence. Timing method: 3-run average. SET @s = NOW(6); SELECT COUNT(*) FROM t1 WHERE val IN (...); SELECT ROUND(TIMESTAMPDIFF(MICROSECOND, @s, NOW(6)) / 1000, 1); -- ms