Description:
when more values in an in list than eq_range_index_dive_limit, the query execution plan seems ignore index unconditionally.
which will cause performance problem.
How to repeat:
CREATE TABLE `large` (
`id` bigint unsigned NOT NULL,
`data` varchar(100) COLLATE utf8mb4_general_ci DEFAULT NULL,
KEY `id` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;
insert into large values (1,rand()),(2,rand()),(3,rand()),(4,rand()),(5,rand()),(6,rand()),(7,rand()),(8,rand()),(9,rand()),(10,rand());
insert into large select 100,rand() from large; -- repeat 10 times to create some data in table.
analyze table large;
set session eq_range_index_dive_limit=200;
explain select sum(data) from large where id in (1,2,3,4,5,6,7,8,9,10) and id between 1 and 10;
the result is :
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-----------------------+
| 1 | SIMPLE | large | NULL | range | id | id | 8 | NULL | 10 | 100.00 | Using index condition |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-----------------------+
which is fine.
set session eq_range_index_dive_limit=5; -- make it less than the numbers in the in list
explain select sum(data) from large where id in (1,2,3,4,5,6,7,8,9,10) and id between 1 and 10;
the result is :
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | large | NULL | ALL | id | NULL | NULL | NULL | 9853 | 83.32 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
which shows a full table scan.
test it:
set session long_query_time=0;
select sum(data) from large where id in (1,2,3,4,5,6,7,8,9,10) and id between 1 and 10;
slowlog confirms that sql scans a lot rows:
# Time: 2024-04-29T17:43:21.446726+08:00
# User@Host: root[root] @ localhost [] Id: 11
# Query_time: 0.004150 Lock_time: 0.000002 Rows_sent: 1 Rows_examined: 10242
SET timestamp=1714383801;
select sum(data) from large where id in (1,2,3,4,5,6,7,8,9,10) and id between 1 and 10;
without in list, the sql will use the "between and" as an index condition to filter data.
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-----------------------+
| 1 | SIMPLE | large | NULL | range | id | id | 8 | NULL | 10 | 100.00 | Using index condition |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-----------------------+
# Time: 2024-04-29T17:47:10.633367+08:00
# User@Host: root[root] @ localhost [] Id: 11
# Query_time: 0.000265 Lock_time: 0.000003 Rows_sent: 1 Rows_examined: 10
SET timestamp=1714384030;
select sum(data) from large where id between 1 and 10;
Suggested fix:
when eq_range_index_dive_limit exceeded, there is an chance that other conditions can take effect, index should not ignored unconditionally.