Description:
rechecking_index_usage causes the materialized table to be generated using table scan in Materialize semijoin strategy.
How to repeat:
1. Create table
CREATE TABLE t1 (
id int PRIMARY KEY,
col1 int NOT NULL
);
CREATE TABLE t2 (
id int PRIMARY KEY,
col1 int NOT NULL,
col2 int,
KEY index_col1 (col1),
KEY index_col2 (col2)
);
2. Create procedure
DELIMITER //
CREATE PROCEDURE insert_data ()
BEGIN
DECLARE num int;
SET num = 1;
WHILE num <= 1000 DO
INSERT INTO t1
VALUES (num, num);
INSERT INTO t2
VALUES (num, num, num);
SET num = num + 1;
END WHILE;
END;//
DELIMITER ;
3. Call procedure to insert data
call insert_data();
4. Execute query
EXPLAIN SELECT count(*) FROM t1 WHERE t1.col1 not in (SELECT col1 FROM t2 force index(index_col2) WHERE t2.col2 in (1,2));
+----+--------------+-------------+------------+--------+---------------------+---------------------+---------+-------------+------+----------+-------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+-------------+------------+--------+---------------------+---------------------+---------+-------------+------+----------+-------------------------+
| 1 | SIMPLE | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 1000 | 100.00 | NULL |
| 1 | SIMPLE | <subquery2> | NULL | eq_ref | <auto_distinct_key> | <auto_distinct_key> | 5 | bug.t1.col1 | 1 | 100.00 | Using where; Not exists |
| 2 | MATERIALIZED | t2 | NULL | ALL | index_col2 | NULL | NULL | NULL | 1000 | 100.00 | Using where |
+----+--------------+-------------+------------+--------+---------------------+---------------------+---------+-------------+------+----------+-------------------------+
It would be better to use range scan on index_col2 of table t to materialize table <subquery2>, but table scan was used instead. If we check optimizer_trace, we will see that the access type of t2 is changed from range to table_scan due to rechecking_index_usage.
"rechecking_index_usage": {
"recheck_reason": "not_first_table",
"range_analysis": {
"table_scan": {
"rows": 1000,
"cost": 1.79769e+308
},
"potential_range_indexes": [
{
"index": "PRIMARY",
"usable": false,
"cause": "not_applicable"
},
{
"index": "index_col1",
"usable": false,
"cause": "not_applicable"
},
{
"index": "index_col2",
"usable": true,
"key_parts": [
"col2",
"id"
]
}
],
"setup_range_conditions": [
],
"range_scan_possible": false,
"cause": "condition_always_true",
"group_index_range": {
"chosen": false,
"cause": "not_single_table"
},
"skip_scan_range": {
"chosen": false,
"cause": "not_single_table"
}
}
}
And the reason is 'condition_always_true'.
Suggested fix:
The rechecking_index_usage should not be applied to the base table used to generate the materialized table.
Description: rechecking_index_usage causes the materialized table to be generated using table scan in Materialize semijoin strategy. How to repeat: 1. Create table CREATE TABLE t1 ( id int PRIMARY KEY, col1 int NOT NULL ); CREATE TABLE t2 ( id int PRIMARY KEY, col1 int NOT NULL, col2 int, KEY index_col1 (col1), KEY index_col2 (col2) ); 2. Create procedure DELIMITER // CREATE PROCEDURE insert_data () BEGIN DECLARE num int; SET num = 1; WHILE num <= 1000 DO INSERT INTO t1 VALUES (num, num); INSERT INTO t2 VALUES (num, num, num); SET num = num + 1; END WHILE; END;// DELIMITER ; 3. Call procedure to insert data call insert_data(); 4. Execute query EXPLAIN SELECT count(*) FROM t1 WHERE t1.col1 not in (SELECT col1 FROM t2 force index(index_col2) WHERE t2.col2 in (1,2)); +----+--------------+-------------+------------+--------+---------------------+---------------------+---------+-------------+------+----------+-------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+--------------+-------------+------------+--------+---------------------+---------------------+---------+-------------+------+----------+-------------------------+ | 1 | SIMPLE | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 1000 | 100.00 | NULL | | 1 | SIMPLE | <subquery2> | NULL | eq_ref | <auto_distinct_key> | <auto_distinct_key> | 5 | bug.t1.col1 | 1 | 100.00 | Using where; Not exists | | 2 | MATERIALIZED | t2 | NULL | ALL | index_col2 | NULL | NULL | NULL | 1000 | 100.00 | Using where | +----+--------------+-------------+------------+--------+---------------------+---------------------+---------+-------------+------+----------+-------------------------+ It would be better to use range scan on index_col2 of table t to materialize table <subquery2>, but table scan was used instead. If we check optimizer_trace, we will see that the access type of t2 is changed from range to table_scan due to rechecking_index_usage. "rechecking_index_usage": { "recheck_reason": "not_first_table", "range_analysis": { "table_scan": { "rows": 1000, "cost": 1.79769e+308 }, "potential_range_indexes": [ { "index": "PRIMARY", "usable": false, "cause": "not_applicable" }, { "index": "index_col1", "usable": false, "cause": "not_applicable" }, { "index": "index_col2", "usable": true, "key_parts": [ "col2", "id" ] } ], "setup_range_conditions": [ ], "range_scan_possible": false, "cause": "condition_always_true", "group_index_range": { "chosen": false, "cause": "not_single_table" }, "skip_scan_range": { "chosen": false, "cause": "not_single_table" } } } And the reason is 'condition_always_true'. Suggested fix: The rechecking_index_usage should not be applied to the base table used to generate the materialized table.