Bug #117770 rechecking_index_usage causes the materialized table to be generated using table scan in Materialize semijoin strategy
Submitted: 22 Mar 5:32 Modified: 24 Mar 7:53
Reporter: Jingqi Tian (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:8.0.36, 8.0.41, 8.4.4, 9.2.0 OS:Any
Assigned to: CPU Architecture:Any

[22 Mar 5:32] Jingqi Tian
Description:
rechecking_index_usage causes the materialized table to be generated using table scan in Materialize semijoin strategy.

How to repeat:
1. Create table

CREATE TABLE t1 (
	id int PRIMARY KEY,
	col1 int NOT NULL
);

CREATE TABLE t2 (
	id int PRIMARY KEY,
	col1 int NOT NULL,
	col2 int,
	KEY index_col1 (col1),
	KEY index_col2 (col2)
);

2. Create procedure

DELIMITER //
CREATE PROCEDURE insert_data ()
BEGIN
	DECLARE num int;
	SET num = 1;
	WHILE num <= 1000 DO
	INSERT INTO t1
	VALUES (num, num);
    INSERT INTO t2
	VALUES (num, num, num);
	SET num = num + 1;
	END WHILE;
END;//
DELIMITER ;

3. Call procedure to insert data

call insert_data();

4. Execute query

EXPLAIN SELECT count(*) FROM t1 WHERE t1.col1 not in (SELECT col1 FROM t2 force index(index_col2) WHERE t2.col2 in (1,2));

+----+--------------+-------------+------------+--------+---------------------+---------------------+---------+-------------+------+----------+-------------------------+
| id | select_type  | table       | partitions | type   | possible_keys       | key                 | key_len | ref         | rows | filtered | Extra                   |
+----+--------------+-------------+------------+--------+---------------------+---------------------+---------+-------------+------+----------+-------------------------+
|  1 | SIMPLE       | t1          | NULL       | ALL    | NULL                | NULL                | NULL    | NULL        | 1000 |   100.00 | NULL                    |
|  1 | SIMPLE       | <subquery2> | NULL       | eq_ref | <auto_distinct_key> | <auto_distinct_key> | 5       | bug.t1.col1 |    1 |   100.00 | Using where; Not exists |
|  2 | MATERIALIZED | t2          | NULL       | ALL    | index_col2          | NULL                | NULL    | NULL        | 1000 |   100.00 | Using where             |
+----+--------------+-------------+------------+--------+---------------------+---------------------+---------+-------------+------+----------+-------------------------+

It would be better to use range scan on index_col2 of table t to materialize table <subquery2>, but table scan was used instead. If we check optimizer_trace, we will see that the access type of t2 is changed from range to table_scan due to rechecking_index_usage.

  "rechecking_index_usage": {
    "recheck_reason": "not_first_table",
    "range_analysis": {
      "table_scan": {
        "rows": 1000,
        "cost": 1.79769e+308
      },
      "potential_range_indexes": [
        {
          "index": "PRIMARY",
          "usable": false,
          "cause": "not_applicable"
        },
        {
          "index": "index_col1",
          "usable": false,
          "cause": "not_applicable"
        },
        {
          "index": "index_col2",
          "usable": true,
          "key_parts": [
            "col2",
            "id"
          ]
        }
      ],
      "setup_range_conditions": [
      ],
      "range_scan_possible": false,
      "cause": "condition_always_true",
      "group_index_range": {
        "chosen": false,
        "cause": "not_single_table"
      },
      "skip_scan_range": {
        "chosen": false,
        "cause": "not_single_table"
      }
    }
  }

And the reason is 'condition_always_true'.

Suggested fix:
The rechecking_index_usage should not be applied to the base table used to generate the materialized table.
[24 Mar 7:53] MySQL Verification Team
Hello Jingqi Tian,

Thank you for the report and test case.

regards,
Umesh