Description:
ReuseRangeEstimateForRef does not work properly when there are non-prefixed multi keypart ref.
How to repeat:
1. Create table
CREATE TABLE t1 (
id int PRIMARY KEY,
col1 int,
col2 int,
col3 int,
key index_col1(col1, col2, col3)
);
CREATE TABLE t2 (
id int PRIMARY KEY,
col1 int,
key index_col1(col1)
);
CREATE TABLE t3 (
id int PRIMARY KEY,
col1 int,
key index_col1(col1)
);
2. Create procedure
CREATE PROCEDURE insert_data()
BEGIN
DECLARE num int;
SET num = 1;
WHILE num < 100000 DO
IF num < 90000 THEN
INSERT INTO t1
VALUES (num, 0, num, num);
ELSE
INSERT INTO t1
VALUES (num, num, num, num);
END IF;
INSERT INTO t2
VALUES (num, num);
INSERT INTO t3
VALUES (num, num);
SET num = num + 1;
END WHILE;
END;
3. Call procedure and analyze table
call insert_data();
analyze table t1;
analyze table t2;
analyze table t3;
4. Enable optimizer_trace and execute SQL
SET optimizer_trace='enabled=on';
EXPLAIN SELECT * FROM t1 join t2 on t1.col3 = t2.col1 join t3 on t1.col2 = t3.col1 where t1.col1 = 0 and t2.id < 1000;
+----+-------------+-------+------------+-------+--------------------+------------+---------+---------------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+--------------------+------------+---------+---------------+------+----------+--------------------------+
| 1 | SIMPLE | t2 | NULL | range | PRIMARY,index_col1 | PRIMARY | 4 | NULL | 999 | 100.00 | Using where |
| 1 | SIMPLE | t1 | NULL | ref | index_col1 | index_col1 | 5 | const | 10 | 10.00 | Using where; Using index |
| 1 | SIMPLE | t3 | NULL | ref | index_col1 | index_col1 | 5 | test7.t1.col2 | 1 | 100.00 | Using index |
+----+-------------+-------+------------+-------+--------------------+------------+---------+---------------+------+----------+--------------------------+
SELECT * FROM information_schema.optimizer_trace;
In range analysis, index_dive is used to estimate that the number of rows in table t1 that satisfy col1=0 is 50020.
"table": "`t1`",
"range_analysis": {
...
"analyzing_range_alternatives": {
"range_scan_alternatives": [
{
"index": "index_col1",
"ranges": [
"col1 = 0"
],
"index_dives_for_eq_ranges": true,
"rowid_ordered": false,
"using_mrr": false,
"index_only": true,
"in_memory": 1,
"rows": 50020,
"cost": 5031.21,
"chosen": true
}
],
However, when the join order is 't2, t1, t3', the ref of t1 is actually col1=0, but there is no Reuse Range Estimate. Due to severe data skew, the number of rows estimated using statistics is 10.
"plan_prefix": [
"`t2`"
],
"table": "`t1`",
"best_access_path": {
"considered_access_paths": [
{
"access_type": "ref",
"index": "index_col1",
"rows": 10.018,
"cost": 1255.76,
"chosen": true
},
{
"access_type": "range",
"range_details": {
"used_index": "index_col1"
},
"cost": 10033.2,
"rows": 50020,
"chosen": false,
"cause": "cost"
}
]
},
This causes the optimizer to incorrectly choose the join order of t2, t1, t3, which is a much worse join order than t1, t2, t3.
Suggested fix:
In sql_planner.cc:
'''
} else if ((found_part & 1) &&
(!(table->file->index_flags(key, 0, false) &
HA_ONLY_WHOLE_INDEX) ||
all_key_parts_covered)) {
/*
Use as many key-parts as possible and a unique key is better
than a not unique key.
Set cur_fanout to (previous record count) * (records / combination)
*/
cur_used_keyparts = max_part_bit(found_part);
'''
The 'cur_used_keyparts' is the index prefix part that can be used to ref. But, the 'table_deps' is not updated resulting in using fewer keyparts but still depending on the previous table.
The 'table_deps' should be updated after cur_used_keyparts is calculated.