Description:
In the following case, there are two indexes of table `t1`, `idx_ab(a, b)` and `idx_c(c)` and there are 100 rows in `t1`. In addition, the value of all rows in column `a` is 1 and the value of all rows in both column `b` and `c` varies from 1 to 100.
MySQL [test]> show create table t1\G
*************************** 1. row ***************************
Table: t1
Create Table: CREATE TABLE `t1` (
`a` int(11) DEFAULT NULL,
`b` int(11) DEFAULT NULL,
`c` int(11) DEFAULT NULL,
KEY `idx_ab` (`a`,`b`),
KEY `idx_c` (`c`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
MySQL [test]> explain select * from t1 where ((a = 1 and b = 100) or c = 100) and a = 1;
+----+-------------+-------+------------+------+---------------+--------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+--------+---------+-------+------+----------+-------------+
| 1 | SIMPLE | t1 | NULL | ref | idx_ab,idx_c | idx_ab | 5 | const | 1 | 10.90 | Using where |
+----+-------------+-------+------------+------+---------------+--------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
In the above statement, there are two access paths, one is INDEX MERGE of 'union(idx_ab,idx_c)' using `((a = 1 and b = 100) or c = 100)`, the other is REF of 'idx_ab' using `a = 1`. Obviously, INDEX MERGE should be chosen. Because according to the data distribution, INDEX MERGE of 'union(idx_ab,idx_c)' using `((a = 1 and b = 100) or c = 100)` only results 1 rows in each index, while REF of 'idx_ab' using `a = 1` results 100 rows. This can be proved by 'rows_estimation' in the optimizer trace:
{
"index": "idx_ab",
"ranges": [
"1 <= a <= 1"
],
"index_dives_for_eq_ranges": true,
"rowid_ordered": false,
"using_mrr": false,
"index_only": false,
"in_memory": 1,
"rows": 100,
"cost": 35.26,
"chosen": false,
"cause": "cost"
}
"indexes_to_merge": [
{
"range_scan_alternatives": [
{
"index": "idx_ab",
"ranges": [
"1 <= a <= 1 AND 100 <= b <= 100"
],
"index_dives_for_eq_ranges": true,
"rowid_ordered": true,
"using_mrr": false,
"index_only": true,
"in_memory": 1,
"rows": 1,
"cost": 0.36,
"chosen": true
}
],
"index_to_merge": "idx_ab",
"cumulated_cost": 0.36
},
{
"range_scan_alternatives": [
{
"index": "idx_c",
"ranges": [
"100 <= c <= 100"
],
"index_dives_for_eq_ranges": true,
"rowid_ordered": true,
"using_mrr": false,
"index_only": true,
"in_memory": 1,
"rows": 1,
"cost": 0.36,
"chosen": true
}
],
"index_to_merge": "idx_c",
"cumulated_cost": 0.72
}
],
However, as the explain result shows, REF of 'idx_ab' using `a = 1` has finally been chosen. Looking at the 'considered_execution_plans' in the optimizer trace, it can be found that the row estimation for REF of 'idx_ab' using `a = 1` is changed from 100 (which results from index diving in 'row_estimation') to 1. This row estimation error leads to the wrong choice of access path.
"considered_execution_plans": [
{
"plan_prefix": [
],
"table": "`t1`",
"best_access_path": {
"considered_access_paths": [
{
"access_type": "ref",
"index": "idx_ab",
"rows": 1,
"cost": 0.6,
"chosen": true
},
{
"access_type": "range",
"range_details": {
"used_index": "union(idx_ab,idx_c)"
},
"cost": 1.4575,
"rows": 2,
"chosen": false,
"cause": "cost"
}
]
},
"condition_filtering_pct": 10.9,
"rows_for_plan": 0.109,
"cost_for_plan": 0.6,
"chosen": true
}
]
However, the row estimation for REF of 'idx_ab' using `a = 1` has been achieved correctly by index diving in the previous phase of optimization. The reason of row estimation error is the value of 'quick_rows' for REF of `idx_ab` using `a = 1` is modified when doing index diving for `idx_ab` using `a = 1 and b = 100` in INDEX MERGE by the following code in function named `check_quick_select`:
if (rows != HA_POS_ERROR) {
param->table->quick_rows[keynr] = rows;
// ...
}
How to repeat:
1. source t1.sql
2. analyze table t1.
3. explain select * from t1 where ((a = 1 and b = 100) or c = 100) and a = 1;
ATTENTION: The file of t1.sql has been attached by the following comment!
Suggested fix:
change from
if (rows != HA_POS_ERROR) {
param->table->quick_rows[keynr] = rows;
if (update_tbl_stats) {
param->table->quick_keys.set_bit(keynr);
param->table->quick_key_parts[keynr] = seq.max_key_part + 1;
param->table->quick_n_ranges[keynr] = seq.range_count;
param->table->quick_condition_rows =
min(param->table->quick_condition_rows, rows);
}
param->table->possible_quick_keys.set_bit(keynr);
}
to
if (rows != HA_POS_ERROR) {
if (update_tbl_stats) {
param->table->quick_rows[keynr] = rows;
param->table->quick_keys.set_bit(keynr);
param->table->quick_key_parts[keynr] = seq.max_key_part + 1;
param->table->quick_n_ranges[keynr] = seq.range_count;
param->table->quick_condition_rows =
min(param->table->quick_condition_rows, rows);
}
param->table->possible_quick_keys.set_bit(keynr);
}