Description:
With executing a range query with multiple identical ranges OR-ed together, the optimizer would lose part of the range, and end up with a much worse range query plan or ref plan.
This seems to be a regression introduced in 8.0 and still repros in latest 8.0.23. 5.6 doesn't have this bug.
How to repeat:
CREATE TABLE test_or (
a bigint unsigned NOT NULL,
b bigint unsigned NOT NULL,
c bigint unsigned NOT NULL,
d VARCHAR(10), PRIMARY KEY (a, b, c));
INSERT INTO test_or VALUES (1, 1, 1, 'a'), (2, 2, 2, 'b'), (3, 3, 3, 'c');
Correct range query plan without identical OR:
EXPLAIN SELECT * FROM test_or FORCE INDEX (PRIMARY) WHERE (a=1 and b=1 and c=1) OR (a=1 and b=2 and c=2);
+----+-------------+---------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| 1 | SIMPLE | test_or | NULL | range | PRIMARY | PRIMARY | 24 | NULL | 2 | 100.00 | Using where |
+----+-------------+---------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
Incorrect query plan with identical OR:
explain select * from test_or FORCE INDEX (PRIMARY) where (a=1 and b=1 and c=1) OR (a=1 and B=1 and c=1) OR (a=1 and b=2 and c=2);
+----+-------------+---------+------------+------+---------------+---------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+------+---------------+---------+---------+-------+------+----------+-------------+
| 1 | SIMPLE | test_or | NULL | ref | PRIMARY | PRIMARY | 8 | const | 1 | 33.33 | Using where |
+----+-------------+---------+------------+------+---------------+---------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
You can see MySQL end up choosing the ref plan with only the first column a. This is because the range optimizer would end up with a range query plan with the first column as well (1 <= a <= 1) and ref plan (1 <= a <= 1) ends up being just barely cheaper (0.475 vs range query plan 0.485). This can be observed with the optimizer trace:
"analyzing_range_alternatives": {
"range_scan_alternatives": [
{
"index": "PRIMARY",
"ranges": [
"1 <= a <= 1"
],
"index_dives_for_eq_ranges": true,
"rowid_ordered": true,
"using_mrr": false,
"index_only": false,
"rows": 2,
"cost": 0.485,
"chosen": true
}
],
"analyzing_roworder_intersect": {
"usable": false,
"cause": "too_few_roworder_scans"
}
},
"chosen_range_access_summary": {
"range_access_plan": {
"type": "range_scan",
"index": "PRIMARY",
"rows": 2,
"ranges": [
"1 <= a <= 1"
]
},
"rows_for_plan": 2,
"cost_for_plan": 0.485,
"chosen": true
}
}
}
]
},
{
"considered_execution_plans": [
{
"plan_prefix": [
],
"table": "`test_or` FORCE INDEX (PRIMARY)",
"best_access_path": {
"considered_access_paths": [
{
"access_type": "ref",
"index": "PRIMARY",
"rows": 2,
"cost": 0.475,
"chosen": true
},
{
"access_type": "range",
"range_details": {
"used_index": "PRIMARY"
},
"chosen": false,
"cause": "heuristic_index_cheaper"
}
]
},
"condition_filtering_pct": 33.333,
"rows_for_plan": 0.6667,
"cost_for_plan": 0.475,
"chosen": true
}
]
},
This repros in latest
Suggested fix:
The bug seems to be happening in key_or:
When having multiple ranges that are the same, key_or would see that both ranges key1 and key2 are exactly the same, release key2->next_key_part, and then OR key1->next_key_part and key2->next_key_part= NULL (since it is released) with key_or. However, because NULL sub-tree are treat as TRUE (same as is_always), the entirety of key1->next_key_part gets dropped, so you end up with just the first key. I have a proposed fix that stop handling current range and simply move to the next range (without falling through to the remaining logic).
You can find my proposed fix in this PR:
https://github.com/facebook/mysql-5.6/pull/1158