Bug #75248 | OR conditions on multi-column index may not use all index columns to filter rows | ||
---|---|---|---|
Submitted: | 17 Dec 2014 14:59 | Modified: | 26 Aug 2015 14:11 |
Reporter: | Yoshinori Matsunobu (OCA) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S5 (Performance) |
Version: | 5.6.22, 5.6.23, 5.7.6, 5.5.43 | OS: | Any |
Assigned to: | Sreeharsha Ramanavarapu | CPU Architecture: | Any |
[17 Dec 2014 14:59]
Yoshinori Matsunobu
[17 Dec 2014 15:03]
Yoshinori Matsunobu
Changed a title a bit.
[17 Dec 2014 15:50]
MySQL Verification Team
Hello Yoshinori, Thank you for the report and test case. Thanks, Umesh
[17 Dec 2014 15:51]
MySQL Verification Team
// 5.7.6 mysql> show variables like '%version%'; +-------------------------+---------------------+ | Variable_name | Value | +-------------------------+---------------------+ | innodb_version | 5.7.6 | | protocol_version | 10 | | slave_type_conversions | | | version | 5.7.6-m16 | | version_comment | Source distribution | | version_compile_machine | x86_64 | | version_compile_os | Linux | +-------------------------+---------------------+ 7 rows in set (0.00 sec) mysql> explain select * from i1 where (c1=1 and c2=2 and c3=5) or (c1=1 and c2=3 and c3=1) or (c1=1 and c2=4 and c3=200); +----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+ | 1 | SIMPLE | i1 | NULL | range | PRIMARY | PRIMARY | 12 | NULL | 3 | 100.00 | Using where | +----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+ 1 row in set, 1 warning (0.05 sec) mysql> explain select * from i1 where (c1=1 and c2=2 and c3=5) or (c1=1 and c2=3 and c3=1) or (c1=1 and c2=3 and c3=200); +----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+ | 1 | SIMPLE | i1 | NULL | range | PRIMARY | PRIMARY | 12 | NULL | 1000 | 100.00 | Using where | +----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+ 1 row in set, 1 warning (0.00 sec) mysql>SELECT trace FROM information_schema.optimizer_trace\G *************************** 1. row *************************** .. "chosen_range_access_summary": { "range_access_plan": { "type": "range_scan", "index": "PRIMARY", "rows": 1000, "ranges": [ "1 <= c1 <= 1 AND 2 <= c2 <= 2 AND 5 <= c3 <= 5", "1 <= c1 <= 1 AND 3 <= c2 <= 3" ] }, "rows_for_plan": 1000, "cost_for_plan": 204.43, "chosen": true // 5.6.23 [test]> show variables like '%version%'; +-------------------------+---------------------------------------------------------+ | Variable_name | Value | +-------------------------+---------------------------------------------------------+ | innodb_version | 5.6.23 | | protocol_version | 10 | | slave_type_conversions | | | version | 5.6.23-enterprise-commercial-advanced | | version_comment | MySQL Enterprise Server - Advanced Edition (Commercial) | | version_compile_machine | x86_64 | | version_compile_os | Linux | +-------------------------+---------------------------------------------------------+ 7 rows in set (0.00 sec) [test]> explain select * from i1 where (c1=1 and c2=2 and c3=5) or (c1=1 and c2=3 and c3=1) or (c1=1 and c2=4 and c3=200); +----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+ | 1 | SIMPLE | i1 | range | PRIMARY | PRIMARY | 12 | NULL | 3 | Using where | +----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+ 1 row in set (0.00 sec) [test]> explain select * from i1 where (c1=1 and c2=2 and c3=5) or (c1=1 and c2=3 and c3=1) or (c1=1 and c2=3 and c3=200); +----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+ | 1 | SIMPLE | i1 | range | PRIMARY | PRIMARY | 12 | NULL | 1000 | Using where | +----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+ 1 row in set (0.00 sec) [test]> set optimizer_trace='enabled=on'; Query OK, 0 rows affected (0.00 sec) [test]> select * from i1 where (c1=1 and c2=2 and c3=5) or (c1=1 and c2=3 and c3=1) or (c1=1 and c2=3 and c3=200); +----+----+-----+------+------+ | c1 | c2 | c3 | c4 | c5 | +----+----+-----+------+------+ | 1 | 2 | 5 | 0 | 0 | | 1 | 3 | 1 | 0 | 0 | | 1 | 3 | 200 | 0 | 0 | +----+----+-----+------+------+ 3 rows in set (0.00 sec) [test]> SELECT trace FROM information_schema.optimizer_trace\G *************************** 1. row *************************** .. "chosen_range_access_summary": { "range_access_plan": { "type": "range_scan", "index": "PRIMARY", "rows": 1000, "ranges": [ "1 <= c1 <= 1 AND 2 <= c2 <= 2 AND 5 <= c3 <= 5", "1 <= c1 <= 1 AND 3 <= c2 <= 3" ] }, "rows_for_plan": 1000, "cost_for_plan": 203.23, "chosen": true }
[10 Feb 2015 0:53]
Yoshinori Matsunobu
As far as I've investigated, this issue happens when: - Primary (or unique not null) keys having three (or more) columns as a multiple-column index - Using OR conditions using all of the key columns - Using duplicate conditions for the prefix columns Here is an example query causing the problem. select * from i1 where (c1=1 and c2=2 and c3=5) or (c1=1 and c2=3 and c3=10) or (c1=1 and c2=3 and c3=1) or (c1=1 and c2=4 and c3=0); There is primary key (c1, c2, c3), and two OR conditions have (c1=1 and c2=3). In this case, somehow c3 was not used for filtering rows. "1 <= c1 <= 1 AND 2 <= c2 <= 2 AND 5 <= c3 <= 5", "1 <= c1 <= 1 AND 3 <= c2 <= 3", "1 <= c1 <= 1 AND 4 <= c2 <= 4 AND 0 <= c3 <= 0" If there are many rows matching (c1=1 and c2=3), this query is much more expensive than using all key columns to filter rows. As far as I've checked with debugger, tree_or() and sel_trees_can_be_ored() somehow merged "(c1=1 AND c2=3 AND c3=10) or (c1=1 AND c2=3 AND c3=1)" conditions into one "(c1=1 AND c2=3)" condition. Actual merge seemed happened at key_or() function. T@4 : | | | | | | | | | | | >tree_or T@4 : | | | | | | | | | | | | >sel_trees_can_be_ored T@4 : | | | | | | | | | | | | | info: sel_tree: tree1->keys[0(real_keynr: 0)]: (1 <= c1 <= 1 AND 2 <= c2 <= 2 AND 5 <= c3 <= 5) OR (1 <= c1 <= 1 AND 3 <= c2 <= 3 AND 10 <= c3 <= 10) T@4 : | | | | | | | | | | | | | info: sel_tree: tree2->keys[0(real_keynr: 0)]: (1 <= c1 <= 1 AND 3 <= c2 <= 3 AND 1 <= c3 <= 1) T@4 : | | | | | | | | | | | | <sel_trees_can_be_ored 7302 T@4 : | | | | | | | | | | | <tree_or 7501 T@4 : | | | | | | | | | | | info: sel_tree: after_or->keys[0(real_keynr: 0)]: (1 <= c1 <= 1 AND 2 <= c2 <= 2 AND 5 <= c3 <= 5) OR (1 <= c1 <= 1 AND 3 <= c2 <= 3) T@4 : | | | | | | | | | | | >get_mm_tree The problem repeated in MySQL 5.5 and 5.6. I couldn't repeat in MariaDB 5.5 and 10.0.
[10 Feb 2015 1:30]
Yoshinori Matsunobu
https://github.com/mysql/mysql-server/blob/5.7/sql/opt_range.cc#L8076 ---- if (key1->use_count > 0 || !(key1=key1->clone_tree(param))) return 0; // OOM ---- Should || be && ? I verified replacing || with && fixed the problem I described in this bug report.
[20 May 2015 16:35]
Paul DuBois
Noted in 5.6.26, 5.7.8, 5.8.0 changelogs. The optimizer could incorrectly believe an out-of-memory condition had occurred for optimization of range scans for OR operators, and thus return an overestimate of qualifying rows.
[17 Jun 2015 4:42]
Sreeharsha Ramanavarapu
Reverted fix for 20229614 which caused a regression.
[26 Aug 2015 14:11]
Paul DuBois
Noted in 5.6.27, 5.7.9, 5.8.0 changelogs. The optimizer could incorrectly assume an out-of-memory condition while optimizing a range scan for the OR operator, resulting in overestimation of the number of qualifying rows.