Bug #75248 OR conditions on multi-column index may not use all index columns to filter rows
Submitted: 17 Dec 2014 14:59 Modified: 26 Aug 2015 14:11
Reporter: Yoshinori Matsunobu (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:5.6.22, 5.6.23, 5.7.6, 5.5.43 OS:Any
Assigned to: Sreeharsha Ramanavarapu CPU Architecture:Any

[17 Dec 2014 14:59] Yoshinori Matsunobu
Description:
Suppose there is a primary key on (c1, c2, c3), and run a query SELECT ... WHERE (c1=x AND c2=y AND c3=z) OR (c1=x AND c2=yy AND c3=zz) OR (c1=x AND c2=yy AND c3=zzz).
In this case, all index columns (c1, c2, c3) are used to filter the first condition. But, only (c1, c2) are used to filter the second and third conditions. If there are many rows matching (c1=x and c2=yy), the filtering takes very long time.

How to repeat:
Generate 1 million rows by the following Perl script.
for(my $i= 1; $i <= 1000; $i++) {
  for(my $j= 1; $j <= 1000 ; $j++) {
    print "1,$i,$j,0,0\n";
  }
}

create table i1 (c1 int, c2 int, c3 int, c4 int, c5 int, primary key (c1,c2,c3)) engine=innodb;
load data local infile '/tmp/generated_data' into table i1 fields terminated by ',';

explain select * from i1 where (c1=1 and c2=2 and c3=5) or (c1=1 and c2=3 and c3=1) or (c1=1 and c2=4 and c3=200);
=> rows==3 (good execution plan)

explain select * from i1 where (c1=1 and c2=2 and c3=5) or (c1=1 and c2=3 and c3=1) or (c1=1 and c2=3 and c3=200);
=> rows==1000 (bad execution plan)

Optimizer trace (set optimizer_trace='enabled=on'; select; SELECT trace FROM information_schema.optimizer_trace;) also says c3 was not chosen for the second and third OR conditions. 
                  "chosen_range_access_summary": {
                    "range_access_plan": {
                      "type": "range_scan",
                      "index": "PRIMARY",
                      "rows": 1000,
                      "ranges": [
                        "1 <= c1 <= 1 AND 2 <= c2 <= 2 AND 5 <= c3 <= 5",
                        "1 <= c1 <= 1 AND 3 <= c2 <= 3"
                      ]
                    },
                    "rows_for_plan": 1000,
                    "cost_for_plan": 203.23,
                    "chosen": true
                  }
[17 Dec 2014 15:03] Yoshinori Matsunobu
Changed a title a bit.
[17 Dec 2014 15:50] MySQL Verification Team
Hello Yoshinori,

Thank you for the report and test case.

Thanks,
Umesh
[17 Dec 2014 15:51] MySQL Verification Team
// 5.7.6

mysql> show variables like '%version%';
+-------------------------+---------------------+
| Variable_name           | Value               |
+-------------------------+---------------------+
| innodb_version          | 5.7.6               |
| protocol_version        | 10                  |
| slave_type_conversions  |                     |
| version                 | 5.7.6-m16           |
| version_comment         | Source distribution |
| version_compile_machine | x86_64              |
| version_compile_os      | Linux               |
+-------------------------+---------------------+
7 rows in set (0.00 sec)

mysql> explain select * from i1 where (c1=1 and c2=2 and c3=5) or (c1=1 and c2=3 and c3=1) or (c1=1 and c2=4 and c3=200);
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | i1    | NULL       | range | PRIMARY       | PRIMARY | 12      | NULL |    3 |   100.00 | Using where |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.05 sec)

mysql> explain select * from i1 where (c1=1 and c2=2 and c3=5) or (c1=1 and c2=3 and c3=1) or (c1=1 and c2=3 and c3=200);
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | i1    | NULL       | range | PRIMARY       | PRIMARY | 12      | NULL | 1000 |   100.00 | Using where |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql>SELECT trace FROM information_schema.optimizer_trace\G
*************************** 1. row ***************************
..
                  "chosen_range_access_summary": {
                    "range_access_plan": {
                      "type": "range_scan",
                      "index": "PRIMARY",
                      "rows": 1000,
                      "ranges": [
                        "1 <= c1 <= 1 AND 2 <= c2 <= 2 AND 5 <= c3 <= 5",
                        "1 <= c1 <= 1 AND 3 <= c2 <= 3"
                      ]
                    },
                    "rows_for_plan": 1000,
                    "cost_for_plan": 204.43,
                    "chosen": true

// 5.6.23

[test]> show variables like '%version%';
+-------------------------+---------------------------------------------------------+
| Variable_name           | Value                                                   |
+-------------------------+---------------------------------------------------------+
| innodb_version          | 5.6.23                                                  |
| protocol_version        | 10                                                      |
| slave_type_conversions  |                                                         |
| version                 | 5.6.23-enterprise-commercial-advanced                   |
| version_comment         | MySQL Enterprise Server - Advanced Edition (Commercial) |
| version_compile_machine | x86_64                                                  |
| version_compile_os      | Linux                                                   |
+-------------------------+---------------------------------------------------------+
7 rows in set (0.00 sec)

[test]> explain select * from i1 where (c1=1 and c2=2 and c3=5) or (c1=1 and c2=3 and c3=1) or (c1=1 and c2=4 and c3=200);
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | i1    | range | PRIMARY       | PRIMARY | 12      | NULL |    3 | Using where |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.00 sec)

[test]> explain select * from i1 where (c1=1 and c2=2 and c3=5) or (c1=1 and c2=3 and c3=1) or (c1=1 and c2=3 and c3=200);
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | i1    | range | PRIMARY       | PRIMARY | 12      | NULL | 1000 | Using where |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.00 sec)

[test]> set optimizer_trace='enabled=on';
Query OK, 0 rows affected (0.00 sec)

[test]> select * from i1 where (c1=1 and c2=2 and c3=5) or (c1=1 and c2=3 and c3=1) or (c1=1 and c2=3 and c3=200);
+----+----+-----+------+------+
| c1 | c2 | c3  | c4   | c5   |
+----+----+-----+------+------+
|  1 |  2 |   5 |    0 |    0 |
|  1 |  3 |   1 |    0 |    0 |
|  1 |  3 | 200 |    0 |    0 |
+----+----+-----+------+------+
3 rows in set (0.00 sec)

[test]> SELECT trace FROM information_schema.optimizer_trace\G
*************************** 1. row ***************************
..

                  "chosen_range_access_summary": {
                    "range_access_plan": {
                      "type": "range_scan",
                      "index": "PRIMARY",
                      "rows": 1000,
                      "ranges": [
                        "1 <= c1 <= 1 AND 2 <= c2 <= 2 AND 5 <= c3 <= 5",
                        "1 <= c1 <= 1 AND 3 <= c2 <= 3"
                      ]
                    },
                    "rows_for_plan": 1000,
                    "cost_for_plan": 203.23,
                    "chosen": true
                  }
[10 Feb 2015 0:53] Yoshinori Matsunobu
As far as I've investigated, this issue happens when:
- Primary (or unique not null) keys having three (or more) columns as a multiple-column index
- Using OR conditions using all of the key columns
- Using duplicate conditions for the prefix columns
Here is an example query causing the problem.

select * from i1 where (c1=1 and c2=2 and c3=5) or (c1=1 and c2=3 and c3=10) or (c1=1 and c2=3 and c3=1) or (c1=1 and c2=4 and c3=0);

There is primary key (c1, c2, c3), and two OR conditions have (c1=1 and c2=3).
In this case, somehow c3 was not used for filtering rows.
"1 <= c1 <= 1 AND 2 <= c2 <= 2 AND 5 <= c3 <= 5",
"1 <= c1 <= 1 AND 3 <= c2 <= 3",
"1 <= c1 <= 1 AND 4 <= c2 <= 4 AND 0 <= c3 <= 0"
If there are many rows matching (c1=1 and c2=3), this query is much more expensive than using all key columns to filter rows.

As far as I've checked with debugger, tree_or() and sel_trees_can_be_ored() somehow merged "(c1=1 AND c2=3 AND c3=10) or (c1=1 AND c2=3 AND c3=1)" conditions into one "(c1=1 AND c2=3)" condition. Actual merge seemed happened at key_or() function.
T@4 : | | | | | | | | | | | >tree_or
T@4 : | | | | | | | | | | | | >sel_trees_can_be_ored
T@4 : | | | | | | | | | | | | | info: sel_tree: tree1->keys[0(real_keynr: 0)]: (1 <= c1 <= 1 AND 2 <= c2 <= 2 AND 5 <= c3 <= 5) OR (1 <= c1 <= 1 AND 3 <= c2 <= 3 AND 10 <= c3 <= 10)
T@4 : | | | | | | | | | | | | | info: sel_tree: tree2->keys[0(real_keynr: 0)]: (1 <= c1 <= 1 AND 3 <= c2 <= 3 AND 1 <= c3 <= 1)
T@4 : | | | | | | | | | | | | <sel_trees_can_be_ored 7302
T@4 : | | | | | | | | | | | <tree_or 7501
T@4 : | | | | | | | | | | | info: sel_tree: after_or->keys[0(real_keynr: 0)]: (1 <= c1 <= 1 AND 2 <= c2 <= 2 AND 5 <= c3 <= 5) OR (1 <= c1 <= 1 AND 3 <= c2 <= 3)
T@4 : | | | | | | | | | | | >get_mm_tree

The problem repeated in MySQL 5.5 and 5.6. I couldn't repeat in MariaDB 5.5 and 10.0.
[10 Feb 2015 1:30] Yoshinori Matsunobu
https://github.com/mysql/mysql-server/blob/5.7/sql/opt_range.cc#L8076
----
 if (key1->use_count > 0 || !(key1=key1->clone_tree(param)))
  return 0; // OOM
----
Should || be && ?

I verified replacing || with && fixed the problem I described in this bug report.
[20 May 2015 16:35] Paul DuBois
Noted in 5.6.26, 5.7.8, 5.8.0 changelogs.

The optimizer could incorrectly believe an out-of-memory condition
had occurred for optimization of range scans for OR operators, and
thus return an overestimate of qualifying rows.
[17 Jun 2015 4:42] Sreeharsha Ramanavarapu
Reverted fix for 20229614 which caused a regression.
[26 Aug 2015 14:11] Paul DuBois
Noted in 5.6.27, 5.7.9, 5.8.0 changelogs.

The optimizer could incorrectly assume an out-of-memory condition
while optimizing a range scan for the OR operator, resulting in
overestimation of the number of qualifying rows.