Bug #71140 Optimizer does NOT use range checks in typical self-join for InnoDB tables
Submitted: 13 Dec 2013 9:16 Modified: 31 Jul 2015 8:59
Reporter: Valeriy Kravchuk Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:5.1, 5.5, 5.6.15, 5.7.7 OS:Any
Assigned to: CPU Architecture:Any
Tags: performance, range check, self-join

[13 Dec 2013 9:16] Valeriy Kravchuk
Description:
Users often want to count or check records in a certain "interval" around any given record in a table. This "interval" can be defined over datatime/timestamp column or over integer column, and to get the results one has to join table with itself, like in this case:

mysql> explain select t1.id, count(*) from testi t1 join testi t2 on t2.c1 betwe
en t1.c1 and t1.c1 + 30 group by t1.id;
+----+-------------+-------+-------+---------------+------+---------+------+----
---+-----------------------------------------------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | row
s  | Extra                                                           |
+----+-------------+-------+-------+---------------+------+---------+------+----
---+-----------------------------------------------------------------+
|  1 | SIMPLE      | t2    | index | c1            | c1   | 5       | NULL | 660
34 | Using index; Using temporary; Using filesort                    |
|  1 | SIMPLE      | t1    | index | PRIMARY,c1    | c1   | 5       | NULL | 660
34 | Using where; Using index; Using join buffer (Block Nested Loop) |
+----+-------------+-------+-------+---------------+------+---------+------+----
---+-----------------------------------------------------------------+
2 rows in set (0.01 sec)

The plan above is actually bad, as optimizer plans to scan entire index for each row (it can be table scan as well in similar cases). It's easy to find/force better plan - one just has to start from t1:

mysql> explain select t1.id, count(*) from testi t1 straight_join testi t2 on t2
.c1 between t1.c1 and t1.c1 + 30 group by t1.id;
+----+-------------+-------+-------+---------------+---------+---------+------+-
------+------------------------------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  |
rows  | Extra                                          |
+----+-------------+-------+-------+---------------+---------+---------+------+-
------+------------------------------------------------+
|  1 | SIMPLE      | t1    | index | PRIMARY,c1    | PRIMARY | 4       | NULL |
66034 | NULL                                           |
|  1 | SIMPLE      | t2    | ALL   | c1            | NULL    | NULL    | NULL |
66034 | Range checked for each record (index map: 0x2) |
+----+-------------+-------+-------+---------------+---------+---------+------+-
------+------------------------------------------------+
2 rows in set (0.00 sec)

Depending on data, the difference in execution time may be 10 times faster or more:

...
| 65535 |     2046 |
| 65536 |     1975 |
+-------+----------+
65536 rows in set (2 min 19.22 sec)

The above is for the forced plan, while with the original (table data fit in the buffer pool) we have:

...
| 65535 |     2046 |
| 65536 |     1975 |
+-------+----------+
65536 rows in set (22 min 34.53 sec)

See the details of the test case below.

How to repeat:
C:\Program Files\MySQL\MySQL Server 5.6\bin>mysql -uroot -proot -P3314 test
Warning: Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 15
Server version: 5.6.15-log MySQL Community Server (GPL)

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> create table testi(id int auto_increment primary key, c1 int, c2 int, key
(c1)) engine=MyISAM;
Query OK, 0 rows affected (1.36 sec)

mysql> insert into testi(c1, c2) values (1,1);
Query OK, 1 row affected (0.20 sec)

mysql> insert into testi(c1, c2) select rand()*1000, rand()*1000 from testi;
Query OK, 1 row affected (0.36 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> insert into testi(c1, c2) select rand()*1000, rand()*1000 from testi;
Query OK, 2 rows affected (0.16 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> insert into testi(c1, c2) select rand()*1000, rand()*1000 from testi;
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

...

mysql> insert into testi(c1, c2) select rand()*1000, rand()*1000 from testi;
Query OK, 32768 rows affected (0.72 sec)
Records: 32768  Duplicates: 0  Warnings: 0

mysql> explain select t1.id, count(*) from testi t1 join testi t2 on t2.c1 betwe
en t1.c1 and t1.c1 + 30 group by t1.id;
+----+-------------+-------+-------+---------------+------+---------+------+----
---+------------------------------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | row
s  | Extra                                          |
+----+-------------+-------+-------+---------------+------+---------+------+----
---+------------------------------------------------+
|  1 | SIMPLE      | t2    | index | c1            | c1   | 5       | NULL | 655
36 | Using index; Using temporary; Using filesort   |
|  1 | SIMPLE      | t1    | ALL   | c1            | NULL | NULL    | NULL | 655
36 | Range checked for each record (index map: 0x2) |
+----+-------------+-------+-------+---------------+------+---------+------+----
---+------------------------------------------------+
2 rows in set (0.00 sec)

So, for MyISAM, even though join order is "wrong", range check is used. Now, for InnoDB:

mysql> alter table testi engine=InnoDB;
Query OK, 65536 rows affected (12.84 sec)
Records: 65536  Duplicates: 0  Warnings: 0

mysql> explain select t1.id, count(*) from testi t1 join testi t2 on t2.c1 betwe
en t1.c1 and t1.c1 + 30 group by t1.id;
+----+-------------+-------+-------+---------------+------+---------+------+----
---+-----------------------------------------------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | row
s  | Extra                                                           |
+----+-------------+-------+-------+---------------+------+---------+------+----
---+-----------------------------------------------------------------+
|  1 | SIMPLE      | t2    | index | c1            | c1   | 5       | NULL | 660
34 | Using index; Using temporary; Using filesort                    |
|  1 | SIMPLE      | t1    | index | PRIMARY,c1    | c1   | 5       | NULL | 660
34 | Using where; Using index; Using join buffer (Block Nested Loop) |
+----+-------------+-------+-------+---------------+------+---------+------+----
---+-----------------------------------------------------------------+
2 rows in set (0.31 sec)

mysql> analyze table testi;
+------------+---------+----------+----------+
| Table      | Op      | Msg_type | Msg_text |
+------------+---------+----------+----------+
| test.testi | analyze | status   | OK       |
+------------+---------+----------+----------+
1 row in set (0.47 sec)

mysql> explain select t1.id, count(*) from testi t1 join testi t2 on t2.c1 betwe
en t1.c1 and t1.c1 + 30 group by t1.id;
+----+-------------+-------+-------+---------------+------+---------+------+----
---+-----------------------------------------------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | row
s  | Extra                                                           |
+----+-------------+-------+-------+---------------+------+---------+------+----
---+-----------------------------------------------------------------+
|  1 | SIMPLE      | t2    | index | c1            | c1   | 5       | NULL | 660
34 | Using index; Using temporary; Using filesort                    |
|  1 | SIMPLE      | t1    | index | PRIMARY,c1    | c1   | 5       | NULL | 660
34 | Using where; Using index; Using join buffer (Block Nested Loop) |
+----+-------------+-------+-------+---------------+------+---------+------+----
---+-----------------------------------------------------------------+
2 rows in set (0.01 sec)

mysql> explain select t1.id, count(*) from testi t1 straight_join testi t2 on t2
.c1 between t1.c1 and t1.c1 + 30 group by t1.id;
+----+-------------+-------+-------+---------------+---------+---------+------+-
------+------------------------------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  |
rows  | Extra                                          |
+----+-------------+-------+-------+---------------+---------+---------+------+-
------+------------------------------------------------+
|  1 | SIMPLE      | t1    | index | PRIMARY,c1    | PRIMARY | 4       | NULL |
66034 | NULL                                           |
|  1 | SIMPLE      | t2    | ALL   | c1            | NULL    | NULL    | NULL |
66034 | Range checked for each record (index map: 0x2) |
+----+-------------+-------+-------+---------------+---------+---------+------+-
------+------------------------------------------------+
2 rows in set (0.00 sec)

Now you may want to compare execution times for the queries above, profiles, whatever...

Test case to copy/paste:

create table testi(id int auto_increment primary key, c1 int, c2 int, key
(c1)) engine=MyISAM;
insert into testi(c1, c2) values (1,1);
insert into testi(c1, c2) select rand()*1000, rand()*1000 from testi;
insert into testi(c1, c2) select rand()*1000, rand()*1000 from testi;
insert into testi(c1, c2) select rand()*1000, rand()*1000 from testi;
insert into testi(c1, c2) select rand()*1000, rand()*1000 from testi;
insert into testi(c1, c2) select rand()*1000, rand()*1000 from testi;
insert into testi(c1, c2) select rand()*1000, rand()*1000 from testi;
insert into testi(c1, c2) select rand()*1000, rand()*1000 from testi;
insert into testi(c1, c2) select rand()*1000, rand()*1000 from testi;
insert into testi(c1, c2) select rand()*1000, rand()*1000 from testi;
insert into testi(c1, c2) select rand()*1000, rand()*1000 from testi;
insert into testi(c1, c2) select rand()*1000, rand()*1000 from testi;
insert into testi(c1, c2) select rand()*1000, rand()*1000 from testi;
insert into testi(c1, c2) select rand()*1000, rand()*1000 from testi;
insert into testi(c1, c2) select rand()*1000, rand()*1000 from testi;
insert into testi(c1, c2) select rand()*1000, rand()*1000 from testi;
insert into testi(c1, c2) select rand()*1000, rand()*1000 from testi;

explain select t1.id, count(*) from testi t1 join testi t2 on t2.c1 between t1.c1 and t1.c1 + 30 group by t1.id;

alter table testi engine=InnoDB;
analyze table testi;
explain select t1.id, count(*) from testi t1 join testi t2 on t2.c1 between t1.c1 and t1.c1 + 30 group by t1.id;
explain select t1.id, count(*) from testi t1 straight_join testi t2 on t2.c1 between t1.c1 and t1.c1 + 30 group by t1.id;

select t1.id, count(*) from testi t1 straight_join testi t2 on t2.c1 between t1.c1 and t1.c1 + 30 group by t1.id;
select t1.id, count(*) from testi t1 join testi t2 on t2.c1 between t1.c1 and t1.c1 + 30 group by t1.id;

Suggested fix:
Make optimizer properly consider/estimate cost of range checks for each record in cases like the above. It's a typical use case and I feel uncomfortable being "smarter" that the optimizer in cases like this.

Alternatively, add window functions support to MySQL :)
[16 Dec 2013 11:09] MySQL Verification Team
Hello Valeriy,

Thank you for the bug report and test case.
Verified as described.

Thanks,
Umesh
[31 Jul 2015 8:59] Valeriy Kravchuk
I see the same difference of plan for MyISAM vs InnoDB vs "good one" on 5.7.7:

...

mysql> explain select t1.id, count(*) from testi t1 join testi t2 on t2.c1 between t1.c1 and t1.c1 + 30 group by t1.id;
+----+-------------+-------+------------+-------+---------------+------+---------+------+-------+----------+------------------------------------------------+
| id | select_type | table | partitions | type  | possible_keys | key  | key_len | ref  | rows  | filtered | Extra                                          |
+----+-------------+-------+------------+-------+---------------+------+---------+------+-------+----------+------------------------------------------------+
|  1 | SIMPLE      | t2    | NULL       | index | c1            | c1   | 5       | NULL | 65536 |   100.00 | Using index; Using temporary; Using filesort   |
|  1 | SIMPLE      | t1    | NULL       | ALL   | PRIMARY,c1    | NULL | NULL    | NULL | 65536 |    11.11 | Range checked for each record (index map: 0x3) |
+----+-------------+-------+------------+-------+---------------+------+---------+------+-------+----------+------------------------------------------------+
2 rows in set, 1 warning (0.00 sec)

mysql> show warnings\G                                                          *************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: /* select#1 */ select `test`.`t1`.`id` AS `id`,count(0) AS `count(*)` from `test`.`testi` `t1` join `test`.`testi` `t2` where (`test`.`t2`.`c1` between `test`.`t1`.`c1` and (`test`.`t1`.`c1` + 30)) group by `test`.`t1`.`id`
1 row in set (0.00 sec)

mysql> alter table testi engine=InnoDB;
Query OK, 65536 rows affected (4.52 sec)
Records: 65536  Duplicates: 0  Warnings: 0

mysql> analyze table testi;
+------------+---------+----------+----------+
| Table      | Op      | Msg_type | Msg_text |
+------------+---------+----------+----------+
| test.testi | analyze | status   | OK       |
+------------+---------+----------+----------+
1 row in set (0.08 sec)

mysql> explain select t1.id, count(*) from testi t1 join testi t2 on t2.c1 between t1.c1 and t1.c1 + 30 group by t1.id;
+----+-------------+-------+------------+-------+---------------+------+---------+------+-------+----------+-----------------------------------------------------------------+
| id | select_type | table | partitions | type  | possible_keys | key  | key_len | ref  | rows  | filtered | Extra                                                           |
+----+-------------+-------+------------+-------+---------------+------+---------+------+-------+----------+-----------------------------------------------------------------+
|  1 | SIMPLE      | t2    | NULL       | index | c1            | c1   | 5       | NULL | 66034 |   100.00 | Using index; Using temporary; Using filesort                    |
|  1 | SIMPLE      | t1    | NULL       | index | PRIMARY,c1    | c1   | 5       | NULL | 66034 |    11.11 | Using where; Using index; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------------+-------+---------------+------+---------+------+-------+----------+-----------------------------------------------------------------+
2 rows in set, 1 warning (0.00 sec)