Bug #29680 index_merge is used instead of (faster!) range
Submitted: 10 Jul 2007 6:38 Modified: 7 Aug 2007 20:00
Reporter: Valeriy Kravchuk Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Documentation Severity:S3 (Non-critical)
Version:5.0.46-BK, 5.0.41 OS:Any
Assigned to: Paul DuBois CPU Architecture:Any
Tags: bfsm_2007_08_02

[10 Jul 2007 6:38] Valeriy Kravchuk
Description:
As opposed to the manual, http://dev.mysql.com/doc/refman/5.0/en/index-merge-optimization.html:

"If a range scan is possible on some key, an Index Merge is not considered."

it is not only considered, but also choosed by optimizer, even when range access based on one index scans much less rows, and is simply faster:

openxs@linux:~/dbs/5.0> bin/mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.0.46-debug Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> create table tm(col1 int auto_increment primary key,
    -> col2 varchar(127),
    -> col3 varchar(8),
    -> col4 datetime,
    -> col5 tinyint,
    -> col6 tinyint,
    -> key idx_2(col3),
    -> key idx_3(col4),
    -> key idx_4(col5),
    -> key idx_5(col6)) engine=InnoDB;
Query OK, 0 rows affected (0.06 sec)

mysql> insert into tm(col3,col4,col5,col6) values('const', '2007-06-28 07:10:00', 0, 0);
Query OK, 1 row affected (0.01 sec)

mysql> insert into tm(col3,col4,col5,col6) values('const', now(), 1, 2);
Query OK, 1 row affected (0.01 sec)

mysql> insert into tm(col3,col4,col5,col6) values('var', now(), 2, 3);
Query OK, 1 row affected (0.00 sec)

mysql> insert into tm(col3,col4,col5,col6) select col3,col4,col5,col6 from tm where col6 <> 0;
Query OK, 2 rows affected (0.01 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> insert into tm(col3,col4,col5,col6) select col3,col4,col5,col6 from tm where col6 <> 0;
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0
...

mysql> insert into tm(col3,col4,col5,col6) select col3,col4,col5,col6 from tm w
here col6 <> 0;
Query OK, 2048 rows affected (0.22 sec)
Records: 2048  Duplicates: 0  Warnings: 0

mysql> select * from tm where col3='const'
    -> and date_sub('2007-06-29 07:10:00', interval 10 second) > col4
    -> and col5=0
    -> and col6=0;
+------+------+-------+---------------------+------+------+
| col1 | col2 | col3  | col4                | col5 | col6 |
+------+------+-------+---------------------+------+------+
|    1 | NULL | const | 2007-06-28 07:10:00 |    0 |    0 |
+------+------+-------+---------------------+------+------+
1 row in set (0.01 sec)

mysql> explain select * from tm where col3='const' and date_sub('2007-06-29 07:
10:00', interval 10 second) > col4 and col5=0 and col6=0;
+----+-------------+-------+------+-------------------------+-------+---------+-
------+------+-------------+
| id | select_type | table | type | possible_keys           | key   | key_len |
ref   | rows | Extra       |
+----+-------------+-------+------+-------------------------+-------+---------+-
------+------+-------------+
|  1 | SIMPLE      | tm    | ref  | idx_2,idx_3,idx_4,idx_5 | idx_4 | 2       |
const |    1 | Using where |
+----+-------------+-------+------+-------------------------+-------+---------+-
------+------+-------------+
1 row in set (0.00 sec)

So, now, with single row, it is OK.

mysql> analyze table tm;
+---------+---------+----------+----------+
| Table   | Op      | Msg_type | Msg_text |
+---------+---------+----------+----------+
| test.tm | analyze | status   | OK       |
+---------+---------+----------+----------+
1 row in set (0.01 sec)

mysql> explain select * from tm where col3='const' and date_sub('2007-06-29 07:
10:00', interval 10 second) > col4 and col5=0 and col6=0;
+----+-------------+-------+------+-------------------------+-------+---------+-
------+------+-------------+
| id | select_type | table | type | possible_keys           | key   | key_len |
ref   | rows | Extra       |
+----+-------------+-------+------+-------------------------+-------+---------+-
------+------+-------------+
|  1 | SIMPLE      | tm    | ref  | idx_2,idx_3,idx_4,idx_5 | idx_4 | 2       |
const |    1 | Using where |
+----+-------------+-------+------+-------------------------+-------+---------+-
------+------+-------------+
1 row in set (0.00 sec)

Let's contnue:

mysql> insert into tm(col3,col4,col5,col6) select col3,col4,col5,col6 from tm;
Query OK, 4097 rows affected (0.38 sec)
Records: 4097  Duplicates: 0  Warnings: 0

mysql> insert into tm(col3,col4,col5,col6) select col3,col4,col5,col6 from tm;
Query OK, 8194 rows affected (0.62 sec)
Records: 8194  Duplicates: 0  Warnings: 0

So, now we have 4 rows for that WHERE. Let's check again:

mysql> explain select * from tm where col3='const' and date_sub('2007-06-29 07:
10:00', interval 10 second) > col4 and col5=0 and col6=0;
+----+-------------+-------+-------------+-------------------------+------------
-+---------+------+------+-------------------------------------------+
| id | select_type | table | type        | possible_keys           | key
 | key_len | ref  | rows | Extra                                     |
+----+-------------+-------+-------------+-------------------------+------------
-+---------+------+------+-------------------------------------------+
|  1 | SIMPLE      | tm    | index_merge | idx_2,idx_3,idx_4,idx_5 | idx_4,idx_5
 | 2,2     | NULL |    1 | Using intersect(idx_4,idx_5); Using where |
+----+-------------+-------+-------------+-------------------------+------------
-+---------+------+------+-------------------------------------------+
1 row in set (0.00 sec)

mysql> select * from tm where col3='const' and date_sub('2007-06-29 07:10:00',
interval 10 second) > col4 and col5=0 and col6=0;
+-------+------+-------+---------------------+------+------+
| col1  | col2 | col3  | col4                | col5 | col6 |
+-------+------+-------+---------------------+------+------+
|     1 | NULL | const | 2007-06-28 07:10:00 |    0 |    0 |
|  4098 | NULL | const | 2007-06-28 07:10:00 |    0 |    0 |
|  8195 | NULL | const | 2007-06-28 07:10:00 |    0 |    0 |
| 12292 | NULL | const | 2007-06-28 07:10:00 |    0 |    0 |
+-------+------+-------+---------------------+------+------+
4 rows in set (0.01 sec)

So, now index merge (intersect) is used. Somehow optmizer decided that only one row will satisfy all conditions, while we have 4 that satisfy each one of them + we need some time and resources for merging!

The main problem, formally, is that optimizer does NOT work as described in the manual, though. We do have index that can be used for range scan:

mysql> explain select * from tm force index(idx_3) where col3='const' and date_
sub('2007-06-29 07:10:00', interval 10 second) > col4 and col5=0 and col6=0;
+----+-------------+-------+-------+---------------+-------+---------+------+---
---+-------------+
| id | select_type | table | type  | possible_keys | key   | key_len | ref  | ro
ws | Extra       |
+----+-------------+-------+-------+---------------+-------+---------+------+---
---+-------------+
|  1 | SIMPLE      | tm    | range | idx_3         | idx_3 | 9       | NULL |
 3 | Using where |
+----+-------------+-------+-------+---------------+-------+---------+------+---
---+-------------+
1 row in set (0.00 sec)

and in reality this access path is the best. Running ANALYZE/CHECK etc does not help.

How to repeat:
create table tm(
col1 int auto_increment primary key,
col2 varchar(127),
col3 varchar(8),
col4 datetime,
col5 tinyint,
col6 tinyint,
key idx_2(col3),
key idx_3(col4),
key idx_4(col5),
key idx_5(col6)) engine=InnoDB;

insert into tm(col3,col4,col5,col6) values('const', '2007-06-28 07:10:00', 0, 0);
insert into tm(col3,col4,col5,col6) values('const', now(), 1, 2);
insert into tm(col3,col4,col5,col6) values('var', now(), 2, 3);

insert into tm(col3,col4,col5,col6) select col3,col4,col5,col6 from tm where col6 <> 0;
insert into tm(col3,col4,col5,col6) select col3,col4,col5,col6 from tm where col6 <> 0;
insert into tm(col3,col4,col5,col6) select col3,col4,col5,col6 from tm where col6 <> 0;
insert into tm(col3,col4,col5,col6) select col3,col4,col5,col6 from tm where col6 <> 0;
insert into tm(col3,col4,col5,col6) select col3,col4,col5,col6 from tm where col6 <> 0;
insert into tm(col3,col4,col5,col6) select col3,col4,col5,col6 from tm where col6 <> 0;
insert into tm(col3,col4,col5,col6) select col3,col4,col5,col6 from tm where col6 <> 0;
insert into tm(col3,col4,col5,col6) select col3,col4,col5,col6 from tm where col6 <> 0;
insert into tm(col3,col4,col5,col6) select col3,col4,col5,col6 from tm where col6 <> 0;
insert into tm(col3,col4,col5,col6) select col3,col4,col5,col6 from tm where col6 <> 0;
insert into tm(col3,col4,col5,col6) select col3,col4,col5,col6 from tm where col6 <> 0;

explain select * from tm where col3='const' and date_sub('2007-06-29 07:10:00', interval 10 second) > col4 and col5=0 and col6=0;

insert into tm(col3,col4,col5,col6) select col3,col4,col5,col6 from tm;
insert into tm(col3,col4,col5,col6) select col3,col4,col5,col6 from tm;

explain select * from tm where col3='const' and date_sub('2007-06-29 07:10:00', interval 10 second) > col4 and col5=0 and col6=0;

explain select * from tm force index(idx_3) where col3='const' and date_sub('2007-06-29 07:10:00', interval 10 second) > col4 and col5=0 and col6=0;

Suggested fix:
Do it as described in the manual?

Calculate rows/costs properly for index_merge?
[7 Aug 2007 19:32] Sergey Petrunya
Please make these cahgnes in the documentation: 

In section 6.2.6 Index Merge Optimization (http://dev.mysql.com/doc/refman/5.0/en/index-merge-optimization.html):

Replace the words 

"If a range scan is possible on some key, an Index Merge is not considered."

with

"If a range scan is possible on some key, the optimizer will not consider using Index Merge Union or Index Merge Sort-Union algorithms".
[7 Aug 2007 19:57] Paul DuBois
Recategorizing as Documentation bug and assigning to myself.
[7 Aug 2007 20:00] Paul DuBois
Thank you for your bug report. This issue has been addressed in the documentation. The updated documentation will appear on our website shortly, and will be included in the next release of the relevant products.