MySQL Bugs: #23322: Optimizer sometimes erroniously prefers other index over index merge

Bug #23322	Optimizer sometimes erroniously prefers other index over index merge
Submitted:	16 Oct 2006 5:50	Modified:	31 Aug 2012 11:29
Reporter:	Valeriy Kravchuk	Email Updates:
Status:	Duplicate	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S2 (Serious)
Version:	5.0.28-BK, 5.0.27-BK, 5.0.24, 5.1.37-bzr	OS:	Linux (Linux)
Assigned to:	Assigned Account	CPU Architecture:	Any
Tags:	bfsm_2006_12_07

Description:
mysql> show create table tm1\G
*************************** 1. row ***************************
       Table: tm1
Create Table: CREATE TABLE `tm1` (
  `c1` int(11) NOT NULL auto_increment,
  `c2` char(20) default NULL,
  `c3` char(20) default NULL,
  `c4` int(11) default NULL,
  PRIMARY KEY  (`c1`),
  KEY `k1` (`c2`),
  KEY `k2` (`c3`),
  KEY `k3` (`c4`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

mysql> show table status like 'tm1'\G
*************************** 1. row ***************************
           Name: tm1
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 239
 Avg_row_length: 205
    Data_length: 49152
Max_data_length: 0
   Index_length: 49152
      Data_free: 0
 Auto_increment: 259
    Create_time: 2006-10-13 14:25:23
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment: InnoDB free: 454656 kB
1 row in set (0.02 sec)

mysql> analyze table tm1;
+----------+---------+----------+----------+
| Table    | Op      | Msg_type | Msg_text |
+----------+---------+----------+----------+
| test.tm1 | analyze | status   | OK       |
+----------+---------+----------+----------+
1 row in set (0.01 sec)

mysql> select distinct c1 from tm1 where (c2='e' OR c3='q');
Empty set (0.01 sec)

mysql> explain select distinct c1 from tm1 where (c2='e' OR c3='q');
+----+-------------+-------+-------------+---------------+-------+---------+----
--+------+---------------------------------+
| id | select_type | table | type        | possible_keys | key   | key_len | ref
  | rows | Extra                           |
+----+-------------+-------+-------------+---------------+-------+---------+----
--+------+---------------------------------+
|  1 | SIMPLE      | tm1   | index_merge | k1,k2         | k1,k2 | 21,21   | NUL
L |    2 | Using union(k1,k2); Using where |
+----+-------------+-------+-------------+---------------+-------+---------+----
--+------+---------------------------------+
1 row in set (0.01 sec)

mysql> select distinct c1 from tm1 where (c4 != 0) AND (c2='e' OR c3='q');
Empty set (0.01 sec)

mysql> explain select distinct c1 from tm1 where (c4 != 0) AND (c2='e' OR c3='q
');
+----+-------------+-------+-------+---------------+------+---------+------+----
--+-------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | row
s | Extra       |
+----+-------------+-------+-------+---------------+------+---------+------+----
--+-------------+
|  1 | SIMPLE      | tm1   | range | k1,k2,k3      | k3   | 5       | NULL |   1
1 | Using where |
+----+-------------+-------+-------+---------------+------+---------+------+----
--+-------------+
1 row in set (0.01 sec)

Note that optimizer selected plan with 11 rows instead of 2! And k3 in NOT a primary key. It is a bug.

How to repeat:
create table tm1(c1 int auto_increment primary key, c2 char(20), c3 char
(20), c4 int);
alter table tm1 add key k1 (c2);
alter table tm1 add key k2 (c3);
alter table tm1 add key k3(c4);
insert into tm1 values(null, 'a', 'b', 0);
insert into tm1 values(null, 'c', 'b', 0);
insert into tm1 values(null, 'a', 'd', 0);
insert into tm1 values(null, 'ccc', 'qqq', 0);
insert into tm1 (c2,c3) select c2,c3 from tm1 where c2 != 'a';
insert into tm1 (c2,c3) select c2,c3 from tm1 where c2 != 'a';
insert into tm1 (c2,c3) select c2,c3 from tm1 where c2 != 'a';
insert into tm1 (c2,c3) select c2,c3 from tm1 where c2 != 'a';
insert into tm1 (c2,c3) select c2,c3 from tm1 where c2 != 'a';
insert into tm1 (c2,c3) select c2,c3 from tm1 where c2 != 'a';
insert into tm1 (c2,c3) select c2,c3 from tm1 where c2 != 'a';
analyze table tm1;
explain select distinct c1 from tm1 where (c2='e' OR c3='q');
explain select distinct c1 from tm1 where (c4 != 0) AND (c2='e' OR c3='q');

Suggested fix:
Fix optimizer to use merge optimization + additional check of wgere clause in cases like this. It works perfectly if there is no index (`k3`) on column c4.

This is not a bug: See our 5.0 manual:
[7.2.6. Index Merge Optimization]
<<
Note: The Index Merge optimization algorithm has the following known deficiencies:
If a range scan is possible on some key, an Index Merge is not considered. For example, consider this query:
  SELECT * FROM t1 WHERE (goodkey1 < 10 OR goodkey2 < 20) AND badkey < 30;
For this query, two plans are possible:
  An Index Merge scan using the (goodkey1 < 10 OR goodkey2 < 20) condition.
  A range scan using the badkey < 30 condition.
However, the optimizer considers only the second plan. If that is not what you want, you can make the optimizer consider Index Merge by using IGNORE INDEX or FORCE INDEX.
>>

Valeriy,

For the test case you provided all rows have c4=0. So the optimizer choice for this  case looks right.
Probably you wanted to have several rows (10-20) with c4 != 0?

Igor.

Igor,

Indeed, in my original public test case optimizer makes decision that may be as good as merge if not better. We had discussed more complex test case from csutomer later, in private comments. Maybe you are not allowed to see them now.

So, I had modified my original test case a bit to show the real problem again:

mysql> select version();
+--------------+
| version()    |
+--------------+
| 5.1.38-debug |
+--------------+
1 row in set (0.00 sec)

mysql> create table tm1(c1 int auto_increment primary key, c2 char(20), c3 char
    -> (20), c4 int);
Query OK, 0 rows affected (0.02 sec)

mysql> alter table tm1 add key k1 (c2);
Query OK, 0 rows affected (0.02 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> alter table tm1 add key k2 (c3);
Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> alter table tm1 add key k3(c4);
Query OK, 0 rows affected (0.04 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> insert into tm1 values(null, 'a', 'b', 0);
Query OK, 1 row affected (0.00 sec)

mysql> insert into tm1 values(null, 'c', 'b', 0);
Query OK, 1 row affected (0.00 sec)

mysql> insert into tm1 values(null, 'a', 'd', 0);
Query OK, 1 row affected (0.00 sec)

mysql> insert into tm1 values(null, 'ccc', 'qqq', 0);
Query OK, 1 row affected (0.01 sec)

mysql> insert into tm1 (c2,c3) select c2,c3 from tm1 where c2 != 'a';
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> insert into tm1 (c2,c3) select c2,c3 from tm1 where c2 != 'a';
Query OK, 4 rows affected (0.01 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> insert into tm1 (c2,c3) select c2,c3 from tm1 where c2 != 'a';
Query OK, 8 rows affected (0.01 sec)
Records: 8  Duplicates: 0  Warnings: 0

mysql> insert into tm1 (c2,c3) select c2,c3 from tm1 where c2 != 'a';
Query OK, 16 rows affected (0.01 sec)
Records: 16  Duplicates: 0  Warnings: 0

mysql> insert into tm1 (c2,c3) select c2,c3 from tm1 where c2 != 'a';
Query OK, 32 rows affected (0.00 sec)
Records: 32  Duplicates: 0  Warnings: 0

mysql> insert into tm1 (c2,c3) select c2,c3 from tm1 where c2 != 'a';
Query OK, 64 rows affected (0.01 sec)
Records: 64  Duplicates: 0  Warnings: 0

mysql> insert into tm1 (c2,c3) select c2,c3 from tm1 where c2 != 'a';
Query OK, 128 rows affected (0.04 sec)
Records: 128  Duplicates: 0  Warnings: 0

This is where I stopped initially. Now, as you suggested, I am adding some rows where c4 != 0, a lot of them actually:

mysql> insert into tm1 (c2,c3,c4) select c2,c3,1 from tm1 where c2 != 'a';
Query OK, 256 rows affected (0.07 sec)
Records: 256  Duplicates: 0  Warnings: 0

mysql> insert into tm1 (c2,c3,c4) select c2,c3,2 from tm1 where c2 != 'a';
Query OK, 512 rows affected (0.09 sec)
Records: 512  Duplicates: 0  Warnings: 0

mysql> insert into tm1 (c2,c3,c4) select c2,c3,3 from tm1 where c2 != 'a';
Query OK, 1024 rows affected (0.16 sec)
Records: 1024  Duplicates: 0  Warnings: 0

mysql> insert into tm1 (c2,c3,c4) select c2,c3,4 from tm1 where c2 != 'a';
Query OK, 2048 rows affected (0.36 sec)
Records: 2048  Duplicates: 0  Warnings: 0

mysql> select count(*) from tm1 where (c2='e' OR c3='q');
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from tm1 where c4 != 0;
+----------+
| count(*) |
+----------+
|     3840 |
+----------+
1 row in set (0.02 sec)

With smaller number of rows with c4 != 0 optimizer may choose a rangle scan on k3, but it istill may be worse that index merge. But this is extreme case: 

mysql> explain select distinct c1 from tm1 where (c4 != 0) AND (c2='e' OR c3='q');
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | tm1   | ALL  | k1,k2,k3      | NULL | NULL    | NULL | 4098 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)

mysql> explain select distinct c1 from tm1 force index(k1,k2) where (c4 != 0) AND (c2='e' OR c3='q');
+----+-------------+-------+-------------+---------------+-------+---------+------+------+---------------------------------+
| id | select_type | table | type        | possible_keys | key   | key_len | ref  | rows | Extra                           |
+----+-------------+-------+-------------+---------------+-------+---------+------+------+---------------------------------+
|  1 | SIMPLE      | tm1   | index_merge | k1,k2         | k1,k2 | 21,21   | NULL |    2 | Using union(k1,k2); Using where |
+----+-------------+-------+-------------+---------------+-------+---------+------+------+---------------------------------+
1 row in set (0.00 sec)

As you can see, because estimated number of rows for range scan is too big, optimizer just prefers ALL over range, but still ignores muuch more eeficient index merge:

mysql> show session status like 'Handler_read%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     |
| Handler_read_key      | 21    |
| Handler_read_next     | 3879  |
| Handler_read_prev     | 0     |
| Handler_read_rnd      | 0     |
| Handler_read_rnd_next | 8826  |
+-----------------------+-------+
6 rows in set (0.00 sec)

mysql> select distinct c1 from tm1 force index(k1,k2) where (c4 != 0) AND (c2='e' OR c3='q');
Empty set (0.01 sec)

mysql> show session status like 'Handler_read%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     |
| Handler_read_key      | 23    |
| Handler_read_next     | 3879  |
| Handler_read_prev     | 0     |
| Handler_read_rnd      | 0     |
| Handler_read_rnd_next | 8826  |
+-----------------------+-------+
6 rows in set (0.00 sec)

mysql> select distinct c1 from tm1 where (c4 != 0) AND (c2='e' OR c3='q');
Empty set (0.01 sec)

mysql> show session status like 'Handler_read%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     |
| Handler_read_key      | 23    |
| Handler_read_next     | 3879  |
| Handler_read_prev     | 0     |
| Handler_read_rnd      | 0     |
| Handler_read_rnd_next | 12925 |
+-----------------------+-------+
6 rows in set (0.01 sec)

This is what happned to customer. And this is the problem, as full table scan (or biug range scan) will become thousands of times slower eventually in some cases than index_merge.

Duplicate of BUG#30151, fixed in 5.6.6