MySQL Bugs: #68554: Optimizer wrongly choses covering index over ICP

Bug #68554	Optimizer wrongly choses covering index over ICP
Submitted:	3 Mar 2013 16:16	Modified:	18 Mar 2013 12:42
Reporter:	Jaime Crespo (OCA)	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S5 (Performance)
Version:	5.6.10	OS:	Any
Assigned to:	Assigned Account	CPU Architecture:	Any
Tags:	condition, covering, ICP, Optimizer, pushdown, USING

Description:
Index condition pushdown seems to have lower precedence/cannot be used by the optimizer when covering index technique is available (at least in my scenario, see details below). In many cases, working at engine level instead of going back and forth to the SQL layer will be preferred, specially if data is cached.

I would say that it is an (of course understandable) limitation/lack of proper tuning of the newly introduced optimizer features.

How to repeat:
Configuration:

All defaults execept basic innodb parameters:
cat /etc/my.cnf
[mysqld]
server-id = 1
#log-bin = binlog
datadir=/usr/local/mysql/data/
socket=/home/mysql/mysql.sock
user=mysql
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0

#innodb
innodb_file_per_table
innodb_buffer_pool_size = 5G
innodb_log_file_size = 128M

[mysqld_safe]
log-error=/usr/local/mysql/data/mysqld.err
pid-file=/usr/local/mysql/data/mysqld.pid

[client]
user=root
socket=/home/mysql/mysql.sock
 

mysql> show variables like 'optimizer_switch'\G
*************************** 1. row ***************************
Variable_name: optimizer_switch
        Value: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=on,mrr_cost_based=on,block_nested_loop=on,batched_key_access=off,materialization=on,semijoin=on,loosescan=on,firstmatch=on,subquery_materialization_cost_based=on,use_index_extensions=on
1 row in set (0.00 sec)

Table:

CREATE TABLE `cast_info` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `person_id` int(11) NOT NULL,
  `movie_id` int(11) NOT NULL,
  `person_role_id` int(11) DEFAULT NULL,
  `note` varchar(250),
  `nr_order` int(11) DEFAULT NULL,
  `role_id` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `role_id_note` (`role_id`,`note`)
) ENGINE=InnoDB AUTO_INCREMENT=22187769 DEFAULT CHARSET=utf8;

Data:

There are about 22M rows. Aproximatelly 8M of them have role_id = 1 . 266 have role_id = 1 *and* contain the word 'Jaime' in the field note.

Queries:

SELECT * FROM cast_info FORCE INDEX(role_id_note) WHERE role_id = 1 and note like '%Jaime%';
SELECT role_id FROM cast_info FORCE INDEX(role_id_note) WHERE role_id = 1 and note like '%Jaime%';

Results
=======

(Ok results)

In MySQL 5.5.30:

mysql-5.5> SELECT * FROM cast_info WHERE role_id = 1 and note like '%Jaime%'\G
266 rows in set (5.76 sec)

mysql-5.5> EXPLAIN SELECT * FROM cast_info WHERE role_id = 1 and note like '%Jaime%'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: cast_info
         type: ref
possible_keys: role_id_note
          key: role_id_note
      key_len: 4
          ref: const
         rows: 11553718
        Extra: Using where
1 row in set (0.01 sec)

mysql> show status like 'Hand%';
+----------------------------+---------+
| Variable_name              | Value   |
+----------------------------+---------+
| Handler_commit             | 1       |
| Handler_delete             | 0       |
| Handler_discover           | 0       |
| Handler_prepare            | 0       |
| Handler_read_first         | 0       |
| Handler_read_key           | 1       |
| Handler_read_last          | 0       |
| Handler_read_next          | 8346769 |
| Handler_read_prev          | 0       |
| Handler_read_rnd           | 0       |
| Handler_read_rnd_next      | 0       |
| Handler_rollback           | 0       |
| Handler_savepoint          | 0       |
| Handler_savepoint_rollback | 0       |
| Handler_update             | 0       |
| Handler_write              | 0       |
+----------------------------+---------+
16 rows in set (0.00 sec)

In MySQL 5.6:

mysql-5.6> SELECT * FROM cast_info WHERE role_id = 1 and note like '%Jaime%';
266 rows in set (1.09 sec)

mysql-5.6> EXPLAIN SELECT * FROM cast_info WHERE role_id = 1 and note like '%Jaime%'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: cast_info
         type: ref
possible_keys: role_id_note
          key: role_id_note
      key_len: 4
          ref: const
         rows: 10259274
        Extra: Using index condition
1 row in set (0.00 sec)

mysql-5.6> show status like 'Hand%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 2     |
| Handler_mrr_init           | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 1     |
| Handler_read_last          | 0     |
| Handler_read_next          | 266   |
| Handler_read_prev          | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+
18 rows in set (0.00 sec)

(Unexpected results)

mysql-5.6> SELECT role_id FROM cast_info WHERE role_id = 1 and note like '%Jaime%'\G
266 rows in set (1.82 sec)

mysql-5.6> EXPLAIN SELECT role_id FROM cast_info WHERE role_id = 1 and note like '%Jaime%'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: cast_info
         type: ref
possible_keys: role_id_note
          key: role_id_note
      key_len: 4
          ref: const
         rows: 10259274
        Extra: Using where; Using index
1 row in set (0.00 sec)

mysql-5.6> SELECT role_id FROM cast_info WHERE role_id = 1 and note like '%Jaime%';
+----------------------------+---------+
| Variable_name              | Value   |
+----------------------------+---------+
| Handler_commit             | 1       |
| Handler_delete             | 0       |
| Handler_discover           | 0       |
| Handler_external_lock      | 2       |
| Handler_mrr_init           | 0       |
| Handler_prepare            | 0       |
| Handler_read_first         | 0       |
| Handler_read_key           | 1       |
| Handler_read_last          | 0       |
| Handler_read_next          | 8346769 |
| Handler_read_prev          | 0       |
| Handler_read_rnd           | 0       |
| Handler_read_rnd_next      | 0       |
| Handler_rollback           | 0       |
| Handler_savepoint          | 0       |
| Handler_savepoint_rollback | 0       |
| Handler_update             | 0       |
| Handler_write              | 0       |
+----------------------------+---------+
18 rows in set (0.00 sec)

Suggested fix:
Last query, while only selecting a single column *is actually slower than selecting all columns* because in the last case, the "using index" optimization is preferred over "using index condition". This should not be always the case, as even if "using index" avoids row reads, the number of sequential key reads are hugely different, taking almost twice the time to execute.

Another (related?) topic: Should "rows" for explain when using ICP be such a large number, considering the number of row operations seen on the Handlers is so different?

If Using index condition and Using index are mutually exclusive, ICP should be chosen when the number of index entries to read is orders of magnitude different.

I can attach an optimizer trace for the last query.

optimizer trace for single-column SELECT

Attachment: trace.json (application/octet-stream, text), 6.16 KiB.

Thanks for a very detailed bug report. Good to see the very positive
effect ICP has on the execution time on the first of your queries.

About your second query: Your observations are correct. If the
optimizer decides that the query can be executed by just reading the
index (ie "Using index") then ICP will not be used. And your are right
that for this query, using ICP would have improved the execution time
due to the high filtering effect of the index condition.

The decision to not use ICP for queries that only reads from the index
is how this is currently implemented. Some background on why I think
this choice was done is that the main goal of ICP is to reduce the
number of full records that needs to be read and thus potentially save
on IO operations. For the case where only the index needs to be read,
the saving will mostly be in reduced CPU usage from not having to
return to the server layer to evaluate the condition. This reduction
in CPU usage will be highly dependent on the amount of filtering the
condition will do. For queries where the condition does not filter out
any or just a small fraction of the records, there might even be a
small overhead by pushing the condition to the storage engine and
evaluate it there. Thus, the current ICP implementation is
conservativ and only uses ICP for the case where there is most to
gain.

Conceptually and coding wise it would be easy to enable ICP for being
used also in the case where only the index is needed. To ensure that
it would be used only when the likelihood of getting good filtering
effect would require to have an estimate on the selectivity of the
condition. We do not have this information in the optimizer today. In
your example query, we do not have any statistics or heuristics that
could tell us that the condition "note like '%Jaime%'" would filter
out most records.

As you said, ICP is a newly introduced optimizer feature. There are
rooms for improvements. Bug reports like the one you have made are
good since it gives us feedback with real examples where extending the
use of ICP would benefit the performance.

About the "rows" estimate for explain when using ICP: This is the
estimate that the optimizer has on how many rows that ref access on
"role_id = 1" will produce. The optimizer does not at the moment have
any estimate or statistics on how many of these that will be filtered
out by the "note like '%Jaime%'" condition. Thus, it is hard to take
the filtering effect of using ICP into account in the "rows"
estimate. And I am not sure it should take it into account
either. Using ICP will reduce the number of full records that need to
be read, but still the index entries for all these records will be
read and evaluated.

I have marked this as a "Feature request" to enable Index condition pushdown to be used for "index only" queries.

I hope this can implemented soon, on small table the performance difference is negligible but on large tables this is more problematic and non-deterministic performance.

I've created another test case on 5.6.29.

mysql [localhost] {msandbox} (test) > create table t (id int unsigned not null auto_increment primary key, x int not null default 0, key (x, id)) engine=innodb;
Query OK, 0 rows affected (0.03 sec)

mysql [localhost] {msandbox} (test) > insert into t (x) values (1), (2), (3), (4), (5), (6), (7), (8), (9), (10), (10), (7), (13), (15), (16);
Query OK, 15 rows affected (0.01 sec)
Records: 15  Duplicates: 0  Warnings: 0

mysql [localhost] {msandbox} (test) > insert into t (x) select x from t;
Query OK, 15 rows affected (0.01 sec)
Records: 15  Duplicates: 0  Warnings: 0
...
mysql [localhost] {msandbox} (test) > insert into t (x) select x from t;
Query OK, 491520 rows affected (1.90 sec)
Records: 491520  Duplicates: 0  Warnings: 0

mysql [localhost] {msandbox} (test) > explain select * from t where x = 13 and id between 245760 and 245762;
+----+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref   | rows | Extra                    |
+----+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+
|  1 | SIMPLE      | t     | ref  | PRIMARY,x     | x    | 4       | const |    1 | Using where; Using index |
+----+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (test) > explain select * from t where x = 13 and id between 245760 and 245761;
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                    |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
|  1 | SIMPLE      | t     | range | PRIMARY,x     | x    | 8       | NULL |    1 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (test) > explain select * from t where x = 13 and id between 245760 and 245762;
+----+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref   | rows | Extra                    |
+----+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+
|  1 | SIMPLE      | t     | ref  | PRIMARY,x     | x    | 4       | const |    1 | Using where; Using index |
+----+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+
1 row in set (0.00 sec)

I realized my test case is a little different so I created a new bug http://bugs.mysql.com/bug.php?id=80856

Related: Bug#104659.