MySQL Bugs: #80390: Clustered primary key included in index merge may cause higher execution times

Bug #80390	Clustered primary key included in index merge may cause higher execution times
Submitted:	16 Feb 2016 10:14	Modified:	16 Feb 2016 10:59
Reporter:	Olav Sandstå	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S5 (Performance)
Version:	5.6.30	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
This bug was found when analyzing Bug#73046. 	

For some queries where we have a clustered primary index and a secondary index which includes the columns of the primary key, and where both indexes can be used for range scan, the optimizer selects to use index merge for these indexes when it should be able to determine that using a range scan on only the secondary index would be more efficient. The situation occurs when the query condition for the range scan on the secondary index is able to utilize the complete key which includes the columns from the primary index. In this case, there is no benefit of adding the primary index to index merge. This issue will occur most frequently when using storage engines that adds the primary key columns to each secondary index (e.g. InnoDB).

Table definition:

CREATE TABLE t1 (
  pk INTEGER NOT NULL,
  i1 INTEGER NOT NULL,
  c1 CHAR(200) NOT NULL,
  PRIMARY KEY (pk),
  KEY idx (i1)
);

Query:

SELECT * FROM t1 WHERE pk > 90000 AND i1 = 100;

This query uses index merge of the primary index and the secondary index:

+----+-------------+-------+------------+-------------+---------------+-------------+---------+------+------+----------+-------------------------------------------+
| id | select_type | table | partitions | type        | possible_keys | key         | key_len | ref  | rows | filtered | Extra                                     |
+----+-------------+-------+------------+-------------+---------------+-------------+---------+------+------+----------+-------------------------------------------+
|  1 | SIMPLE      | t1    | NULL       | index_merge | PRIMARY,idx   | idx,PRIMARY | 8,4     | NULL |  454 |   100.00 | Using intersect(idx,PRIMARY); Using where |
+----+-------------+-------+------------+-------------+---------------+-------------+---------+------+------+----------+-------------------------------------------+

The query would be faster if range scan on the secondary index was used.

How to repeat:
CREATE TABLE t0 (
  i1 INTEGER NOT NULL
);

INSERT INTO t0 VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9); 

CREATE TABLE t1 (
  pk INTEGER NOT NULL,
  i1 INTEGER NOT NULL,
  c1 CHAR(200) NOT NULL,
  PRIMARY KEY (pk),
  KEY idx (i1)
);

INSERT INTO t1
SELECT a0.i1 + 10*a1.i1 + 100*a2.i1 + 1000*a3.i1 + 10000*a4.i1 + 100000*a5.i1,
       a0.i1 + 10*a1.i1 + 100*a2.i1,
       "Text"
FROM t0 a0, t0 a1, t0 a2, t0 a3, t0 a4, t0 a5;

explain
SELECT * FROM t1 WHERE pk > 90000 AND i1 = 100;

DROP TABLE t0, t1;

The number of handler read calls for the query:

SELECT * FROM t1 WHERE pk > 90000 AND i1 = 100;

mysql> show status like 'handler_read%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     |
| Handler_read_key      | 911   |
| Handler_read_last     | 0     |
| Handler_read_next     | 910   |
| Handler_read_prev     | 0     |
| Handler_read_rnd      | 910   |
| Handler_read_rnd_next | 0     |
+-----------------------+-------+

If I force the secondary index to be used, the number of handler read calls is reduced:

SELECT * FROM t1 FORCE INDEX(idx) WHERE pk > 90000 AND i1 = 100;

mysql> show status like 'handler_read%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     |
| Handler_read_key      | 1     |
| Handler_read_last     | 0     |
| Handler_read_next     | 910   |
| Handler_read_prev     | 0     |
| Handler_read_rnd      | 0     |
| Handler_read_rnd_next | 0     |
+-----------------------+-------+
7 rows in set (0.00 sec)

Analysis:
=========

The query can be run as range scan on each of the two indexes:

-primary index: the range criteria "pk > 90000" will be used

-secondary index: the range criteria "i1 = 100 AND pk > 90000" will be used.

The reason "pk > 90000" can be included when using the secondary index
is that the InnoDB appends all primary key columns to each secondary
index.

When the optimizer checks for whether index merge can be used, it
includes first the secondary indexes which returns the records in
primary key order (row order). It then adds the clustered primary key
to the index merge if this results in a lower total cost. In the
current calculations for the number of rows returned by an index
merge, we assume that the number of selected records is reduced when
intersecting with an additional index. So in this case, adding the
primary key to the index merge, the resulting estimate for number of
records and cost is reduced. This causes index merge to be selected
over the range scan on the secondary index.

But since the primary index columns are included in the secondary
index AND the range optimizer is able to utilize the added columns,
there is no improvement to add the clustered primary index to the
index merge.

Suggestion: 
===========

Do not add the clustered primary index to an index merge when the
primary index columns are included in one of the secondary indexes in
the index merge AND these columns are used in the secondary index.

Testcase verified against 8.0.30

Attachment: index_merge_broken.sql (application/octet-stream, text), 103.77 KiB.

I have been affected by this issue quite a bit. I've uploaded a test-case which produces the following output:

$ mysql test < ~/Desktop/index_merge_broken.sql
Table	Op	Msg_type	Msg_text
test.t2	analyze	status	OK
id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	t2	NULL	index_merge	PRIMARY,id2_t	id2_t,PRIMARY	420,8	NULL	1	100.00	Using intersect(id2_t,PRIMARY); Using where; Using filesort
id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	t2	NULL	range	id2_t	id2_t	420	NULL	1	100.00	Using index condition
id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	t2	NULL	range	PRIMARY,id2_t	id2_t	420	NULL	1	100.00	Using index condition