Bug #68789 Optimiser prefers primary key over secondary index for a range where
Submitted: 27 Mar 2013 1:12 Modified: 24 Jun 2013 4:33
Reporter: Gareth Humphries Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S4 (Feature request)
Version:5.5.28 OS:Any
Assigned to: CPU Architecture:Any

[27 Mar 2013 1:12] Gareth Humphries
Description:
This probably applies to the optimser as well as InnoDB, but it is an InnoDB feature I'm trying to take advantage of.

I have a where that include a range on the primary key ( val <= p_key) and a range on a secondary column (val >= col). 

I would expect the query to use the secondary index on col, as due to Innodb using clustered indexes, that includes the primary key as well.

How to repeat:
mysql> create table repro (startip integer unsigned not null primary key, endip integer unsigned not null, value char(1));

mysql> create unique index ix_end on repro (endip);

$ ( echo "begin work;"; perl -e 'print "INSERT INTO repro (startip,endip,value) values (" . ($_ * 0x100) . ", " . (($_  * 0x100) + 0xff) . ", \"X\");\n" foreach ( 0x010000 .. 0xdfffff)' ; echo "commit;" ) | mysql -u repro repro

mysql> explain select value from repro where startip < inet_aton("50.50.50.50") and endip > inet_aton("50.50.50.50");
+----+-------------+-------+-------+----------------+---------+---------+------+---------+-------------+
| id | select_type | table | type  | possible_keys  | key     | key_len | ref  | rows    | Extra       |
+----+-------------+-------+-------+----------------+---------+---------+------+---------+-------------+
|  1 | SIMPLE      | repro | range | PRIMARY,ix_end | PRIMARY | 4       | NULL | 6232598 | Using where |
+----+-------------+-------+-------+----------------+---------+---------+------+---------+-------------+

I would expect to see this plan use ix_endip, so it can use startip at the index level instead of column level.
[29 Mar 2013 18:50] MySQL Verification Team
Hi,

Would you please let us know the results of the following two queries:

select count(*) from repro where startip < inet_aton("50.50.50.50");

and

select count(*) from repro where endip > inet_aton("50.50.50.50");
[29 Mar 2013 22:34] Gareth Humphries
mysql> select count(*) from repro where startip < inet_aton("50.50.50.50");
+----------+
| count(*) |
+----------+
|  3224115 |
+----------+
1 row in set (0.91 sec)

mysql> select count(*) from repro where startip > inet_aton("50.50.50.50");
+----------+
| count(*) |
+----------+
| 11390413 |
+----------+
1 row in set (2.85 sec)

mysql>
[29 Mar 2013 23:00] Gareth Humphries
Sorry, just noticed the where was slightly different between your queries:

mysql> select count(*) from repro where endip > inet_aton("50.50.50.50");
+----------+
| count(*) |
+----------+
| 11390414 |
+----------+
1 row in set (2.46 sec)
[11 Apr 2013 9:03] Jørgen Løland
Hi Gareth,

Thank you for the bug report.

You're right that the primary key columns are part of InnoDB secondary keys, but prior to version 5.6, MySQL was in most cases (including range access) not able to use these primary key fields. For more info:  http://glukhsv.blogspot.co.uk/2012/12/innodb-extended-secondary-keys.html

However, switching to MySQL 5.6 will not help with this query because the range access method is only able to make use of the second keypart (the PK in your case) if the comparison operator on the first keypart is either =, <=>, or IS NULL. Your query uses the ">" operator on the first keypart. For more info: http://jorgenloland.blogspot.co.uk/2011/08/mysql-range-access-method-explained.html

The conclusion is therefore that this is not a bug.
[24 Jun 2013 4:28] Gareth Humphries
Thanks  Jørgen.  I thought the loose index scan feature added in 5.5 would have applied in this case?  Is there scope for it to be expanded so it will, as it seems valuable for performance.
[24 Jun 2013 4:33] Gareth Humphries
Oh, sorry - I just saw that the link you provided only applies from 5.6.9 onward.

So to summarise: I could add a secondary key that explicitly includes the primary key (and duplicates it, due to the clustering), then the loose index scan would work.  As of 5.6.9, it can use the implicit primary key as if it were added in this way.

Thanks.