MySQL Bugs: #79990: index used to return data not range scanned

Bug #79990	index used to return data not range scanned
Submitted:	14 Jan 2016 16:28	Modified:	5 Jul 2018 15:42
Reporter:	Steven Hartland	Email Updates:
Status:	Can't repeat	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S3 (Non-critical)
Version:	5.6.27	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	INDEX, log_queries_not_using_indexes, performance, range

Description:
When there's a index on a table that starts with the field to filter and then contains the field to return the index is used but instead of doing a range scan it results in partial table scan, with the optimiser reporting "Using where, Using index"

The expected result would to be to perform a range scan.

How to repeat:
-- ----------------------------
-- Table structure for index_test
-- ----------------------------
DROP TABLE IF EXISTS `index_test`;
CREATE TABLE `index_test` (
  `pk1` int(11) NOT NULL AUTO_INCREMENT,
  `id1` int(11) NOT NULL,
  `val1` int(11) NOT NULL,
  `val2` int(11) NOT NULL,
  PRIMARY KEY (`pk1`),
  UNIQUE KEY `unq_id1_val1_val2` (`id1`,`val1`,`val2`) USING BTREE,
  UNIQUE KEY `unq_id1` (`id1`)
) ENGINE=InnoDB AUTO_INCREMENT=9 DEFAULT CHARSET=latin1;

-- ----------------------------
-- Records of index_test
-- ----------------------------
INSERT INTO `index_test` VALUES ('1', '0', '1', '1');
INSERT INTO `index_test` VALUES ('2', '1', '1', '2');
INSERT INTO `index_test` VALUES ('3', '2', '2', '2');
INSERT INTO `index_test` VALUES ('4', '4', '2', '3');
INSERT INTO `index_test` VALUES ('7', '25', '4', '4');
INSERT INTO `index_test` VALUES ('6', '29', '3', '4');
INSERT INTO `index_test` VALUES ('5', '55', '3', '3');
INSERT INTO `index_test` VALUES ('8', '237', '4', '5');

-- ----------------------------
-- Run the following SELECT
-- ----------------------------
EXPLAIN SELECT val1
FROM index_test i
WHERE i.id1 IN ( 0, 237 )
ORDER BY id1 DESC
LIMIT 1;

+----+-------------+-------+-------+---------------------------+-------------------+---------+------+------+--------------------------+
| id | select_type | table | type  | possible_keys             | key               | key_len | ref  | rows | Extra                    |
+----+-------------+-------+-------+---------------------------+-------------------+---------+------+------+--------------------------+
|  1 | SIMPLE      | i     | index | unq_id1_val1_val2,unq_id1 | unq_id1_val1_val2 | 12      | NULL |    4 | Using where; Using index |
+----+-------------+-------+-------+---------------------------+-------------------+---------+------+------+--------------------------+

As you can see the "type" is "index" and the rows accessed is 4.

-- ----------------------------
-- Run the following SELECT which forces the use of index unq_id1, we get the expected results accessing only 2 rows.
-- ----------------------------
EXPLAIN SELECT val1
FROM index_test i FORCE INDEX(unq_id1)
WHERE i.id1 IN ( 0, 237 )
ORDER BY id1 DESC
LIMIT 1;

+----+-------------+-------+-------+---------------+---------+---------+------+------+-----------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra                 |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-----------------------+
|  1 | SIMPLE      | i     | range | unq_id1       | unq_id1 | 4       | NULL |    2 | Using index condition |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-----------------------+

-- ----------------------------
-- In the original statement is logged to the slow query log when log_queries_not_using_indexes=ON, again unexpected
-- ----------------------------

Suggested fix:
Even when using the index for data the optimiser should still perform a range scan.

The rows reported by the explain being inflated seems to be related to the "IN" clause containing exactly 2 items.

In addition the rows reported in this case appear to be ~ 1/2 the number of rows in the table.

If the "IN" contains 0 the type is "const" and only 1 row is accessed.

If the "IN" contains more than 2 items then the number of rows reported matches the number of items in the "IN" clause and the "type" is "range".

Hi,

Thank you for your report.

Actually, your conclusions are wrong. This is because the chosen algorithm, which is "Using where, Using index" is absolutely the most optimal one. That means that no data pages are touched at all. The algorithm finds the first entry in the index that satisfies the condition and then it only reads index, never ever touching table data, unless the chosen index is the PRIMARY.

In short, this is the fastest way of returning the result set to the client side.

If there is anything that I missed, please, let me know.

Unfortunately, running a real world example with significant amounts of data, disagrees with your conclusion that using a partial table scan was quicker; this was why we raised the issue originally.

Hi,

First of all, what values are we talking about ???

Give us timings with default choice of the optimiser and and timings with FORCE INDEX. Next, beside FORCE INDEX try using optimiser hints, as described in our chapter 8.9.3. Try any of those and see if performance improves. Those hints are there because our optimiser can not always come with the perfect plan. Which is why we have those hints. That way the performance issues are resolved.

If nothing of this helps, we need a test case. Minimum dataset that would reproduce the behaviour. We need to verify the behaviour ourselves and not to rely on the "real world" examples.

Unfortunately this was over 2 years ago so we won't have the reproduction case any more.

That said we did use the hit to fix the performance issue, however if the optimiser did the expected as detailed in the ticket the hit wouldn't have been needed.

This could be a matter of the original reason for the to always do the index optimisation vs range scan is no longer relevant?

Hi,

We can not verify any bug without reproducing it.

Also, we do not have optimiser hits, but optimiser hints. These hints are the part of the optimiser so if you get good results with those, that this is not a bug.