MySQL Bugs: #67815: MySQL consistently uses a low cardinality non-referenced column in a index scan

Bug #67815	MySQL consistently uses a low cardinality non-referenced column in a index scan
Submitted:	5 Dec 2012 8:51	Modified:	6 Dec 2012 6:15
Reporter:	Ovais Tariq	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S3 (Non-critical)
Version:	5.5.28, 5.6.7	OS:	Linux
Assigned to:		CPU Architecture:	Any

Description:
I have MySQL version 5.5.28 running (Server version: 5.5.28 MySQL Community Server (GPL)), and for one of the queries its consistently using a low cardinality index for a index_scan plan, when the column is not even referenced in the query. The side affect is that instead of using the PRIMARY index for the index scan its using a secondary index, scanning the PRIMARY index is going to be much faster since its a clustered index (as the table is InnoDB). This results in query performance degradation.

You can take a look at the test case below that shows that the key on the column "products.run_flat" is used however, the query clearly only accesses the column "products.id". Although the key on the column "run_flat" would be containing the primary key column "id" in the leaf nodes of the index, still its obvious that the PRIMARY INDEX itself should have been used.

How to repeat:
-- create the required tables

CREATE TABLE `products` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `run_flat` tinyint(1) NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `run_flat` (`run_flat`)
) ENGINE=InnoDB AUTO_INCREMENT=11 DEFAULT CHARSET=latin1;

CREATE TABLE `products_mapping` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `product_id` int(10) unsigned NOT NULL,
  PRIMARY KEY (`id`),
  KEY `product_id` (`product_id`)
) ENGINE=InnoDB AUTO_INCREMENT=25 DEFAULT CHARSET=latin1

-- populate the table with some data
insert into products values(null, 0); <-- execute this 10 times
insert into products_mapping values(null, 1);
insert into products_mapping values(null, 1);
insert into products_mapping values(null, 2);
insert into products_mapping values(null, 2);
insert into products_mapping values(null, 3);
insert into products_mapping values(null, 3);
insert into products_mapping values(null, 3);
insert into products_mapping values(null, 3);
insert into products_mapping values(null, 3);
insert into products_mapping values(null, 3);
insert into products_mapping values(null, 4);
insert into products_mapping values(null, 5);
insert into products_mapping values(null, 6);
insert into products_mapping values(null, 6);
insert into products_mapping values(null, 6);
insert into products_mapping values(null, 6);
insert into products_mapping values(null, 7);
insert into products_mapping values(null, 7);
insert into products_mapping values(null, 8);
insert into products_mapping values(null, 9);
insert into products_mapping values(null, 10);
insert into products_mapping values(null, 10);
insert into products_mapping values(null, 10);

-- Analyze the table and check the query plan
mysql> analyze table products;
+---------------+---------+----------+----------+
| Table         | Op      | Msg_type | Msg_text |
+---------------+---------+----------+----------+
| test.products | analyze | status   | OK       |
+---------------+---------+----------+----------+
1 row in set (0.00 sec)

mysql> show indexes from products;
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table    | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| products |          0 | PRIMARY  |            1 | id          | A         |          10 |     NULL | NULL   |      | BTREE      |         |               |
| products |          1 | run_flat |            1 | run_flat    | A         |           2 |     NULL | NULL   |      | BTREE      |         |               |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
2 rows in set (0.00 sec)

mysql> explain extended select product_id from products_mapping join products on products.id = products_mapping.product_id;
+----+-------------+------------------+-------+---------------+------------+---------+------------------+------+----------+--------------------------+
| id | select_type | table            | type  | possible_keys | key        | key_len | ref              | rows | filtered | Extra                    |
+----+-------------+------------------+-------+---------------+------------+---------+------------------+------+----------+--------------------------+
|  1 | SIMPLE      | products         | index | PRIMARY       | run_flat   | 1       | NULL             |   10 |   100.00 | Using index              |
|  1 | SIMPLE      | products_mapping | ref   | product_id    | product_id | 4       | test.products.id |    1 |   100.00 | Using where; Using index |
+----+-------------+------------------+-------+---------------+------------+---------+------------------+------+----------+--------------------------+
2 rows in set, 1 warning (0.01 sec)

mysql> show warnings;
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                                     |
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | select `test`.`products_mapping`.`product_id` AS `product_id` from `test`.`products_mapping` join `test`.`products` where (`test`.`products`.`id` = `test`.`products_mapping`.`product_id`) |
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

Same with 5.6.8:

...
mysql> show indexes from products\G
*************************** 1. row ***************************
        Table: products
   Non_unique: 0
     Key_name: PRIMARY
 Seq_in_index: 1
  Column_name: id
    Collation: A
  Cardinality: 10
     Sub_part: NULL
       Packed: NULL
         Null:
   Index_type: BTREE
      Comment:
Index_comment:
*************************** 2. row ***************************
        Table: products
   Non_unique: 1
     Key_name: run_flat
 Seq_in_index: 1
  Column_name: run_flat
    Collation: A
  Cardinality: 2
     Sub_part: NULL
       Packed: NULL
         Null:
   Index_type: BTREE
      Comment:
Index_comment:
2 rows in set (0.00 sec)

mysql> explain extended select product_id from products_mapping join products on products.id = products_mapping.product_id;
+----+-------------+------------------+-------+---------------+------------+---------+------------------+------+----------+--------------------------+
| id | select_type | table            | type  | possible_keys | key        | key_len | ref              | rows | filtered | Extra                    |
+----+-------------+------------------+-------+---------------+------------+---------+------------------+------+----------+--------------------------+
|  1 | SIMPLE      | products         | index | PRIMARY       | run_flat   | 1       | NULL             |   10 |   100.00 | Using index              |
|  1 | SIMPLE      | products_mapping | ref   | product_id    | product_id | 4       | test.products.id |    1 |   100.00 | Using where; Using index |
+----+-------------+------------------+-------+---------------+------------+---------+------------------+------+----------+--------------------------+
2 rows in set, 1 warning (0.00 sec)

mysql> show warnings\G
*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: /* select#1 */ select `test`.`products_mapping`.`product_id` AS `product_id` from `test`.`products_mapping` join `test`.`products` where (`test`.`products`.`id` = `test`.`products_mapping`.`product_id`)
1 row in set (0.00 sec)

There is an explanation of the behavior.

Since 5.5, optimization exists related to the choice of the scanning of secondary indices. As every leaf of secondary index carries a PK as well, it is usually MUCH faster to scan secondary index then to scan PK. This is because scan of PK means scan of the entire table. The only problem here is that secondary index is as big as the row itself, which is not something that is taken into consideration.

But, choice of non-referenced indexed column is not a real bug here. A real bug is how can that secondary index have a cardinality of 2, when it is all 0.

In regard to the part "this results in query performance degradation", I tried to scale the provided test to 100K/200K and 1M/2M records in the table, and still haven't got any perceivable difference in execution time between the default plan and forced primary key. Of course, it was just a rough measurement through a client, not an accurate benchmark, but still, if the performance were badly affected, it should have been noticeable.

100010 / 279864 records:

mysql> select product_id from products_mapping join products on products.id = products_mapping.product_id;
Empty set (0.23 sec)

mysql> select product_id from products_mapping join products force index (primary) on products.id = products_mapping.product_id;
Empty set (0.25 sec)

=============

1000010 / 2238912 records:

mysql> select product_id from products_mapping join products on products.id = products_mapping.product_id;
Empty set (2.42 sec)

mysql> explain extended select product_id from products_mapping join products on products.id = products_mapping.product_id;
+----+-------------+------------------+-------+---------------+------------+---------+------------------+---------+----------+--------------------------+
| id | select_type | table            | type  | possible_keys | key        | key_len | ref              | rows    | filtered | Extra                    |
+----+-------------+------------------+-------+---------------+------------+---------+------------------+---------+----------+--------------------------+
|  1 | SIMPLE      | products         | index | PRIMARY       | run_flat   | 1       | NULL             | 1000521 |   100.00 | Using index              |
|  1 | SIMPLE      | products_mapping | ref   | product_id    | product_id | 4       | test.products.id |       1 |   100.00 | Using where; Using index |
+----+-------------+------------------+-------+---------------+------------+---------+------------------+---------+----------+--------------------------+
2 rows in set, 1 warning (0.00 sec)

mysql> show warnings;
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                                     |
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | select `test`.`products_mapping`.`product_id` AS `product_id` from `test`.`products_mapping` join `test`.`products` where (`test`.`products`.`id` = `test`.`products_mapping`.`product_id`) |
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> select product_id from products_mapping join products force index (primary) on products.id = products_mapping.product_id;
Empty set (2.47 sec)

mysql> explain extended select product_id from products_mapping join products force index (primary) on products.id = products_mapping.product_id;
+----+-------------+------------------+-------+---------------+------------+---------+------------------+---------+----------+--------------------------+
| id | select_type | table            | type  | possible_keys | key        | key_len | ref              | rows    | filtered | Extra                    |
+----+-------------+------------------+-------+---------------+------------+---------+------------------+---------+----------+--------------------------+
|  1 | SIMPLE      | products         | index | PRIMARY       | PRIMARY    | 4       | NULL             | 1000521 |   100.00 | Using index              |
|  1 | SIMPLE      | products_mapping | ref   | product_id    | product_id | 4       | test.products.id |       1 |   100.00 | Using where; Using index |
+----+-------------+------------------+-------+---------------+------------+---------+------------------+---------+----------+--------------------------+
2 rows in set, 1 warning (0.00 sec)

mysql> show warnings;
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                                                           |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | select `test`.`products_mapping`.`product_id` AS `product_id` from `test`.`products_mapping` join `test`.`products` FORCE INDEX (PRIMARY) where (`test`.`products`.`id` = `test`.`products_mapping`.`product_id`) |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

(In the results above for each select the test was run from the beginning to the end, with cache clearing / restarting server on an empty datadir. I removed the preparation part and some EXPLAINs from the comment because otherwise it gets too long and I can't add it.)

Elena,

The table structure was a simplified version of the actual tables. WIth real tables that contain many more columns, and 400K rows I noticed the query time increasing to 200%, with forcing the PRIMARY INDEX the query took 50% of the time as compared to the query that used the secondary index. I will try to come up with a test case much similar to the actual table. Note however, that the secondary index in question was still a single column index on a tinyint column.

See also BUG#26447 and BUG#35850.

Long story short:
- Fix for BUG#26447 made the optimizer to use clustered PK instead of secondary indexes.
- Right after that, there were complaints, BUG#35850 and its duplicates, so the change was reverted.

Apparently, there are cases where clustered PK is faster than secondary index, and there are cases where it is slower.

Check out this comment to BUG#35850:

[23 Apr 2008 3:25] Sergey Petrunya
Summarizing the previous post: for the query

explain select count(*) from ontime_all where origin = destination;

We get: 

5.0.58: full index scan on index idx_origin:
Cold - 3 min 26.98 sec
Warm - 11.15 sec

5.1.24: full index scan on index PRIMARY:
Cold 1 min 8.11 sec
Warm 1 min 13.88 sec

It seems, in that example, Clustered PK was faster for cold buffer pool, and slower for hot buffer pool.

Herre we have

[5 Dec 15:11] Ovais Tariq>  forcing the PRIMARY INDEX the query took 50% of the time as compared to the query that used the secondary index.

Ovais, is your query over actual tables also IO-bound?

This is a bug.

But, not because an index on non-referenced column is used. That is not a reason.

A reason for this bug is that index scanning is used although all indexed columns provided sufficient info for the optimizer to conclude that the empty result should be returned WITHOUT executing the query !!!! Simply, the equality expression for the common column is not satisfied for a single row of the joined entity.

I think it would be more appropriate to fix this in 5.6, but the decision has yet to be made.

Sergey,

I checked out the two bugs BUG#26447 and BUG#35850. I think there are three things here:
- table scans or instances where columns selected are not present in the secondary index, in such cases Primary Key is going to give much better performance, because of clustering
- In some cases secondary keys can give better performance when the query can be satisfied from the index alone
- Using Primary Key gives better performance for IO bound workload

I find that the fix for BUG#26447 is still a valid one for the original case defined in the bug, table scans are always going to give consistent performance when a Primary Key is used. Though there might need to be some addition to the rule added by the fix for BUG#26447, so that a PK is only selected for table scan and/or in cases where the query accesses columns other then the one present in the secondary index.

Sergey,

Yes the query over actual table is also IO bound. So the optimizer would need to be aware of whether the data to be retrieved is in memory or not, I saw the same behaviour when testing the new optimizer features, some of the features provided benefit only on IO bound workload. How difficult would it be for the optimizer to know before hand if the data is in cache or not?

Here is a test case that I think reproduces this issue:

CREATE TABLE t0 (
  i1 INTEGER NOT NULL
);

INSERT INTO t0 VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9);

CREATE TABLE t1 (
  pk INTEGER NOT NULL,
  i1 INTEGER NOT NULL,
  PRIMARY KEY (pk),
  KEY k1 (i1)
) ENGINE=InnoDB;

# Insert 1.000.000 records
INSERT INTO t1 
SELECT (a.i1 + b.i1*10 + c.i1*100 + d.i1*1000 + e.i1*10000 + f.i1*100000) * 10, 
       FLOOR (RAND() * 1000000)
FROM t0 as a, t0 as b, t0 as c, t0 as d, t0 as e, t0 as f;

CREATE TABLE t2 (
  i1 INTEGER NOT NULL,
  i2 INTEGER NOT NULL,
  KEY k1 (i1)
) ENGINE=InnoDB;

# Insert 10.000.000 records
INSERT INTO t2
SELECT FLOOR (RAND() * 10000000), 42 
FROM t0 as a, t0 as b, t0 as c, t0 as d, t0 as e, t0 as f, t0 as g;

# This should do an index scan on t1's k1 index as the first table
SELECT count(*) FROM t1, t2 where t1.pk = t2.i1;

# This should do an index scan on t1's primary key as the first table
SELECT count(*) FROM t1 IGNORE INDEX(k1), t2 where t1.pk = t2.i1;

DROP TABLE t0, t1, t2;

When running this with all data in the InnoDB buffer pool I see the following execution times:

 using secondary index on table t1: 3,35 s
 using primary index on table t1  : 2,53 s

So with all data in memory, the heuristics that switches to use the secondary index causes an increase in execution time at about 30%.

The really large difference can be observed by making this test IO bound (by setting the InnoDB buffer to 50 MB and using O_DIRECT to avoid the file system cache). With this setting I see the following numbers:

 using secondary index on table t1: 50 minutes
 using primary index on table t1  : 58 seconds

The large increase in execution time comes from how we access the second table of the join:

 * When reading the first table using the secondary index, the ref-access on the second table will be "random accesses" into the second table - and random disk io.

 * When reading the first table using the primary key (which is the join condition) the ref-access on the second table will the more "sequential access" (we will access the index on the second table in ascending order). This reduces the number of blocks that need to be read from disk and likely triggers pre-fetching of index block in InnoDB.