Bug #67815 | MySQL consistently uses a low cardinality non-referenced column in a index scan | ||
---|---|---|---|
Submitted: | 5 Dec 2012 8:51 | Modified: | 6 Dec 2012 6:15 |
Reporter: | Ovais Tariq | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S3 (Non-critical) |
Version: | 5.5.28, 5.6.7 | OS: | Linux |
Assigned to: | CPU Architecture: | Any |
[5 Dec 2012 8:51]
Ovais Tariq
[5 Dec 2012 9:33]
Valeriy Kravchuk
Same with 5.6.8: ... mysql> show indexes from products\G *************************** 1. row *************************** Table: products Non_unique: 0 Key_name: PRIMARY Seq_in_index: 1 Column_name: id Collation: A Cardinality: 10 Sub_part: NULL Packed: NULL Null: Index_type: BTREE Comment: Index_comment: *************************** 2. row *************************** Table: products Non_unique: 1 Key_name: run_flat Seq_in_index: 1 Column_name: run_flat Collation: A Cardinality: 2 Sub_part: NULL Packed: NULL Null: Index_type: BTREE Comment: Index_comment: 2 rows in set (0.00 sec) mysql> explain extended select product_id from products_mapping join products on products.id = products_mapping.product_id; +----+-------------+------------------+-------+---------------+------------+---------+------------------+------+----------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+------------------+-------+---------------+------------+---------+------------------+------+----------+--------------------------+ | 1 | SIMPLE | products | index | PRIMARY | run_flat | 1 | NULL | 10 | 100.00 | Using index | | 1 | SIMPLE | products_mapping | ref | product_id | product_id | 4 | test.products.id | 1 | 100.00 | Using where; Using index | +----+-------------+------------------+-------+---------------+------------+---------+------------------+------+----------+--------------------------+ 2 rows in set, 1 warning (0.00 sec) mysql> show warnings\G *************************** 1. row *************************** Level: Note Code: 1003 Message: /* select#1 */ select `test`.`products_mapping`.`product_id` AS `product_id` from `test`.`products_mapping` join `test`.`products` where (`test`.`products`.`id` = `test`.`products_mapping`.`product_id`) 1 row in set (0.00 sec)
[5 Dec 2012 14:41]
MySQL Verification Team
There is an explanation of the behavior. Since 5.5, optimization exists related to the choice of the scanning of secondary indices. As every leaf of secondary index carries a PK as well, it is usually MUCH faster to scan secondary index then to scan PK. This is because scan of PK means scan of the entire table. The only problem here is that secondary index is as big as the row itself, which is not something that is taken into consideration. But, choice of non-referenced indexed column is not a real bug here. A real bug is how can that secondary index have a cardinality of 2, when it is all 0.
[5 Dec 2012 15:02]
Elena Stepanova
In regard to the part "this results in query performance degradation", I tried to scale the provided test to 100K/200K and 1M/2M records in the table, and still haven't got any perceivable difference in execution time between the default plan and forced primary key. Of course, it was just a rough measurement through a client, not an accurate benchmark, but still, if the performance were badly affected, it should have been noticeable. 100010 / 279864 records: mysql> select product_id from products_mapping join products on products.id = products_mapping.product_id; Empty set (0.23 sec) mysql> select product_id from products_mapping join products force index (primary) on products.id = products_mapping.product_id; Empty set (0.25 sec) ============= 1000010 / 2238912 records: mysql> select product_id from products_mapping join products on products.id = products_mapping.product_id; Empty set (2.42 sec) mysql> explain extended select product_id from products_mapping join products on products.id = products_mapping.product_id; +----+-------------+------------------+-------+---------------+------------+---------+------------------+---------+----------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+------------------+-------+---------------+------------+---------+------------------+---------+----------+--------------------------+ | 1 | SIMPLE | products | index | PRIMARY | run_flat | 1 | NULL | 1000521 | 100.00 | Using index | | 1 | SIMPLE | products_mapping | ref | product_id | product_id | 4 | test.products.id | 1 | 100.00 | Using where; Using index | +----+-------------+------------------+-------+---------------+------------+---------+------------------+---------+----------+--------------------------+ 2 rows in set, 1 warning (0.00 sec) mysql> show warnings; +-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Level | Code | Message | +-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Note | 1003 | select `test`.`products_mapping`.`product_id` AS `product_id` from `test`.`products_mapping` join `test`.`products` where (`test`.`products`.`id` = `test`.`products_mapping`.`product_id`) | +-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) mysql> select product_id from products_mapping join products force index (primary) on products.id = products_mapping.product_id; Empty set (2.47 sec) mysql> explain extended select product_id from products_mapping join products force index (primary) on products.id = products_mapping.product_id; +----+-------------+------------------+-------+---------------+------------+---------+------------------+---------+----------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+------------------+-------+---------------+------------+---------+------------------+---------+----------+--------------------------+ | 1 | SIMPLE | products | index | PRIMARY | PRIMARY | 4 | NULL | 1000521 | 100.00 | Using index | | 1 | SIMPLE | products_mapping | ref | product_id | product_id | 4 | test.products.id | 1 | 100.00 | Using where; Using index | +----+-------------+------------------+-------+---------------+------------+---------+------------------+---------+----------+--------------------------+ 2 rows in set, 1 warning (0.00 sec) mysql> show warnings; +-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Level | Code | Message | +-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Note | 1003 | select `test`.`products_mapping`.`product_id` AS `product_id` from `test`.`products_mapping` join `test`.`products` FORCE INDEX (PRIMARY) where (`test`.`products`.`id` = `test`.`products_mapping`.`product_id`) | +-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) (In the results above for each select the test was run from the beginning to the end, with cache clearing / restarting server on an empty datadir. I removed the preparation part and some EXPLAINs from the comment because otherwise it gets too long and I can't add it.)
[5 Dec 2012 15:11]
Ovais Tariq
Elena, The table structure was a simplified version of the actual tables. WIth real tables that contain many more columns, and 400K rows I noticed the query time increasing to 200%, with forcing the PRIMARY INDEX the query took 50% of the time as compared to the query that used the secondary index. I will try to come up with a test case much similar to the actual table. Note however, that the secondary index in question was still a single column index on a tinyint column.
[5 Dec 2012 16:45]
Sergei Petrunia
See also BUG#26447 and BUG#35850. Long story short: - Fix for BUG#26447 made the optimizer to use clustered PK instead of secondary indexes. - Right after that, there were complaints, BUG#35850 and its duplicates, so the change was reverted. Apparently, there are cases where clustered PK is faster than secondary index, and there are cases where it is slower.
[5 Dec 2012 18:20]
Sergei Petrunia
Check out this comment to BUG#35850: [23 Apr 2008 3:25] Sergey Petrunya Summarizing the previous post: for the query explain select count(*) from ontime_all where origin = destination; We get: 5.0.58: full index scan on index idx_origin: Cold - 3 min 26.98 sec Warm - 11.15 sec 5.1.24: full index scan on index PRIMARY: Cold 1 min 8.11 sec Warm 1 min 13.88 sec It seems, in that example, Clustered PK was faster for cold buffer pool, and slower for hot buffer pool. Herre we have [5 Dec 15:11] Ovais Tariq> forcing the PRIMARY INDEX the query took 50% of the time as compared to the query that used the secondary index. Ovais, is your query over actual tables also IO-bound?
[5 Dec 2012 18:40]
MySQL Verification Team
This is a bug. But, not because an index on non-referenced column is used. That is not a reason. A reason for this bug is that index scanning is used although all indexed columns provided sufficient info for the optimizer to conclude that the empty result should be returned WITHOUT executing the query !!!! Simply, the equality expression for the common column is not satisfied for a single row of the joined entity. I think it would be more appropriate to fix this in 5.6, but the decision has yet to be made.
[6 Dec 2012 6:11]
Ovais Tariq
Sergey, I checked out the two bugs BUG#26447 and BUG#35850. I think there are three things here: - table scans or instances where columns selected are not present in the secondary index, in such cases Primary Key is going to give much better performance, because of clustering - In some cases secondary keys can give better performance when the query can be satisfied from the index alone - Using Primary Key gives better performance for IO bound workload I find that the fix for BUG#26447 is still a valid one for the original case defined in the bug, table scans are always going to give consistent performance when a Primary Key is used. Though there might need to be some addition to the rule added by the fix for BUG#26447, so that a PK is only selected for table scan and/or in cases where the query accesses columns other then the one present in the secondary index.
[6 Dec 2012 6:15]
Ovais Tariq
Sergey, Yes the query over actual table is also IO bound. So the optimizer would need to be aware of whether the data to be retrieved is in memory or not, I saw the same behaviour when testing the new optimizer features, some of the features provided benefit only on IO bound workload. How difficult would it be for the optimizer to know before hand if the data is in cache or not?
[13 Jun 2013 12:56]
Olav Sandstå
Here is a test case that I think reproduces this issue: CREATE TABLE t0 ( i1 INTEGER NOT NULL ); INSERT INTO t0 VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9); CREATE TABLE t1 ( pk INTEGER NOT NULL, i1 INTEGER NOT NULL, PRIMARY KEY (pk), KEY k1 (i1) ) ENGINE=InnoDB; # Insert 1.000.000 records INSERT INTO t1 SELECT (a.i1 + b.i1*10 + c.i1*100 + d.i1*1000 + e.i1*10000 + f.i1*100000) * 10, FLOOR (RAND() * 1000000) FROM t0 as a, t0 as b, t0 as c, t0 as d, t0 as e, t0 as f; CREATE TABLE t2 ( i1 INTEGER NOT NULL, i2 INTEGER NOT NULL, KEY k1 (i1) ) ENGINE=InnoDB; # Insert 10.000.000 records INSERT INTO t2 SELECT FLOOR (RAND() * 10000000), 42 FROM t0 as a, t0 as b, t0 as c, t0 as d, t0 as e, t0 as f, t0 as g; # This should do an index scan on t1's k1 index as the first table SELECT count(*) FROM t1, t2 where t1.pk = t2.i1; # This should do an index scan on t1's primary key as the first table SELECT count(*) FROM t1 IGNORE INDEX(k1), t2 where t1.pk = t2.i1; DROP TABLE t0, t1, t2; When running this with all data in the InnoDB buffer pool I see the following execution times: using secondary index on table t1: 3,35 s using primary index on table t1 : 2,53 s So with all data in memory, the heuristics that switches to use the secondary index causes an increase in execution time at about 30%. The really large difference can be observed by making this test IO bound (by setting the InnoDB buffer to 50 MB and using O_DIRECT to avoid the file system cache). With this setting I see the following numbers: using secondary index on table t1: 50 minutes using primary index on table t1 : 58 seconds The large increase in execution time comes from how we access the second table of the join: * When reading the first table using the secondary index, the ref-access on the second table will be "random accesses" into the second table - and random disk io. * When reading the first table using the primary key (which is the join condition) the ref-access on the second table will the more "sequential access" (we will access the index on the second table in ascending order). This reduces the number of blocks that need to be read from disk and likely triggers pre-fetching of index block in InnoDB.