Bug #67980 | MySQL not utilising full index because of possible bad cardinality stats | ||
---|---|---|---|
Submitted: | 27 Dec 2012 14:54 | Modified: | 19 Jan 2013 10:52 |
Reporter: | Ovais Tariq | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S3 (Non-critical) |
Version: | 5.5.28 | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[27 Dec 2012 14:54]
Ovais Tariq
[27 Dec 2012 14:54]
Ovais Tariq
Dump file to test the queries with
Attachment: test_t1.dump (application/octet-stream, text), 33.31 KiB.
[27 Dec 2012 15:32]
Valeriy Kravchuk
Looks like a kind of good old bug #12113 (that nobody actually is going to fix it seems, even though fix for post 5.1 was once planned).
[28 Dec 2012 11:04]
Ovais Tariq
Valerii, Yes it looks similar, but there is no order by and limit involved. Anyhow it looks very wrong for the optimizer to not use the index completely in this kind of query, by default.
[29 Dec 2012 12:39]
Valeriy Kravchuk
It's easy to verify that the same problem happens with recent 5.5.29 and with MyISAM table instead of InnoDB, even after ANALYZE. So, this is more fundamental than just incorrect statistics estimation at InnoDB level. For current simplified test case the problem may not be that visible or serious: C:\Program Files\MySQL\MySQL Server 5.5\bin>mysqlslap -uroot -proot -P3312 --con currency=1 --create-schema=test --no-drop --number-of-queries=1000 --iterations= 10 --query="select * from t1 where c1='d' and i2 in (1,2,3,4,5,6,7,8,9,10) and i 3 in (9,11,15,20)" Benchmark Average number of seconds to run all queries: 3.807 seconds Minimum number of seconds to run all queries: 2.841 seconds Maximum number of seconds to run all queries: 4.091 seconds Number of clients running queries: 1 Average number of queries per client: 1000 C:\Program Files\MySQL\MySQL Server 5.5\bin>mysqlslap -uroot -proot -P3312 --con currency=1 --create-schema=test --no-drop --number-of-queries=1000 --iterations= 10 --query="select * from t1 force index(c1) where c1='d' and i2 in (1,2,3,4,5,6 ,7,8,9,10) and i3 in (9,11,15,20)" Benchmark Average number of seconds to run all queries: 3.615 seconds Minimum number of seconds to run all queries: 2.810 seconds Maximum number of seconds to run all queries: 4.075 seconds Number of clients running queries: 1 Average number of queries per client: 1000 but in general case it may become serious.
[29 Dec 2012 17:04]
MySQL Verification Team
Looks fixed in mysql-trunk, have to find out which bugfix it was. mysql> explain select * from t1 where c1='d' and i2 in (1,2,3,4,5,6,7,8,9,10) and i3 in (9,11,15,20); +----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+ | 1 | SIMPLE | t1 | range | c1 | c1 | 14 | NULL | 120 | Using index condition | +----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+ 1 row in set (0.00 sec) mysql> select version(); +-----------------+ | version() | +-----------------+ | 5.7.1-m11-debug | +-----------------+ 1 row in set (0.00 sec)
[8 Jan 2013 8:47]
Jørgen Løland
Thank you for the bug report. Verified in 5.1, 5.5 and 5.6 (using eq_range_index_dive_limit > 40).
[19 Jan 2013 10:00]
Ovais Tariq
Hi Jørgen, I used optimizer_tracing to understand why Optimizer is choosing the plan it is choosing. I set the variable eq_range_index_dive_limit to both 0 and 1 to test what happens when index dives are used to compute cost or when statistics are used to compute cost. In both cases the optimizer is computing the cost of ref access (using only 1 column of the index) to be 6x to 8x lower as compared to range access (when all the columns of the index are used). Furthermore, the cost of range access is calculated to be 1.5x to 2x more as compared to table_scan. -- use index statistics when estimating the number of rows in the ranges of the query "rows_estimation": [ { "table": "`t1`", "range_analysis": { "table_scan": { "rows": 2048, "cost": 418.7 }, … "analyzing_range_alternatives": { "range_scan_alternatives": [ { "index": "c1", "ranges": [ "d <= c1 <= d AND 1 <= i2 <= 1", "d <= c1 <= d AND 2 <= i2 <= 2", "d <= c1 <= d AND 3 <= i2 <= 3", "d <= c1 <= d AND 4 <= i2 <= 4", "d <= c1 <= d AND 5 <= i2 <= 5", "d <= c1 <= d AND 6 <= i2 <= 6", "d <= c1 <= d AND 7 <= i2 <= 7", "d <= c1 <= d AND 8 <= i2 <= 8", "d <= c1 <= d AND 9 <= i2 <= 9", "d <= c1 <= d AND 10 <= i2 <= 10" ], "index_dives_for_eq_ranges": false, "rowid_ordered": false, "using_mrr": false, "index_only": false, "rows": 510, "cost": 622.01, "chosen": false, "cause": "cost" } … "considered_access_paths": [ { "access_type": "ref", "index": "c1", "rows": 510, "cost": 123, "chosen": true }, { "access_type": "scan", "rows": 510, "cost": 416.6, "chosen": false } ] -- do index dives "rows_estimation": [ { "table": "`t1`", "range_analysis": { "table_scan": { "rows": 2048, "cost": 418.7 }, … "analyzing_range_alternatives": { "range_scan_alternatives": [ { "index": "c1", "ranges": [ "d <= c1 <= d AND 1 <= i2 <= 1", "d <= c1 <= d AND 2 <= i2 <= 2", "d <= c1 <= d AND 3 <= i2 <= 3", "d <= c1 <= d AND 4 <= i2 <= 4", "d <= c1 <= d AND 5 <= i2 <= 5", "d <= c1 <= d AND 6 <= i2 <= 6", "d <= c1 <= d AND 7 <= i2 <= 7", "d <= c1 <= d AND 8 <= i2 <= 8", "d <= c1 <= d AND 9 <= i2 <= 9", "d <= c1 <= d AND 10 <= i2 <= 10" ], "index_dives_for_eq_ranges": true, "rowid_ordered": false, "using_mrr": false, "index_only": false, "rows": 703, "cost": 853.61, "chosen": false, "cause": "cost" } ] … "considered_access_paths": [ { "access_type": "ref", "index": "c1", "rows": 703, "cost": 161.6, "chosen": true }, { "access_type": "scan", "rows": 703, "cost": 416.6, "chosen": false } ] So it appears to me that there is something wrong with the cost calculation. The cost of doing range scan is not 6x to 8x times more, certainly that is not shown in the query execution times. +----------+------------+---------------------------------------------------------------------------------------------------------------------+ | Query_ID | Duration | Query | +----------+------------+---------------------------------------------------------------------------------------------------------------------+ | 26 | 0.00205925 | select sql_no_cache * from t1 where c1='d' and i2 in (1,2,3,4,5,6,7,8,9,10) | | 27 | 0.00193075 | select sql_no_cache * from t1 force index(c1) where c1='d' and i2 in (1,2,3,4,5,6,7,8,9,10) | | 28 | 0.00203975 | select sql_no_cache * from t1 ignore index(c1) where c1='d' and i2 in (1,2,3,4,5,6,7,8,9,10) | +----------+------------+---------------------------------------------------------------------------------------------------------------------+ Btw thanks for the informative blog post: http://jorgenloland.blogspot.com/2012/04/on-queries-with-many-values-in-in.html
[19 Jan 2013 10:52]
Ovais Tariq
Though I must say that the variable eq_range_index_dive_limit does have affect on the query execution plan: mysql [localhost] {msandbox} (test) > set session eq_range_index_dive_limit=10; Query OK, 0 rows affected (0.00 sec) mysql [localhost] {msandbox} (test) > explain select sql_no_cache * from t1 where c1='d' and i2 in (1,2,3,4,5,6,7); +----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+ | 1 | SIMPLE | t1 | range | c1 | c1 | 9 | NULL | 304 | Using index condition | +----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+ 1 row in set (0.00 sec) mysql [localhost] {msandbox} (test) > set session eq_range_index_dive_limit=1; Query OK, 0 rows affected (0.00 sec) mysql [localhost] {msandbox} (test) > explain select sql_no_cache * from t1 where c1='d' and i2 in (1,2,3,4,5,6,7); +----+-------------+-------+------+---------------+------+---------+-------+------+-----------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+------+---------+-------+------+-----------------------+ | 1 | SIMPLE | t1 | ref | c1 | c1 | 4 | const | 357 | Using index condition | +----+-------------+-------+------+---------------+------+---------+-------+------+-----------------------+ 1 row in set (0.00 sec) As you can see that when index dives are used the full index is used, however when pre-calculated index_statistics are used then a single column is used from the index.