Bug #51930 | MySQL not optimizing query on two part (char,int) index | ||
---|---|---|---|
Submitted: | 10 Mar 2010 19:13 | Modified: | 7 May 2015 5:08 |
Reporter: | Scott Nebor | Email Updates: | |
Status: | Won't fix | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S5 (Performance) |
Version: | 5.1.43, 5.1.44, 5.1.46-bzr | OS: | Linux (Ubuntu 8.04) |
Assigned to: | CPU Architecture: | Any |
[10 Mar 2010 19:13]
Scott Nebor
[10 Mar 2010 19:14]
Scott Nebor
sql to create the table and insert test data
Attachment: tableAndData.sql (text/plain), 12.78 KiB.
[10 Mar 2010 19:25]
Scott Nebor
Changing severity
[11 Mar 2010 11:16]
Valeriy Kravchuk
Verified just as described with recent 5.4.6 from bzr also: openxs@suse:/home2/openxs/dbs/5.1> bin/mysql -uroot test Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 3 Server version: 5.1.46-debug Source distribution Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> explain -> SELECT * from mysqlreport -> WHERE intcolumn>=1267451485 and intcolumn<1268398800 AND -> CharColumn IN ('TESTVALUEB','NONEXISTANTVALUE') -> order by intcolumn DESC -> limit 0 ,1 -> ; +----+-------------+-------------+-------+-----------------------+----------+---------+------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------------+-------+-----------------------+----------+---------+------+------+-------------+ | 1 | SIMPLE | mysqlreport | range | CharIntIndex,intIndex | intIndex | 4 | NULL | 106 | Using where | +----+-------------+-------------+-------+-----------------------+----------+---------+------+------+-------------+ 1 row in set (0.03 sec) mysql> select count(*) from mysqlreport where intcolumn>=1267451485 and intcolumn<1268398800; +----------+ | count(*) | +----------+ | 110 | +----------+ 1 row in set (0.00 sec) mysql> select count(*) from mysqlreport where intcolumn>=1267451485 and intcolumn<1268398800 and CharColumn IN ('TESTVALUEB','NONEXISTANTVALUE'); +----------+ | count(*) | +----------+ | 10 | +----------+ 1 row in set (0.00 sec) So, even with additional filesort step it likely makes sence to use multiple-column index: mysql> show session status like 'Handler_read%'; +-----------------------+-------+ | Variable_name | Value | +-----------------------+-------+ | Handler_read_first | 0 | | Handler_read_key | 3 | | Handler_read_next | 120 | | Handler_read_prev | 0 | | Handler_read_rnd | 0 | | Handler_read_rnd_next | 23 | +-----------------------+-------+ 6 rows in set (0.01 sec) mysql> SELECT * from mysqlreport WHERE intcolumn>=1267451485 and intcolumn<1268398800 AND CharColumn IN ('TESTVALUEB','NONEXISTANTVALUE') order by intcolumn DESC limit 0 ,1; +------------+------------+------------+ | pkField | intcolumn | CharColumn | +------------+------------+------------+ | 1267486704 | 1267486704 | TESTVALUEB | +------------+------------+------------+ 1 row in set (0.03 sec) mysql> show session status like 'Handler_read%'; +-----------------------+-------+ | Variable_name | Value | +-----------------------+-------+ | Handler_read_first | 0 | | Handler_read_key | 4 | | Handler_read_next | 120 | | Handler_read_prev | 100 | | Handler_read_rnd | 0 | | Handler_read_rnd_next | 23 | +-----------------------+-------+ 6 rows in set (0.00 sec) mysql> SELECT * from mysqlreport FORCE INDEX(`CharIntIndex`) WHERE intcolumn>=1267451485 and intcolumn<1268398800 AND CharColumn IN ('TESTVALUEB','NONEXISTANTVALUE') order by intcolumn DESC limit 0 ,1; +------------+------------+------------+ | pkField | intcolumn | CharColumn | +------------+------------+------------+ | 1267486704 | 1267486704 | TESTVALUEB | +------------+------------+------------+ 1 row in set (0.00 sec) mysql> show session status like 'Handler_read%'; +-----------------------+-------+ | Variable_name | Value | +-----------------------+-------+ | Handler_read_first | 0 | | Handler_read_key | 6 | | Handler_read_next | 130 | | Handler_read_prev | 100 | | Handler_read_rnd | 0 | | Handler_read_rnd_next | 23 | +-----------------------+-------+ 6 rows in set (0.01 sec) Moreover, opimizer itself decides so in many cases that differs slightly (as you correctly pointed out): mysql> explain SELECT * from mysqlreport WHERE intcolumn>=1267451485 and intcolumn<1268398800 AND CharColumn IN ('TESTVALUEB','NONEXISTANTVALUE') order by intcolumn DESC limit 0 ,2; +----+-------------+-------------+-------+-----------------------+--------------+---------+------+------+-----------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------------+-------+-----------------------+--------------+---------+------+------+-----------------------------+ | 1 | SIMPLE | mysqlreport | range | CharIntIndex,intIndex | CharIntIndex | 38 | NULL | 11 | Using where; Using filesort | +----+-------------+-------------+-------+-----------------------+--------------+---------+------+------+-----------------------------+ 1 row in set (0.00 sec)
[17 Oct 2014 16:12]
Scott Nebor
This issue seems to have gotten worse with mysql 5.6.13 and above (5.6.12 behaves the same as originally described in this bug) In the test cases that I originally noted in this bug report, this issue was sometimes happening based on the limit clause used (ex: limit 0,2 might optimize well, but limit 0,1 might not - it was pretty random) As of mysql 5.6.13, this issue now happens all of the time so long as a limit clause is in the query. i.e. For the test cases that I noted, if a limit clause exists, then it optimizes poorly. If a limit clause does not exist, then it optimizes correctly My hunch is that this is related to the fix in bug 69410. Given that this is worse now, can the priority of this issue be bumped up?
[7 May 2015 5:08]
Erlend Dahl
[4 May 2015 22:20] Chaithra Gopalareddy For the query in question: explain SELECT * from mysqlreport WHERE intcolumn>=1267451485 and intcolumn<1268398800 AND CharColumn IN ('TESTVALUEB','NONEXISTANTVALUE') order by intcolumn DESC limit 0 ,1; We have two choices for the indexes 1.charintindex which is multi-column index on both charcolumn and intcolumn 2. intindex which is on intcolumn. For the given table the switch to use index for order by happens because of small limit. And it is correct as optimizer thinks that data is randomly distributed and if index for order by is chosen against filesorting, it might have to read only few rows because limit is small. But it so happens that the data is actually present at the end as order by is done desc. Hence the number of handler reads are more. Here are the answers for the following questions raised: Raising the limit clause should not be required in order for the query to be optimized correctly. Furthermore, you can also trigger mysql to optimize the query correctly by doing any one of the following *removing the limit clause *removing the order clause *removing the 'NONEXISTANTVALUE' in the where clause If limit is increased/removed, it will not make the switch because of cost. If desc is taken out, handler reads are reduced substantially as qualifying rows are found at the beginning of the index. If charcolumn does not have 'NONEXISTANTVALUE' in the ranges, even then charint index is used. From explain we see that switch is not happening. But actually the switch happens because in this case we have constant on the first key_part of the charint index. And charint index can be chosen for giving ordered output on intcolumn. But for the case where we have multiple values for the range, the first keypart is not constant. So the index will not be considered for order by. Problem faced w.r.t 5.6 has been fixed with wl#6986 in 5.7. We agree that the optimizer does not select the optimal plan for this query . But based on the optimizations that we have implemented and the available statistics, the optimizer behaves as it should in this case (at least for 5.5 and 5.7, maybe not for 5.6? (even though it behaves correctly also in 5.6 since the choice is not cost based)). We currently do not have any idea to fix such bugs as optimizer (in at least 5.7) is behaving "correctly". It is not finding the best plan, but it is finding the "correct plan".