MySQL Bugs: #51930: MySQL not optimizing query on two part (char,int) index

Bug #51930	MySQL not optimizing query on two part (char,int) index
Submitted:	10 Mar 2010 19:13	Modified:	7 May 2015 5:08
Reporter:	Scott Nebor	Email Updates:
Status:	Won't fix	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S5 (Performance)
Version:	5.1.43, 5.1.44, 5.1.46-bzr	OS:	Linux (Ubuntu 8.04)
Assigned to:		CPU Architecture:	Any

Description:
Under certain conditions, mysql is not properly optimizing a query that should utilize a two part (char,int) index

The table looks like this:
CREATE TABLE `mysqlreport` (
  `pkField` varchar(50) NOT NULL,
  `intcolumn` int(11) NOT NULL DEFAULT '0',
  `CharColumn` varchar(32) NOT NULL DEFAULT '',
  PRIMARY KEY (`pkField`,`intcolumn`),
  KEY `CharIntIndex` (`CharColumn`,`intcolumn`),
  KEY `intIndex` (`intcolumn`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 CHECKSUM=1;

The following two queries demonstrate the problem:
explain 
SELECT * from mysqlreport 
 WHERE  intcolumn>=1267451485 and intcolumn<1268398800 AND 
 CharColumn IN ('TESTVALUEB','NONEXISTANTVALUE')
 order by intcolumn DESC
 limit 0 ,1;
This query shows that it is using the "intIndex", and it is not using "CharIntIndex" as it should

explain 
SELECT * from mysqlreport 
 WHERE  intcolumn>=1267451485 and intcolumn<1268398800 AND 
 CharColumn IN ('TESTVALUEB','NONEXISTANTVALUE')
 order by intcolumn DESC
 limit 0 ,2;
This query shows that it is using the "CharIntIndex", and that the rows analyzed is much lower (resulting in a faster query).  All that was changed between the two queries is the limit clause

Raising the limit clause should not be required in order for the query to be optimized correctly.  Furthermore, you can also trigger mysql to optimize the query correctly by doing any one of the following
*removing the limit clause
*removing the order clause
*removing the 'NONEXISTANTVALUE' in the where clause

How to repeat:
The following creates the necessary tables and data

I will attach the sql queries to create the table and insert test data required to repeat this bug.  Run those queries first

Take the following example queries:
explain 
SELECT * from mysqlreport 
 WHERE  intcolumn>=1267451485 and intcolumn<1268398800 AND 
 CharColumn IN ('TESTVALUEB','NONEXISTANTVALUE')
 order by intcolumn DESC
 limit 0 ,1
- This runs slow - explain shows 109 rows returned

explain 
SELECT * from mysqlreport 
 WHERE  intcolumn>=1267451485 and intcolumn<1268398800 AND 
 CharColumn IN ('TESTVALUEB','NONEXISTANTVALUE')
 order by intcolumn DESC
 limit 0 ,2
- raise the limit
- This runs fast - explain shows 11 rows returned

explain 
SELECT * from mysqlreport 
 WHERE  intcolumn>=1267451485 and intcolumn<1268398800 AND 
 CharColumn IN ('TESTVALUEB')
 order by intcolumn DESC
 limit 0 ,1
- remove the 'NONEXISTANTVALUE'
- This also runs fast - explain shows 11 rows returned

explain 
SELECT * from mysqlreport 
 WHERE  intcolumn>=1267451485 and intcolumn<1268398800 AND 
 CharColumn IN ('TESTVALUEB','NONEXISTANTVALUE')
 limit 0 ,1
- remove the order by
- This also runs fast - explain shows 11 rows returned

explain 
SELECT * from mysqlreport 
 WHERE  intcolumn>=1267451485 and intcolumn<1268398800 AND 
 CharColumn IN ('TESTVALUEB','NONEXISTANTVALUE')
 order by intcolumn DESC
- remove the limit
- This also runs fast - explain shows 11 rows returned

Suggested fix:
The following query should use the "CharIntIndex" index:

SELECT * from mysqlreport 
 WHERE  intcolumn>=1267451485 and intcolumn<1268398800 AND 
 CharColumn IN ('TESTVALUEB','NONEXISTANTVALUE')
 order by intcolumn DESC
 limit 0 ,1;

sql to create the table and insert test data

Attachment: tableAndData.sql (text/plain), 12.78 KiB.

Changing severity

Verified just as described with recent 5.4.6 from bzr also:

openxs@suse:/home2/openxs/dbs/5.1> bin/mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 5.1.46-debug Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> explain
    -> SELECT * from mysqlreport
    ->  WHERE  intcolumn>=1267451485 and intcolumn<1268398800 AND
    ->  CharColumn IN ('TESTVALUEB','NONEXISTANTVALUE')
    ->  order by intcolumn DESC
    ->  limit 0 ,1
    -> ;
+----+-------------+-------------+-------+-----------------------+----------+---------+------+------+-------------+
| id | select_type | table       | type  | possible_keys         | key      | key_len | ref  | rows | Extra       |
+----+-------------+-------------+-------+-----------------------+----------+---------+------+------+-------------+
|  1 | SIMPLE      | mysqlreport | range | CharIntIndex,intIndex | intIndex | 4       | NULL |  106 | Using where |
+----+-------------+-------------+-------+-----------------------+----------+---------+------+------+-------------+
1 row in set (0.03 sec)

mysql> select count(*) from mysqlreport where intcolumn>=1267451485 and intcolumn<1268398800;
+----------+
| count(*) |
+----------+
|      110 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from mysqlreport where intcolumn>=1267451485 and intcolumn<1268398800 and CharColumn IN ('TESTVALUEB','NONEXISTANTVALUE');
+----------+
| count(*) |
+----------+
|       10 |
+----------+
1 row in set (0.00 sec)

So, even with additional filesort step it likely makes sence to use multiple-column index:

mysql> show session status like 'Handler_read%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     |
| Handler_read_key      | 3     |
| Handler_read_next     | 120   |
| Handler_read_prev     | 0     |
| Handler_read_rnd      | 0     |
| Handler_read_rnd_next | 23    |
+-----------------------+-------+
6 rows in set (0.01 sec)

mysql> SELECT * from mysqlreport   WHERE  intcolumn>=1267451485 and intcolumn<1268398800 AND   CharColumn IN ('TESTVALUEB','NONEXISTANTVALUE')  order by intcolumn DESC  limit 0 ,1;
+------------+------------+------------+
| pkField    | intcolumn  | CharColumn |
+------------+------------+------------+
| 1267486704 | 1267486704 | TESTVALUEB |
+------------+------------+------------+
1 row in set (0.03 sec)

mysql> show session status like 'Handler_read%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     |
| Handler_read_key      | 4     |
| Handler_read_next     | 120   |
| Handler_read_prev     | 100   |
| Handler_read_rnd      | 0     |
| Handler_read_rnd_next | 23    |
+-----------------------+-------+
6 rows in set (0.00 sec)

mysql> SELECT * from mysqlreport FORCE INDEX(`CharIntIndex`) WHERE  intcolumn>=1267451485 and intcolumn<1268398800 AND   CharColumn IN ('TESTVALUEB','NONEXISTANTVALUE')  order by intcolumn DESC  limit 0 ,1;
+------------+------------+------------+
| pkField    | intcolumn  | CharColumn |
+------------+------------+------------+
| 1267486704 | 1267486704 | TESTVALUEB |
+------------+------------+------------+
1 row in set (0.00 sec)

mysql> show session status like 'Handler_read%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     |
| Handler_read_key      | 6     |
| Handler_read_next     | 130   |
| Handler_read_prev     | 100   |
| Handler_read_rnd      | 0     |
| Handler_read_rnd_next | 23    |
+-----------------------+-------+
6 rows in set (0.01 sec)

Moreover, opimizer itself decides so in many cases that differs slightly (as you correctly pointed out):

mysql> explain  SELECT * from mysqlreport   WHERE  intcolumn>=1267451485 and intcolumn<1268398800 AND   CharColumn IN ('TESTVALUEB','NONEXISTANTVALUE')  order by intcolumn DESC  limit 0 ,2;
+----+-------------+-------------+-------+-----------------------+--------------+---------+------+------+-----------------------------+
| id | select_type | table       | type  | possible_keys         | key          | key_len | ref  | rows | Extra                       |
+----+-------------+-------------+-------+-----------------------+--------------+---------+------+------+-----------------------------+
|  1 | SIMPLE      | mysqlreport | range | CharIntIndex,intIndex | CharIntIndex | 38      | NULL |   11 | Using where; Using filesort |
+----+-------------+-------------+-------+-----------------------+--------------+---------+------+------+-----------------------------+
1 row in set (0.00 sec)

This issue seems to have gotten worse with mysql 5.6.13 and above (5.6.12 behaves the same as originally described in this bug)

In the test cases that I originally noted in this bug report, this issue was sometimes happening based on the limit clause used (ex: limit 0,2 might optimize well, but limit 0,1 might not - it was pretty random)

As of mysql 5.6.13, this issue now happens all of the time so long as a limit clause is in the query.  i.e. For the test cases that I noted, if a limit clause exists, then it optimizes poorly.  If a limit clause does not exist, then it optimizes correctly

My hunch is that this is related to the fix in bug 69410.   

Given that this is worse now, can the priority of this issue be bumped up?

[4 May 2015 22:20] Chaithra Gopalareddy

For the query in question:
explain
SELECT * from mysqlreport
 WHERE  intcolumn>=1267451485 and intcolumn<1268398800 AND
 CharColumn IN ('TESTVALUEB','NONEXISTANTVALUE')
 order by intcolumn DESC
 limit 0 ,1;

We have two choices for the indexes
1.charintindex which is multi-column index on both charcolumn and intcolumn
2. intindex which is on intcolumn.

For the given table the switch to use index for order by happens because of
small limit. And it is correct as optimizer thinks that data is randomly
distributed and if index for order by is chosen against filesorting, it might
have to read only few rows because limit is small. But it so happens that the
data is actually present at the end as order by is done desc. Hence the
number of handler reads are more.

Here are the answers for the following questions raised:

Raising the limit clause should not be required in order for the query to be
optimized correctly.  Furthermore, you can also trigger mysql to optimize the
query correctly by doing any one of the following
*removing the limit clause
*removing the order clause
*removing the 'NONEXISTANTVALUE' in the where clause

If limit is increased/removed, it will not make the switch because of cost.

If desc is taken out, handler reads are reduced substantially as qualifying
rows are found at the beginning of the index.

If charcolumn does not have 'NONEXISTANTVALUE' in the ranges, even then
charint index is used. From explain we see that switch is not happening. But
actually the switch happens because in this case we have constant on the
first key_part of the charint index. And charint index can be chosen for
giving ordered output on intcolumn.
But for the case where we have multiple values for the range, the first
keypart is not constant. So the index will not be considered for order by.

Problem faced w.r.t 5.6 has been fixed with wl#6986 in 5.7.

We agree that the optimizer does not select the optimal plan for this query .
But based on the optimizations that we have implemented and the available
statistics, the optimizer behaves as it should in this case (at least for 5.5
and 5.7, maybe not for 5.6? (even though it behaves correctly also in 5.6
since the choice is not cost based)).

We currently do not have any idea to fix such bugs as optimizer (in at least
5.7) is behaving "correctly". It is not finding the best plan, but it is
finding the "correct plan".