MySQL Bugs: #78325: wrong index being used in select

Bug #78325	wrong index being used in select
Submitted:	4 Sep 2015 11:35	Modified:	8 Oct 2015 8:30
Reporter:	Jay Jay	Email Updates:
Status:	Not a Bug	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S5 (Performance)
Version:	7.4.7/5.6.25	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
A table with 8 million rows where I perform a select on 47 rows

1.
select * from table where something1=123 and something2=456 and something_else order by id

returns 47 rows

2.
select * from table where something1=123 and something2=456 and something_else order by id limit 13

3.
select * from table where something1=123 and something2=456 and something_else order by id limit 400

4.
select * from table where something1=123 and something2=456 and something_else  limit 13

Query #2 use the wrong index (primary key instead of another key) which results in full tablescan.

If the limit is unset or is higher than 96, the correct index is used.
If limit is set and is 96 or lower, the incorrect index is used.
If the order by id is removed, the correct index is used regardless of limit.

How to repeat:
Pseudo-example table:

create table test (

id int unsigned not null,
something1 int unsigned not null,
something2 int unsigned not null,
something_else1 bool not null,
something_else2 bool not null,
primary key (id, something1, something2),
key (something1, something2)

);

Query #2 will use primary key and will eat a lot more resources than using key (something1, something2)

myisam table

Thank you for your bug report.

First of all, good that you have sent us a CREATE TABLE, but this is not enough. We must have the data that are very similar to the ones that you actually have in the problematic table. You see, we must be able to repeat the problem, in order to verify the bug or find possible duplicate(s).

Second, can you run ANALYZE TABLE and repeat the tests.

mysql> analyze table TABLE;
+-------------+---------+----------+----------+
| Table       | Op      | Msg_type | Msg_text |
+-------------+---------+----------+----------+
| TABLE       | analyze | status   | OK       |
+-------------+---------+----------+----------+
1 row in set (48.75 sec)

I did a mysqlcheck -r TABLE before I posted this bugreport
I also ran the queries on our replicating server instance with same result there.

Using a describe on query #2 shows:
mysql> describe select id from .... order by id limit 86;    
+----+-------------+-------+-------+---------------------------------------------------+---------+---------+------+-------+-------------+
| id | select_type | table | type  | possible_keys                                     | key     | key_len | ref  | rows  | Extra       |
+----+-------------+-------+-------+---------------------------------------------------+---------+---------+------+-------+-------------+
|  1 | SIMPLE      | TABLE | index | account_id_3,account_id_4,account_id,account_id_2 | PRIMARY | 12      | NULL | 28772 | Using where |
+----+-------------+-------+-------+---------------------------------------------------+---------+---------+------+-------+-------------+

while all the other queries say:

mysql> describe select .... order by id limit 822342343;
+----+-------------+-------+------+---------------------------------------------------+--------------+---------+-------------+-------+-----------------------------+
| id | select_type | table | type | possible_keys                                     | key          | key_len | ref         | rows  | Extra                       |
+----+-------------+-------+------+---------------------------------------------------+--------------+---------+-------------+-------+-----------------------------+
|  1 | SIMPLE      | TABLE | ref  | account_id_3,account_id_4,account_id,account_id_2 | account_id_4 | 8       | const,const | 30279 | Using where; Using filesort |
+----+-------------+-------+------+---------------------------------------------------+--------------+---------+-------------+-------+-----------------------------+

Hi,

First of all, we would need to have your data in order to duplicate the bug. If you need time to provide us with data, meanwhile, you can do the following:

* Many optimizer bugs are fixed in 5.7, so can you test whether your query works fine or not with latest 5.7 release, 5.7.8

* It could be InnoDB bug and not MyISAM bug. Can you make an InnoDB table with different name, but with same data and test whether a bug persists

* Last, but not least, we would need the output from :

EXPLAIN EXTENDED SELECT ........

where ... means your problematic query.

I cant give you the data because it is classified, but here is the explain extended:

+----+-------------+-------+-------+---------------------------------------------------+---------+---------+------+------+----------+-------------+
| id | select_type | table | type  | possible_keys                                     | key     | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+-------+---------------------------------------------------+---------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | TABLE | index | account_id_3,account_id_4,account_id,account_id_2 | PRIMARY | 12      | NULL |  663 |  4837.71 | Using where |
+----+-------------+-------+-------+---------------------------------------------------+---------+---------+------+------+----------+-------------+

and with correct use of index:

+----+-------------+-------+------+---------------------------------------------------+------------+---------+-------------+-------+----------+-----------------------------+
| id | select_type | table | type | possible_keys                                     | key        | key_len | ref         | rows  | filtered | Extra                       |
+----+-------------+-------+------+---------------------------------------------------+------------+---------+-------------+-------+----------+-----------------------------+
|  1 | SIMPLE      | TABLE | ref  | account_id_3,account_id_4,account_id,account_id_2 | account_id | 8       | const,const | 32073 |   100.00 | Using where; Using filesort |
+----+-------------+-------+------+---------------------------------------------------+------------+---------+-------------+-------+----------+-----------------------------+

The table is myisam only.

I'm running NDB on some other servers, so I'm trying to stick to the same mysql version on all servers. If it is already fixed in 5.7/8 then just close this bug, but that would be difficult to know I guess :)

Thank you for the output of EXPLAIN EXTENDED.

As you are not sending us your data we need you to do other checks. We need to know whether it is MyISAM only or not and whether is it fixed in 5.7 or not.

We can not know that without having your data, so your feedback is truly necessary.

Hi,
I see that you mention here 7.4.7, you are sure this is not ndbcluster table but MyISAM table?

Can you provide us with show table status?

As Sinisa asked, is there a chance you can copy this table to a 5.7 server (some local testing one) and test if 5.7 shows the same bug. Without your data we can't.

kind regards
Bogdan Kecman

Hi, we got back from our dev team with more details.

For the problematic query, the optimizer will consider the following alternative query plans:

a) Execute the query using the index on (something1, something2), collect all qualifying records, and use file sort.
b) Execute the query using the the primary key which gives the records in the order specified by "order by"

When there is a "limit n" on the query, the alternative (b) execution can stop as soon as we have found "n" records satisfying the query condition. The lower "n" is, the quicker the query will execute.

As this report shows, when n becomes lower than a given value, the optimizer changes to run the query using "plan b" above. In many cases this is a good choice (after all we can stop after having found 13 records). But in this case it causes worse performance. But we have also many cases where the choice to switch to use "plan b" is a great performance improvement. It is really hard to get the point where we switches between these two alternative plans correct in all cases.

For understanding exactly why we misses in this particular case, we would need to be able to run this query with data that is representative/similar to the user data. If this is not possible, and you said you can't provide us the data, we need you to run the query with optimizer trace enabled and give us the resulting trace. This could help (note that the trace would reveal more information about the data and query than is in the bug report) us figure out how to improve.

In 5.7 we have done some improvements to the code that decides between these two plans. We should be able to make a better estimate on how many records that is needed to read from the "order by" friendly index before we have found the number of records requested by "limit n". If you can run the same test with same data on 5.7 this would help us a lot also.

best regards
Bogdan Kecman
Olav Sandstaa

Great, thanks for the additional data.

all best
Bogdan Kecman

This actually is not a bug ....

Let us explain why, based on how the optimizer works.  Here is our analysis:

The query is:

 select id from email where account_id=11382 and folder_id=0 order by id limit 3;

There exists several indexes, the two most relevant are:

 index on (account_id, folder_id)  -- this is a secondary index named account_id_4
 index on (id) -- this is the primary key for the table

The two most relevant plans the optimizer considers are:

1. Ref access in the account_id_4 index:

   According to the optimizer trace this will return 19227 records. So in order to execute the query using this index we will need to:

   "cost for query": read 19227 records, sort all these records, and return 3 to the user

   (this is the plan the bug reporter want to have)

2. Index scan on the primary index:

   We will read records using the primary index until we have found 3 records that satisfies: "account_id=11382 and folder_id=0"

   From the optimizer trace we have that there are 10660656 records in the table. Of these will 19227 satisfy the query condition. So one record out of 554 records will satisfy the where condition. If we assume the records that satisfy the where condition is uniformly distributed we will find the first qualifying records after having read about 3 * 554 = 1662 records.

   "cost for query": read 1662 records

It is obvious from the above analysis that the plan (2) does look most promising !!!

The optimizer selects to use plan 2 (since it has much lower estimated cost). We switch to use an index that provides records in sorted order given that the query only need the 3 first records.

Where this sometimes fails (as in this bug), is that we assume that the qualifying records are uniformly distributed in the table. We can not from the data we have in the bug report say that this is the case, but our guess is that when we execute according to "plan 2" above we end up reading a lot more of the table before we find the three first qualifying records. And this causes the performance issue. Having a special feature that stores key value distribution over the entire range is something that is not even planned yet, although it is one of the features that we are still designing. 

It would likely be possible to verify that this is what actually happens if you would provide information about the number of handler calls this query caused:

  FLUSH STATUS;
  query;
  SHOW STATUS LIKE 'Handler%*;

Thank you for your bug report, but it is not a bug.

The data changes constantly, but from the query I reported with todays data:

+----------------------------+---------+
| Variable_name              | Value   |
+----------------------------+---------+
| Handler_commit             | 0       |
| Handler_delete             | 0       |
| Handler_discover           | 0       |
| Handler_external_lock      | 2       |
| Handler_mrr_init           | 0       |
| Handler_prepare            | 0       |
| Handler_read_first         | 1       |
| Handler_read_key           | 0       |
| Handler_read_last          | 0       |
| Handler_read_next          | 4948828 |
| Handler_read_prev          | 0       |
| Handler_read_rnd           | 0       |
| Handler_read_rnd_next      | 0       |
| Handler_rollback           | 0       |
| Handler_savepoint          | 0       |
| Handler_savepoint_rollback | 0       |
| Handler_update             | 0       |
| Handler_write              | 0       |
+----------------------------+---------+

Using a quite similar query on another database-server that also have the same problem has now been and still is running for 2 hours, although the hardware is significantly slower than the first one.

These data confirm our ideas of what happened with the optimizer plan choice. 

MySQL is supposing that data is uniformly distributed, which simply might not be the case. Changing the decision to use another index, might prove to solve your problem, but will create a problem for those for whom current choice works better. So, instead of this bug, we would get a new bug.

The only proper solution is to have a complete redesign of how are index values treated. A new design would require many, many men-months and is not yet planned.

I understand.

The only strange part is that this system has been running for 10years++ with mySQL and no change to this query or DB structure. mySQL-upgrades has been randomly done.

It looks like the problem may have started from 5.6.21, but hard to tell.

A solution would indeed be to tell the system to use a certain named index, but the indexnames are not consistent. Renaming indexes without rebuild seems to be supported only from 5.7, so have to wait until upgrade :)

First of all, it is very nice that you understand, which also speaks volumes about you as MySQL DBA.

Regarding a possibility that query was faster before certain version, that is possible, but not very much probable. It is possible that a feature was developed or that bug was fixed in some other part of the optimizer, like in data types processing or buffering or ORDER BY optimizations, which had repercussions in other optimizer modules. This is unlikely, but still possible to cause that plans are now preferring uniform over skewed distribution of the values in the index.

Once again, it is highly improbable, but not impossible !!!