Bug #26966 Optimizer does not select range access over ref in some specific cases
Submitted: 8 Mar 2007 17:27 Modified: 7 Sep 2007 9:13
Reporter: Guillaume Lefranc Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.0.38-enterprise-gpl-log OS:Linux (Ubuntu 6.06 LTS)
Assigned to: CPU Architecture:Any
Tags: column, indexes, multi, Optimizer, range, ref

[8 Mar 2007 17:27] Guillaume Lefranc
Description:
When using a query that potentially mixes range or ref access, the optimizer chooses the ref access by default, even if the number of rows examined by ref access is way higher. This is a specific case with a multi-column index which matches the ref access by many rows.

How to repeat:
 CREATE TABLE `usertest` (
  `created` datetime NOT NULL,
  `active` enum('y','n') DEFAULT NULL,
  `status_disabled` tinyint(10) NOT NULL DEFAULT '0',
  KEY `created_idx` (`created`),
  KEY `active_created_idx` (`active`,`status_disabled`,`created`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

Feed 10000 rows in the table.

mysql> explain SELECT COUNT(*) AS 'new active users' FROM usertest WHERE  created >= CONCAT(date_sub(curdate(), interval 1 day)," 00:00:00") AND created < CONCAT(curdate(), " 00:00:00") AND active='y';
+----+-------------+----------+------+--------------------------------+--------------------+---------+-------+------+--------------------------+
| id | select_type | table    | type | possible_keys                  | key                | key_len | ref   | rows | Extra                    |
+----+-------------+----------+------+--------------------------------+--------------------+---------+-------+------+--------------------------+
|  1 | SIMPLE      | usertest | ref  | created_idx,active_created_idx | active_created_idx | 2       | const | 5232 | Using where; Using index | 
+----+-------------+----------+------+--------------------------------+--------------------+---------+-------+------+--------------------------+
1 row in set (0.01 sec)

mysql> explain SELECT COUNT(*) AS 'new active users' FROM usertest FORCE INDEX (created_idx) WHERE  created >= CONCAT(date_sub(curdate(), interval 1 day)," 00:00:00") AND created < CONCAT(curdate(), " 00:00:00") AND active='y';
+----+-------------+----------+-------+---------------+-------------+---------+------+------+-------------+
| id | select_type | table    | type  | possible_keys | key         | key_len | ref  | rows | Extra       |
+----+-------------+----------+-------+---------------+-------------+---------+------+------+-------------+
|  1 | SIMPLE      | usertest | range | created_idx   | created_idx | 8       | NULL |   10 | Using where | 
+----+-------------+----------+-------+---------------+-------------+---------+------+------+-------------+
1 row in set (0.00 sec)

Query execution times are relevant to this optimization:

mysql> SELECT COUNT(*) AS 'new active users' FROM user  WHERE  created >= CONCAT(date_sub(curdate(), interval 1 day)," 00:00:00") AND created < CONCAT(curdate(), " 00:00:00") AND active='y';
+------------------+
| new active users |
+------------------+
|            20071 | 
+------------------+
1 row in set (1.87 sec)

mysql> SELECT COUNT(*) AS 'new active users' FROM user FORCE INDEX (created_idx) WHERE  created >= CONCAT(date_sub(curdate(), interval 1 day)," 00:00:00") AND created < CONCAT(curdate(), " 00:00:00") AND active='y';
+------------------+
| new active users |
+------------------+
|            20071 | 
+------------------+
1 row in set (0.11 sec)

Suggested fix:
Force the optimizer to use range optimization by default when the number of examined rows is lower by a certain weight determined by the difference of execution speed between ref and range queries. Thank you!
[24 Mar 2007 5:11] Valeriy Kravchuk
Thank you for a problem report. Please, try to repeat with a newer version, 5.0.36/5.0.37, and inform about the results.
[10 Apr 2007 16:33] Guillaume Lefranc
Repeated vs. 5.0.38. Optimizer still picks the wrong index.
[4 May 2007 11:01] Valeriy Kravchuk
Why do you think that wrong index is used? Can you prove that with query execution times? If yes, please, upload the exact data for the table.

Have you tried to execute ANALYZE TABLE before running EXPLAINs?
[4 May 2007 12:01] Guillaume Lefranc
Valeriy,

I have posted explain and execution times in the first comment.
My point was that the optimizer seems to select its index based on ref over range,  not on the number of rows processed.

I can update you with more data if needed.
[4 May 2007 16:44] Valeriy Kravchuk
Please, upload your table data. I was not able to repeat on dumb data I created.
[7 Sep 2007 9:13] Sveta Smirnova
Thank you for the report.

I can not repeat described behaviour with current sources, although problem is repeatable with version 5.0.41. So I'll close the report as "Can't repeat".