Bug #33030 MySQL 5.1 query optimizer regression
Submitted: 6 Dec 2007 4:07 Modified: 25 Dec 2007 15:30
Reporter: Venu Anuganti Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:5.1.16 and latest 5.1.22 OS:Any
Assigned to: Evgeny Potemkin CPU Architecture:Any
Tags: regression

[6 Dec 2007 4:07] Venu Anuganti
Description:
Here is a case where the same query runs fine in about 0 secs with MySQL server 5.1.16 and the same one takes about 1m 20secs with the latest 5.1.22 version on the same box using the same hardware and configuration.

How to repeat:
Lets consider the following two tables urls and urls_categories…

 
| urls  | CREATE TABLE `urls` (
  `url_id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  `url` VARBINARY(2000) NOT NULL,
  `url_status` ENUM(‘EXISTS’,‘DEAD’,‘N/A’) NOT NULL,
  `yservice_id` SMALLINT(5) UNSIGNED DEFAULT NULL,
  `created_by` SMALLINT(5) UNSIGNED NOT NULL,
  `modified_by` SMALLINT(5) UNSIGNED NOT NULL,
  `created_on` DATETIME NOT NULL,
  `modified_on` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `nocheck` TINYINT(1) NOT NULL DEFAULT ‘0′,
  `url_md5` CHAR(32) NOT NULL,
  PRIMARY KEY (`url_id`),
  UNIQUE KEY `url_unique` (`url_md5`),
  KEY `modified_on` (`modified_on`),
  KEY `url` (`url`(50)),
  KEY `yservice_id` (`yservice_id`),
  KEY `modified_by` (`modified_by`),
  KEY `created_by` (`created_by`)
) ENGINE=INNODB DEFAULT CHARSET=latin1 |
 
 
| urls_categories | CREATE TABLE `urls_categories` (
  `url_id` INT(10) UNSIGNED NOT NULL,
  `category_id` TINYINT(3) UNSIGNED NOT NULL,
  `workflow` ENUM(‘NEW’,‘ACTIVATED’,‘DEACTIVATED’,‘REMOVED’) NOT NULL,
  `searchtype_id` TINYINT(3) UNSIGNED NOT NULL,
  `work_id` TINYINT(3) UNSIGNED NOT NULL,
  `created_by` SMALLINT(5) UNSIGNED NOT NULL,
  `modified_by` SMALLINT(5) UNSIGNED NOT NULL,
  `created_on` DATETIME NOT NULL,
  `modified_on` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `comment` VARCHAR(200) NOT NULL,
  PRIMARY KEY (`url_id`,`category_id`),
  KEY `search_index` (`modified_on`,`workflow`,`category_id`),
  KEY `rd_index` (`url_id`,`category_id`,`workflow`),
  KEY `category_id` (`category_id`),
  KEY `work_id` (`work_id`),
  KEY `modified_by` (`modified_by`),
  KEY `created_by` (`created_by`),
  KEY `searchtype_id` (`searchtype_id`)
) ENGINE=INNODB DEFAULT CHARSET=latin1 |
 
And here is the explain output for a simple inner join query when ran with MySQL version 5.1.16:

 
mysql> EXPLAIN SELECT SQL_NO_CACHE * FROM urls u INNER JOIN urls_categories uc  USING (url_id) ORDER BY uc.modified_on LIMIT 10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: uc
         type: index
possible_keys: PRIMARY,rd_index
          key: search_index
      key_len: 6
          ref: NULL
         rows: 3121058
        Extra:
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: u
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: test.uc.url_id
         rows: 1
        Extra:
2 rows IN SET (0.00 sec)
 
and here is what you get when ran with MySQL 5.1.22:

 
mysql> EXPLAIN SELECT SQL_NO_CACHE * FROM urls u INNER JOIN urls_categories uc  USING (url_id) ORDER BY uc.modified_on LIMIT 10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: uc
         type: ALL
possible_keys: PRIMARY,rd_index
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 3121045
        Extra: USING filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: u
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: test.uc.url_id
         rows: 1
        Extra:
2 rows IN SET (0.00 sec)
 
as you can cleary see, for urls_categories table, the key search_index is missing and optimizer is not picking that. Even if I force to use the key, still no effect:

 
mysql> EXPLAIN SELECT SQL_NO_CACHE * FROM urls u INNER JOIN urls_categories uc FORCE key(search_index,PRIMARY,rd_index)  USING (url_id) ORDER BY uc.modified_on LIMIT 10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: uc
         type: ALL
possible_keys: PRIMARY,rd_index
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 3121045
        Extra: USING filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: u
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: test.uc.url_id
         rows: 1
        Extra:
2 rows IN SET (0.00 sec)
 
mysql> EXPLAIN SELECT SQL_NO_CACHE * FROM urls u INNER JOIN urls_categories uc FORCE key(search_index)  USING (url_id) ORDER BY uc.modified_on LIMIT 10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: uc
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 3121045
        Extra: USING filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: u
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: test.uc.url_id
         rows: 1
        Extra:
2 rows IN SET (0.00 sec)
 
As a result of the above bad index use, the query takes about 1m 20 secs with 5.1.22 where as the same one runs in about 0.0 secs as shown below..

with 5.1.22:

 
mysql> SELECT SQL_NO_CACHE * FROM urls u INNER JOIN urls_categories uc  USING (url_id) ORDER BY uc.modified_on LIMIT 10;
…
…
10 rows IN SET (1 min 14.55 sec)
 
with 5.1.16:

 
mysql> SELECT SQL_NO_CACHE * FROM urls u INNER JOIN urls_categories uc  USING (url_id) ORDER BY uc.modified_on LIMIT 10;
…
…
10 rows IN SET (0.00 sec)
 

Suggested fix:
Optimizer to pick the right index...
[6 Dec 2007 4:40] Valeriy Kravchuk
Thank you for a problem report. Can you upload (as private file) dump of tables data to demonstrate the behaviour described?
[6 Dec 2007 4:53] Venu Anuganti
You don't need a table data to run a explain and see the difference between the versions
[6 Dec 2007 4:59] Valeriy Kravchuk
Yes, I saw a difference in EXPLAIN, on empty tables. But to call it a verified optimizer bug we need to prove that different plan leads to slower execution. On empty tables they are all equally good (or bad).

I had also found a bit similar (but simpler) case where optimizer currently do NOT use index for ORDER BY ... LIMIT N. It is described in bug #28404. That bug is already fixed, but fix will be included in 5.1.23.

So, is it possible for you to upload some data? If no, please, send the results of:

SELECT COUNT(*) FROM urls u INNER JOIN urls_categories uc USING (url_id);
SHOW TABLE STATUS LIKE urls;
SHOW TABLE STATUS LIKE urls_categories;
[7 Dec 2007 16:54] Valeriy Kravchuk
Verified just as described (on Solaris 10) with 5.1.22 on the data provided (see next private comment):

mysql> show table status\G
*************************** 1. row ***************************
           Name: urls
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 1493703
 Avg_row_length: 118
    Data_length: 176881664
Max_data_length: 0
   Index_length: 288243712
      Data_free: 0
 Auto_increment: 3001829
    Create_time: 2007-12-07 06:26:27
    Update_time: NULL
     Check_time: NULL
      Collation: ujis_japanese_ci
       Checksum: NULL
 Create_options:
        Comment:
*************************** 2. row ***************************
           Name: urls_categories
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 3121309
 Avg_row_length: 46
    Data_length: 146440192
Max_data_length: 0
   Index_length: 306184192
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2007-12-07 06:26:46
    Update_time: NULL
     Check_time: NULL
      Collation: ujis_japanese_ci
       Checksum: NULL
 Create_options:
        Comment:
2 rows in set (0.09 sec)

mysql> EXPLAIN SELECT SQL_NO_CACHE * FROM urls u INNER JOIN urls_categories uc  USING (url_id) ORDER BY uc.modified_on LIMIT 10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: u
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1493703
        Extra: Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: uc
         type: ref
possible_keys: PRIMARY,rd_index
          key: PRIMARY
      key_len: 4
          ref: test.u.url_id
         rows: 1
        Extra:
2 rows in set (0.00 sec)

Query is executed for 1 min 23.36 sec (to get 10 rows). FORCE INDEX does NOT work:

mysql> EXPLAIN SELECT SQL_NO_CACHE * FROM urls u INNER JOIN urls_categories uc FORCE INDEX FOR ORDER BY (search_index)  USING (url_id) ORDER BY uc.modified_on LIMIT 10;
+----+-------------+-------+------+------------------+---------+---------+---------------+---------+---------------------------------+
| id | select_type | table | type | possible_keys    | key     | key_len | ref           | rows    | Extra                           |
+----+-------------+-------+------+------------------+---------+---------+---------------+---------+---------------------------------+
|  1 | SIMPLE      | u     | ALL  | PRIMARY          | NULL    | NULL    | NULL          | 1493703 | Using temporary; Using filesort |
|  1 | SIMPLE      | uc    | ref  | PRIMARY,rd_index | PRIMARY | 4       | test.u.url_id |       1 |                                 |
+----+-------------+-------+------+------------------+---------+---------+---------------+---------+---------------------------------+
2 rows in set (0.00 sec)

mysql> EXPLAIN SELECT SQL_NO_CACHE * FROM urls u INNER JOIN urls_categories uc FORCE INDEX FOR JOIN (search_index)  USING (url_id) ORDER BY uc.modified_on LIMIT 10;
+----+-------------+-------+--------+---------------+---------+---------+----------------+---------+----------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref            | rows    | Extra          |
+----+-------------+-------+--------+---------------+---------+---------+----------------+---------+----------------+
|  1 | SIMPLE      | uc    | ALL    | NULL          | NULL    | NULL    | NULL           | 3121309 | Using filesort |
|  1 | SIMPLE      | u     | eq_ref | PRIMARY       | PRIMARY | 4       | test.uc.url_id |       1 |                |
+----+-------------+-------+--------+---------------+---------+---------+----------------+---------+----------------+
2 rows in set (0.00 sec)

But actually using this index is very beneficial, as can be demonstrated by the following, a bit modified query that makes this index "covering" (and has to use STRAIGHT_JOIN hint):

mysql> EXPLAIN SELECT SQL_NO_CACHE u.*, uc.modified_on, uc.workflow, uc.category_id FROM urls_categories uc STRAIGHT_JOIN urls u WHERE uc.url_id = u.url_id ORDER BY uc.modified_on LIMIT 10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: uc
         type: index
possible_keys: PRIMARY,rd_index
          key: search_index
      key_len: 6
          ref: NULL
         rows: 3121309
        Extra: Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: u
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: test.uc.url_id
         rows: 1
        Extra:
2 rows in set (0.00 sec)

It runs amazingly fast (0.00 sec), and additional 10 table accesses to get other columns should NOT make any big difference, comparing to scanning the ENTIRE table and sorting.
[7 Dec 2007 17:03] Valeriy Kravchuk
According to last Igor's comment at http://bugs.mysql.com/bug.php?id=28404:

"Now we can use an index for ORDER BY only if 
1. the index it's covering
or  
2. ORDER BY is used with LIMIT N  and the cost
of N random accesses is less than the cost of filesort.
The latter is always true for small N and big tables."

As we can see, item 1 "works", while item 2 is now ignored in case of join.

Separate problem (#2) here is that the above comments are NOT presented at http://dev.mysql.com/doc/refman/5.1/en/order-by-optimization.html or at http://dev.mysql.com/doc/refman/5.1/en/limit-optimization.html

Problem #3 is that FORCE INDEX FOR ORDER BY is silently ignored.
[25 Dec 2007 15:30] Evgeny Potemkin
Repeatable on 5.1.22. Already fixed in 5.1.23.

mysql> EXPLAIN SELECT SQL_NO_CACHE * FROM urls u INNER JOIN urls_categories uc  
USING (url_id) ORDER BY uc.modified_on LIMIT 10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: uc
         type: index
possible_keys: PRIMARY,rd_index
          key: search_index
      key_len: 6
          ref: NULL
         rows: 10
        Extra: 
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: u
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: test.uc.url_id
         rows: 1
        Extra: 
2 rows in set (0.00 sec)

mysql> EXPLAIN SELECT SQL_NO_CACHE * FROM urls u INNER JOIN urls_categories uc F
ORCE INDEX FOR JOIN (search_index)  USING (url_id) ORDER BY uc.modified_on LIMIT
 10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: uc
         type: index
possible_keys: NULL
          key: search_index
      key_len: 6
          ref: NULL
         rows: 10
        Extra: 
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: u
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: test.uc.url_id
         rows: 1
        Extra: 
2 rows in set (0.00 sec)

mysql> select version();
+-----------------+
| version()       |
+-----------------+
| 5.1.23-rc-debug | 
+-----------------+
1 row in set (0.00 sec)