Bug #83323 Optimizer chooses wrong plan when joining 2 tables
Submitted: 10 Oct 2016 18:02 Modified: 24 Apr 2017 21:32
Reporter: Alexander Rubin Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.7, 5.6, 8.0.1 OS:Any
Assigned to: CPU Architecture:Any
Tags: Optimizer

[10 Oct 2016 18:02] Alexander Rubin
Description:
Optimizer chooses wrong plan when joining 2 tables:

mysql> explain  select  * from events_new_distr e  join profiles p on e.profile_id = p.id  where event_date between '2016-01-01' and '2016-01-01' + interval 15 day and status = 'active' and profile_id in (1,2,4,5,6,7,8,9,10,11,12,13,14,15,16)  order by event_date desc limit  10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: p
   partitions: NULL
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 15
     filtered: 10.00
        Extra: Using where; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: e
   partitions: NULL
         type: ref
possible_keys: profile_id,event_date,comb
          key: comb
      key_len: 5
          ref: optimizer_issue.p.id
         rows: 75696
     filtered: 3.28
        Extra: Using where; Using index
2 rows in set, 1 warning (0.00 sec)

Response time: 40 seconds.

The problem here is the optimizer does not choose the right table order and decided to use temp table.

The right way is this:

mysql> explain  select STRAIGHT_JOIN * from events_new_distr e  join profiles p on e.profile_id = p.id  where event_date between '2016-01-01' and '2016-01-01' + interval 15 day and status = 'active' and profile_id in (1,2,4,5,6,7,8,9,10,11,12,13,14,15,16)  order by event_date desc limit  10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: e
   partitions: NULL
         type: range
possible_keys: profile_id,event_date,comb
          key: event_date
      key_len: 6
          ref: NULL
         rows: 1703196
     filtered: 50.00
        Extra: Using index condition; Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: p
   partitions: NULL
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: optimizer_issue.e.profile_id
         rows: 1
     filtered: 10.00
        Extra: Using where
2 rows in set, 1 warning (0.01 sec)

With ORDER BY + LIMIT 10 MySQL can:
1. Avoid creating temp table and scan the table in the order of the index
2. Abort the query execution when it will find 10 rows.

(That is ORDER BY + LIMIT optimization, http://dev.mysql.com/doc/refman/5.7/en/limit-optimization.html)

However,  75696*15 = 1135440 rows < 1703196 rows.
What is also interesting is that the second plan is not even considered in optimizer trace, attached.

How to repeat:
Smaller test case:

CREATE TABLE `profiles` (
  `id` int(11) NOT NULL DEFAULT '0' AUTO_INCREMENT,
  `email` varchar(255) DEFAULT NULL,
  `status` varchar(100) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `email` (`email`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

mysql> insert into profiles(status) values ('active');
Query OK, 1 row affected (0.00 sec)

...

mysql> insert into profiles (status)  select status from profiles;
Query OK, 16384 rows affected (0.20 sec)
Records: 16384  Duplicates: 0  Warnings: 0

CREATE TABLE `events` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `event_date` datetime DEFAULT NULL,
  `profile_id` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `profile_id` (`profile_id`),
  KEY `event_date` (`event_date`),
  KEY `comb` (`profile_id`,`event_date`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

mysql> insert into events (event_date, profile_id) select now() - interval id second, id from profiles;
Query OK, 32768 rows affected (0.93 sec)
Records: 32768  Duplicates: 0  Warnings: 0

...

up to 425984 rows 

mysql> update events set event_date = now() - interval id second;
Query OK, 425984 rows affected (15.88 sec)
Rows matched: 425984  Changed: 425984  Warnings: 0

mysql> update events set profile_id = 1;
Query OK, 425971 rows affected (15.41 sec)
Rows matched: 425984  Changed: 425971  Warnings: 0

mysql> update events set profile_id = 2 where mod(id, 1000) = 1;
Query OK, 426 rows affected (0.41 sec)
Rows matched: 426  Changed: 426  Warnings: 0

mysql> update events set profile_id = 3 where mod(id, 100) = 1;
Query OK, 4260 rows affected (0.55 sec)
Rows matched: 4260  Changed: 4260  Warnings: 0

mysql> update events set profile_id = 3 where mod(id, 5000) = 1;
Query OK, 0 rows affected (0.39 sec)
Rows matched: 84  Changed: 0  Warnings: 0

mysql> explain select  * from events e  join profiles p on e.profile_id = p.id  where event_date between now() - interval 1 day and now() and status = 'active' and profile_id in (1,2,4,5,6,7,8,9,10,11,12,13,14,15,16)  order by event_date desc limit  10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: p
   partitions: NULL
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 15
     filtered: 10.00
        Extra: Using where; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: e
   partitions: NULL
         type: ref
possible_keys: profile_id,event_date,comb
          key: profile_id
      key_len: 5
          ref: test.p.id
         rows: 1
     filtered: 28.74
        Extra: Using where
2 rows in set, 1 warning (0.00 sec)

mysql> select  * from events e  join profiles p on e.profile_id = p.id  where event_date between now() - interval 1 day and now() and status = 'active' and profile_id in (1,2,4,5,6,7,8,9,10,11,12,13,14,15,16)  order by event_date desc limit  10;
+----+---------------------+------------+----+-------+--------+
| id | event_date          | profile_id | id | email | status |
+----+---------------------+------------+----+-------+--------+
|  2 | 2016-10-10 17:50:04 |          1 |  1 | NULL  | active |
|  3 | 2016-10-10 17:50:03 |          1 |  1 | NULL  | active |
|  4 | 2016-10-10 17:50:02 |          1 |  1 | NULL  | active |
|  5 | 2016-10-10 17:50:01 |          1 |  1 | NULL  | active |
|  6 | 2016-10-10 17:50:00 |          1 |  1 | NULL  | active |
|  7 | 2016-10-10 17:49:59 |          1 |  1 | NULL  | active |
|  8 | 2016-10-10 17:49:58 |          1 |  1 | NULL  | active |
|  9 | 2016-10-10 17:49:57 |          1 |  1 | NULL  | active |
| 10 | 2016-10-10 17:49:56 |          1 |  1 | NULL  | active |
| 11 | 2016-10-10 17:49:55 |          1 |  1 | NULL  | active |
+----+---------------------+------------+----+-------+--------+
10 rows in set (0.84 sec)

mysql> explain select STRAIGHT_JOIN  * from events e  join profiles p on e.profile_id = p.id  where event_date between now() - interval 1 day and now() and status = 'active' and profile_id in (1,2,4,5,6,7,8,9,10,11,12,13,14,15,16)  order by event_date desc limit  10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: e
   partitions: NULL
         type: index
possible_keys: profile_id,event_date,comb
          key: event_date
      key_len: 6
          ref: NULL
         rows: 394
     filtered: 14.30
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: p
   partitions: NULL
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: test.e.profile_id
         rows: 1
     filtered: 10.00
        Extra: Using where
2 rows in set, 1 warning (0.00 sec)

mysql> select STRAIGHT_JOIN  * from events e  join profiles p on e.profile_id = p.id  where event_date between now() - interval 1 day and now() and status = 'active' and profile_id in (1,2,4,5,6,7,8,9,10,11,12,13,14,15,16)  order by event_date desc limit  10;
+----+---------------------+------------+----+-------+--------+
| id | event_date          | profile_id | id | email | status |
+----+---------------------+------------+----+-------+--------+
|  2 | 2016-10-10 17:50:04 |          1 |  1 | NULL  | active |
|  3 | 2016-10-10 17:50:03 |          1 |  1 | NULL  | active |
|  4 | 2016-10-10 17:50:02 |          1 |  1 | NULL  | active |
|  5 | 2016-10-10 17:50:01 |          1 |  1 | NULL  | active |
|  6 | 2016-10-10 17:50:00 |          1 |  1 | NULL  | active |
|  7 | 2016-10-10 17:49:59 |          1 |  1 | NULL  | active |
|  8 | 2016-10-10 17:49:58 |          1 |  1 | NULL  | active |
|  9 | 2016-10-10 17:49:57 |          1 |  1 | NULL  | active |
| 10 | 2016-10-10 17:49:56 |          1 |  1 | NULL  | active |
| 11 | 2016-10-10 17:49:55 |          1 |  1 | NULL  | active |
+----+---------------------+------------+----+-------+--------+
10 rows in set (0.00 sec)

Suggested fix:
Apply the ORDER BY + LIMIT optimization to the table joins
[10 Oct 2016 18:06] Alexander Rubin
optimizer trace

Attachment: trace2.json (application/json, text), 24.50 KiB.

[12 Oct 2016 17:19] MySQL Verification Team
Alexander,

Thank you for your bug report. I managed to repeat it:

./client/mysql sinbas -e "select count(*) from events"
+----------+
| count(*) |
+----------+
|   557090 |
+----------+

mysql>  update events set event_date = now() - interval id second;
Query OK, 557090 rows affected (7 min 26.22 sec)
Rows matched: 557090  Changed: 557090  Warnings: 0

mysql> 
mysql> 
mysql> 
mysql> update events set profile_id = 1;
Query OK, 557073 rows affected (7 min 46.32 sec)
Rows matched: 557090  Changed: 557073  Warnings: 0

mysql> 
mysql> 
mysql> 
mysql> update events set profile_id = 2 where mod(id, 1000) = 1;
2016-10-12T17:05:41.405017Z 16419 [Warning] DEBUG: Full table scan on table events.  Reason: 'quick == 0'  Records2: 557397  Records3: 557397 Optimizer_flag: 523775  Query: update events set profile_id = 2 where mod(id, 1000) = 1
Query OK, 557 rows affected (44.08 sec)
Rows matched: 557  Changed: 557  Warnings: 0

mysql> update events set profile_id = 3 where mod(id, 100) = 1;
2016-10-12T17:08:09.129270Z 16419 [Warning] DEBUG: Full table scan on table events.  Reason: 'quick == 0'  Records2: 557397  Records3: 557397 Optimizer_flag: 523775  Query: update events set profile_id = 3 where mod(id, 100) = 1

Query OK, 5571 rows affected (42.44 sec)
Rows matched: 5571  Changed: 5571  Warnings: 0

mysql> 
mysql> update events set profile_id = 3 where mod(id, 5000) = 1;
2016-10-12T17:09:20.325390Z 16419 [Warning] DEBUG: Full table scan on table events.  Reason: 'quick == 0'  Records2: 557397  Records3: 557397 Optimizer_flag: 523775  Query: update events set profile_id = 3 where mod(id, 5000) = 1
Query OK, 0 rows affected (34.19 sec)
Rows matched: 111  Changed: 0  Warnings: 0

mysql> explain select  * from events e  join profiles p on e.profile_id
    -> = p.id  where event_date between now() - interval 1 day and now() and
    -> status = 'active' and profile_id in
    -> (1,2,4,5,6,7,8,9,10,11,12,13,14,15,16)  order by event_date desc limit 
    -> 10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: p
   partitions: NULL
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 15
     filtered: 10.00
        Extra: Using where; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: e
   partitions: NULL
         type: ref
possible_keys: profile_id,event_date,comb
          key: comb
      key_len: 5
          ref: sinbas.p.id
         rows: 12
     filtered: 20.10
        Extra: Using where; Using index
2 rows in set, 1 warning (0.01 sec)

mysql> explain select STRAIGHT_JOIN  * from events e  join profiles p on
    -> e.profile_id = p.id  where event_date between now() - interval 1 day and
    -> now() and status = 'active' and profile_id in
    -> (1,2,4,5,6,7,8,9,10,11,12,13,14,15,16)  order by event_date desc limit 
    -> 10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: e
   partitions: NULL
         type: index
possible_keys: profile_id,event_date,comb
          key: event_date
      key_len: 6
          ref: NULL
         rows: 536
     filtered: 9.97
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: p
   partitions: NULL
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: sinbas.e.profile_id
         rows: 1
     filtered: 10.00
        Extra: Using where
2 rows in set, 1 warning (0.00 sec)

mysql>  select STRAIGHT_JOIN  * from events e  join profiles p on
    -> e.profile_id = p.id  where event_date between now() - interval 1 day and
    -> now() and status = 'active' and profile_id in
    -> (1,2,4,5,6,7,8,9,10,11,12,13,14,15,16)  order by event_date desc limit 
    -> 10;
+----+---------------------+------------+----+-------+--------+
| id | event_date          | profile_id | id | email | status |
+----+---------------------+------------+----+-------+--------+
|  2 | 2016-10-12 18:49:03 |          1 |  1 | xx@yy | active |
|  3 | 2016-10-12 18:49:02 |          1 |  1 | xx@yy | active |
|  4 | 2016-10-12 18:49:01 |          1 |  1 | xx@yy | active |
|  5 | 2016-10-12 18:49:00 |          1 |  1 | xx@yy | active |
|  6 | 2016-10-12 18:48:59 |          1 |  1 | xx@yy | active |
|  7 | 2016-10-12 18:48:58 |          1 |  1 | xx@yy | active |
|  8 | 2016-10-12 18:48:57 |          1 |  1 | xx@yy | active |
|  9 | 2016-10-12 18:48:56 |          1 |  1 | xx@yy | active |
| 10 | 2016-10-12 18:48:55 |          1 |  1 | xx@yy | active |
| 11 | 2016-10-12 18:48:54 |          1 |  1 | xx@yy | active |
+----+---------------------+------------+----+-------+--------+
10 rows in set (0.01 sec)
[12 Oct 2016 17:20] MySQL Verification Team
Thank you for your report.

Fully verified.
[13 Feb 2017 15:35] Alexander Rubin
5.6 is also affected
[24 Apr 2017 21:32] Alexander Rubin
8.0.1 is also affected
[13 Apr 2018 3:59] Erlend Dahl
Bug#88786 Optimzer choose wrong index

was marked as a duplicate.