MySQL Bugs: #70002: very low performance join - eq_ref and ref access broken for compound indexes

Bug #70002	very low performance join - eq_ref and ref access broken for compound indexes
Submitted:	12 Aug 2013 19:23	Modified:	19 Aug 2013 7:38
Reporter:	Justin Swanhart	Email Updates:
Status:	Duplicate	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S2 (Serious)
Version:	5.6.13, 5.5, 5.1	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
Data is in support/incoming/70001.sql

mysql> explain select         cust_id, max(odometer_reading - issued_odometer) as mileage     from         cr as car_read    inner join cc as cust_car ON car_read.car_id = cust_car.car_id         AND read_datetime between assigned_datetime AND unassigned_datetime     group by cust_car.id, cust_id\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: cust_car
         type: ALL
possible_keys: cust_id
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 25944
        Extra: Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: car_read
         type: ref
possible_keys: idx_car_read_datetime
          key: idx_car_read_datetime
      key_len: 4
          ref: test.cust_car.car_id
         rows: 168
        Extra: Using index condition
2 rows in set (0.00 sec)

mysql> show create table cc\G
*************************** 1. row ***************************
       Table: cc
Create Table: CREATE TABLE `cc` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `cust_id` int(11) NOT NULL,
  `car_id` int(11) NOT NULL,
  `assigned_datetime` datetime DEFAULT NULL,
  `unassigned_datetime` datetime NOT NULL,
  `issued_odometer` bigint(20) NOT NULL,
  `merged` tinyint(1) DEFAULT '0',
  `create_manually` tinyint(1) DEFAULT '0',
  `created_at` datetime NOT NULL,
  `updated_at` datetime NOT NULL,
  PRIMARY KEY (`id`),
  KEY `cust_id` (`cust_id`,`issued_odometer`,`car_id`)
) ENGINE=InnoDB AUTO_INCREMENT=25555 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
1 row in set (0.00 sec)

mysql> show create table cr\G
*************************** 1. row ***************************
       Table: cr
Create Table: CREATE TABLE `cr` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `car_id` int(11) NOT NULL,
  `reader_id` int(11) NOT NULL,
  `odometer_reading` bigint(20) NOT NULL,
  `longitude` float DEFAULT NULL,
  `latitude` float DEFAULT NULL,
  `read_datetime` datetime NOT NULL,
  `created_at` datetime NOT NULL,
  `updated_at` datetime NOT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_car_read_datetime` (`car_id`,`read_datetime`),
  KEY `idx_reader_id` (`reader_id`)
) ENGINE=InnoDB AUTO_INCREMENT=255616 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
1 row in set (0.00 sec)

The key_len is the second table should be longer than 4.  

The query takes >4 seconds on fastest available machines:
mysql> select ...
25554 rows in set (4.46 sec)

Performance is worse without ICP:
mysql> set optimizer_switch='index_condition_pushdown=off';
Query OK, 0 rows affected (0.00 sec)
mysql> select ...
25554 rows in set (1 min 11.45 sec)

mysql> set optimizer_switch='index_condition_pushdown=off,mrr_cost_based=off,mrr=on,batched_key_access=on';
Query OK, 0 rows affected (0.00 sec)

mysql> set read_rnd_buffer_size=4*1024*1024;
Query OK, 0 rows affected (0.00 sec)

mysql> set join_buffer_size=4*1024*1024;
Query OK, 0 rows affected (0.00 sec)

mysql> explain select         cust_id, max(odometer_reading - issued_odometer) as mileage     from         cr as car_read    inner join cc as cust_car ON car_read.car_id = cust_car.car_id         AND read_datetime between assigned_datetime AND unassigned_datetime     group by cust_car.id, cust_id\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: cust_car
         type: ALL
possible_keys: cust_id
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 25944
        Extra: Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: car_read
         type: ref
possible_keys: idx_car_read_datetime
          key: idx_car_read_datetime
      key_len: 4
          ref: test.cust_car.car_id
         rows: 168
        Extra: Using where; Using join buffer (Batched Key Access)
2 rows in set (0.00 sec)
mysql> select ...
25554 rows in set (1 min 12.57 sec)

How to repeat:
This query comes from:
http://blog.geniedb.com/2013/08/08/aggregation-queries-10x-slower-on-mysql-compared-to-pos...

The post says postgres on same schema does query in 150ms.

I don't know why the BETWEEN condition on read_datetime can't use an index.

Not even a unique index can help (should be EQ_REF access at this point!):
mysql> alter table cr add unique key(car_id, read_datetime);
Query OK, 0 rows affected (0.61 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> explain select cust_id, max(odometer_reading - issued_odometer) as mileage     from         cr as car_read    inner join cc as cust_car ON car_read.car_id = cust_car.car_id         AND read_datetime between assigned_datetime AND unassigned_datetime     group by cust_car.id, cust_id\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: cust_car
         type: ALL
possible_keys: cust_id
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 25944
        Extra: Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: car_read
         type: ref
possible_keys: car_id,idx_car_read_datetime
          key: car_id
      key_len: 4
          ref: test.cust_car.car_id
         rows: 158
        Extra: Using index condition
2 rows in set (0.00 sec)

Just in case, I checked to see if it was nullability on the first table.  
Nope:
mysql>  alter table cc modify assigned_datetime datetime not null;
Query OK, 0 rows affected (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> show create table cc\G
*************************** 1. row ***************************
       Table: cc
Create Table: CREATE TABLE `cc` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `cust_id` int(11) NOT NULL,
  `car_id` int(11) NOT NULL,
  `assigned_datetime` datetime NOT NULL,
  `unassigned_datetime` datetime NOT NULL,
  `issued_odometer` bigint(20) NOT NULL,
  `merged` tinyint(1) DEFAULT '0',
  `create_manually` tinyint(1) DEFAULT '0',
  `created_at` datetime NOT NULL,
  `updated_at` datetime NOT NULL,
  PRIMARY KEY (`id`),
  KEY `cust_id` (`cust_id`,`issued_odometer`,`car_id`)
) ENGINE=InnoDB AUTO_INCREMENT=25555 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
1 row in set (0.00 sec)

mysql> explain select         cust_id, max(odometer_reading - issued_odometer) as mileage     from         cr as car_read    inner join cc as cust_car ON car_read.car_id = cust_car.car_id         AND read_datetime between assigned_datetime AND unassigned_datetime     group by cust_car.id, cust_id\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: cust_car
         type: ALL
possible_keys: cust_id
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 25584
        Extra: Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: car_read
         type: ref
possible_keys: car_id,idx_car_read_datetime
          key: car_id
      key_len: 4
          ref: test.cust_car.car_id
         rows: 158
        Extra: Using index condition
2 rows in set (0.00 sec)

Suggested fix:
Use the whole index.

Hello Justin,

Thank you for the report and test case.
I noticed during my tests that performance was worse without ICP and  partial/improper index use...

Is this similar to Bug #54808?

Thanks,
Umesh

In this case only one good index is available and the whole index isn't used, only a left prefix, so I don't think they are related.  That being said, I haven't personally examined the codepath so I can't rule it out either.

Hi Justin.

Thanks for the bug report. This looks like a duplicate of BUG#52030 but with a much less obscure reproducible. I'll link the two bugs together and request that the implementor verifies your test case as well when the issue has been fixed.

A short summary of what happens: MySQL can only do '[eq_]ref' access if the comparison operator is =. That happens to be the case for the first keypart. If MySQL is to make use of the BETWEEN predicate when looking up rows in the index, 'dynamic range' access has to be used. This is the kind that says "Range checked for each record (index 0x...)" in EXPLAIN. 'Dynamic range' is more costly to use than 'ref' because parts of the optimizer has to be invoked for every row in the referred-to table (cust_car in this case). Because of this higher cost, 'dynamic range' will not be chosen by MySQL if 'ref' access is possible. Admittedly, this heuristic sometimes fail like in this bug report.

Thanks,
Jørgen