Bug #36064 FR: MySQL can use eq_ref in more cases
Submitted: 14 Apr 2008 16:41 Modified: 15 Apr 2008 3:58
Reporter: Baron Schwartz (Basic Quality Contributor) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S4 (Feature request)
Version: OS:Any
Assigned to: CPU Architecture:Any
Tags: qc

[14 Apr 2008 16:41] Baron Schwartz
Description:
MySQL can sometimes use eq_ref for a join, instead of ref.  If the ref is on a leftmost prefix of the index that matches exactly another index that is unique, the access can be a ref.

I spoke to Monty and Igor about this today, and will alert them of this bug report.

How to repeat:
Create test data:

drop table if exists fact, dim;

create table fact(
  fact_id int not null auto_increment,
  dim_id int not null,
  primary key(fact_id)
) engine=myisam;

create table dim(
  dim_id int not null auto_increment,
  other char(5) not null,
  primary key(dim_id),
  key(dim_id, other)
) engine=myisam;

insert into fact(dim_id)
   values(1), (2), (3), (3), (4), (5);

insert into dim(other)
   values('hello'), ('world'), ('there'), ('is'), ('noone'), ('here');

Now that the test data is there, I'd like to do a full-scan join from fact to dim.  If I write it as follows, I won't get the best possible performance because MySQL doesn't use a covering index on the dimension table:

mysql> explain select fact.fact_id, dim.dim_id, dim.other
    -> from fact
    ->    inner join dim using(dim_id)\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: fact
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 6
        Extra: 
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: dim
         type: eq_ref
possible_keys: PRIMARY,dim_id
          key: PRIMARY
      key_len: 4
          ref: test.fact.dim_id
         rows: 1
        Extra: 

I can force the index to be used as a covering index:

mysql> explain select fact.fact_id, dim.dim_id, dim.other from fact    inner join dim force index(dim_id) using(dim_id)\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: fact
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 6
        Extra: 
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: dim
         type: ref
possible_keys: dim_id
          key: dim_id
      key_len: 4
          ref: test.fact.dim_id
         rows: 1
        Extra: Using index

However, MySQL has now changed the access type to the dim table, from an eq_ref to a ref.

As a result, the Handler activity changes when the query runs.  Here's the Handler activity for the first query plan, without the covering index (on a much larger dataset):

mysqladmin extended -r -i 10 | grep Handler
| Handler_read_key                  | 415678      |
| Handler_read_rnd_next             | 429224      |

With the covering index,

| Handler_read_key                  | 399826      |
| Handler_read_next                 | 390819      |
| Handler_read_rnd_next             | 382064      |

Notice that the query is about 10% slower now.
[14 Apr 2008 22:27] Sergei Golubchik
It's not a bug.

First, notice that the optimizer prefers a faster execution plan, manual tweaking (with FORCE INDEX) makes it slower.

Then, eq_ref is only used (in col=val) when optimizer knows that for every value of val, there could be only one value in col, that is only one row could be found (no more than one, to be precise). This only works when a unique index on (col) is used. You force the server to use the index (col, anothercol), this index cannot guarantee that all values of col are unique, and the server cannot use eq_ref anymore.

Of course, technically, it could've noticed that besides this, forced, index there's another index that, though unused in the query, guarantees uniqueness of all values of col. But the fact that optimizer doesn't do it is not a bug, at most it could be considered a feature request ("take into account information about existing unique indexes even if they cannot be used in the query")
[14 Apr 2008 23:07] Baron Schwartz
Right, this is definitely a feature request.  Sorry, I should have said that in the report.