Bug #37333 Optimizer uses wrong index for the right table in LEFT JOIN
Submitted: 11 Jun 2008 8:58 Modified: 7 Aug 2013 18:39
Reporter: Valeriy Kravchuk Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.0.60, 5.0.64, 5.1.26 OS:Any
Assigned to: Assigned Account CPU Architecture:Any

[11 Jun 2008 8:58] Valeriy Kravchuk
Description:
Instead of index on incidents.c_id (or index merge on all the indexed columns) optimizer uses the index on column chacked for IS NULL:

mysql> explain select count(*) from contacts contacts LEFT JOIN incidents incidents ON
    -> contacts.c_id = incidents.c_id WHERE ((incidents.disp_lvl4_id IN (275)
    -> OR incidents.disp_lvl1_id IN (362)) AND ( ((incidents.cat_lvl2_id IS
    -> NULL)))) AND incidents.c_id=1061546\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: contacts
         type: const
possible_keys: contacts$c_id
          key: contacts$c_id
      key_len: 4
          ref: const
         rows: 1
        Extra: Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: incidents
         type: ref
possible_keys: incidents$c_id,incidents$cat_lvl2,incidents$disp_lvl1
          key: incidents$cat_lvl2
      key_len: 3
          ref: const
         rows: 10
        Extra: Using where
2 rows in set (0.05 sec)

Note estimated number of rows, totally wrong. In reality this plan leads to scanning million(s) of rows:

mysql> show status like 'Handler_read%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     |
| Handler_read_key      | 3     |
| Handler_read_next     | 0     |
| Handler_read_prev     | 0     |
| Handler_read_rnd      | 0     |
| Handler_read_rnd_next | 26    |
+-----------------------+-------+
6 rows in set (0.00 sec)

mysql> select count(*) from contacts contacts LEFT JOIN incidents incidents ON contacts.c_id = incidents.c_id WHERE ((incidents.disp_lvl4_id IN (275) OR incidents.disp_lvl1_id IN (362)) AND ( ((incidents.cat_lvl2_id IS NULL)))) AND incidents.c_id=1061546\G
*************************** 1. row ***************************
count(*): 0
1 row in set (34.92 sec)

mysql> show status like 'Handler_read%';
+-----------------------+---------+
| Variable_name         | Value   |
+-----------------------+---------+
| Handler_read_first    | 0       |
| Handler_read_key      | 8       |
| Handler_read_next     | 1889342 |
| Handler_read_prev     | 0       |
| Handler_read_rnd      | 0       |
| Handler_read_rnd_next | 33      |
+-----------------------+---------+
6 rows in set (0.00 sec)

While when index on c_id is used query runs instantly:

mysql> select count(*) from contacts contacts LEFT JOIN incidents incidents FORCE INDEX(`incidents$c_id`) ON contacts.c_id = incidents.c_id WHERE ((incidents.disp_lvl4_id IN (275) OR incidents.disp_lvl1_id IN (362)) AND ( ((incidents.cat_lvl2_id IS NULL)))) AND incidents.c_id=1061546\G
*************************** 1. row ***************************
count(*): 0
1 row in set (0.09 sec)

Note also the following:

mysql> show create table incidents\G
*************************** 1. row ***************************
       Table: incidents
Create Table: CREATE TABLE `incidents` (
  `i_id` int(11) NOT NULL default '0',
  `cat_lvl2_id` smallint(6) default NULL,
  `c_id` int(11) default NULL,
  `disp_lvl1_id` smallint(6) default NULL,
  `disp_lvl4_id` smallint(6) default NULL,
  UNIQUE KEY `incidents$i_id` (`i_id`),
  KEY `incidents$c_id` (`c_id`),
  KEY `incidents$cat_lvl2` (`cat_lvl2_id`),
  KEY `incidents$disp_lvl1` (`disp_lvl1_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
1 row in set (0.04 sec)

mysql> select count(*) from incidents where c_id=1061546;
+----------+
| count(*) |
+----------+
|       52 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from incidents where cat_lvl2_id is null;
+----------+
| count(*) |
+----------+
|  1889342 |
+----------+
1 row in set (1.95 sec)

mysql> select count(*) from incidents where disp_lvl1_id IN (362);
+----------+
| count(*) |
+----------+
|     2915 |
+----------+
1 row in set (0.01 sec)

Same plan is generated for both InnoDB and MyISAM tables, ANALYZE does not help.

For inner JOIN optimizer does a good job:

mysql> explain select count(*) from contacts contacts JOIN incidents incidents ON contacts.c_id = incidents.c_id WHERE ((incidents.disp_lvl4_id IN (275) OR incidents.disp_lvl1_id IN (362)) AND ( ((incidents.cat_lvl2_id IS NULL)))) AND incidents.c_id=1061546\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: contacts
         type: const
possible_keys: contacts$c_id
          key: contacts$c_id
      key_len: 4
          ref: const
         rows: 1
        Extra: Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: incidents
         type: ref
possible_keys: incidents$c_id,incidents$cat_lvl2,incidents$disp_lvl1
          key: incidents$c_id
      key_len: 5
          ref: const
         rows: 70
        Extra: Using where
2 rows in set (0.02 sec)

mysql> select count(*) from contacts contacts JOIN incidents incidents ON contacts.c_id = incidents.c_id WHERE ((incidents.disp_lvl4_id IN (275) OR incidents.disp_lvl1_id IN (362)) AND ( ((incidents.cat_lvl2_id IS NULL)))) AND incidents.c_id=1061546\G
*************************** 1. row ***************************
count(*): 0
1 row in set (0.00 sec)

but OUTER should NOT matter in this case.

How to repeat:
Load data (uploaded to the FTP server). Then:

analyze table incidents;
analyze table contacts;

explain select count(*) from contacts contacts 
LEFT JOIN incidents FORCE INDEX (`incidents$disp_lvl1`) 
ON contacts.c_id = incidents.c_id 
WHERE ((incidents.disp_lvl4_id IN (275) OR incidents.disp_lvl1_id IN (362)) 
AND ( ((incidents.cat_lvl2_id IS NULL))))\G

explain select count(*) from contacts contacts 
LEFT JOIN incidents 
ON contacts.c_id = incidents.c_id 
WHERE ((incidents.disp_lvl4_id IN (275) OR incidents.disp_lvl1_id IN (362)) 
AND ( ((incidents.cat_lvl2_id IS NULL))))\G

explain select count(*) from contacts contacts 
JOIN incidents 
ON contacts.c_id = incidents.c_id 
WHERE ((incidents.disp_lvl4_id IN (275) OR incidents.disp_lvl1_id IN (362)) 
AND ( ((incidents.cat_lvl2_id IS NULL))))\G

select count(*) from contacts contacts 
LEFT JOIN incidents FORCE INDEX (`incidents$disp_lvl1`) 
ON contacts.c_id = incidents.c_id 
WHERE ((incidents.disp_lvl4_id IN (275) OR incidents.disp_lvl1_id IN (362)) 
AND ( ((incidents.cat_lvl2_id IS NULL))));

select count(*) from contacts contacts 
LEFT JOIN incidents 
ON contacts.c_id = incidents.c_id 
WHERE ((incidents.disp_lvl4_id IN (275) OR incidents.disp_lvl1_id IN (362)) 
AND ( ((incidents.cat_lvl2_id IS NULL))));

select count(*) from contacts contacts 
JOIN incidents 
ON contacts.c_id = incidents.c_id 
WHERE ((incidents.disp_lvl4_id IN (275) OR incidents.disp_lvl1_id IN (362)) 
AND ( ((incidents.cat_lvl2_id IS NULL))))\G

Compare plans and execution times (or number of rows accessed, based on 'Handler_read_%' values before and after each SELECT.

Suggested fix:
Use proper index(es) for right table in LEFT JOIN?
[11 Jun 2008 9:42] Valeriy Kravchuk
Recent 5.1.26 built from sources is also affected by this bug.
[17 Jun 2008 14:35] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/48008

2664 Martin Hansson	2008-06-17
      Bug#37333: Optimizer uses wrong index for the right table in 
      LEFT JOIN
      
      The range optimizer assumed that an index can't be used on an
      IS NULL predicate on the outer table in an outer join, and refrained
      from calculating the correct cost. This lead to a wrong (too low) 
      cost value for this type of access.
      
      Fixed by not assuming that indexes can't be used for such cases.
[28 Sep 2010 8:42] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/119233

3204 Martin Hansson	2010-09-28
      Bug#37333: Optimizer uses wrong index for the right table in 
      LEFT JOIN
            
      The range optimizer assumed that an index can't be used on an IS NULL
      predicate on the outer table in an outer join, and refrained from calculating
      the cost for it. Since range access estimates are reused for ref type access,
      this lead to a wrong (too low) cost value for the latter access type.
            
      Fixed by removing the bespoke assumption and thus calculating a cost for the
      range scan anyway.
[7 Aug 2013 18:39] Paul DuBois
Noted in 5.7.2 changelog.

The range optimizer used the wrong prerequisite for concluding that a
table is the inner table of an outer join. This led to incorrect cost
estimates and choice of the wrong index for query processing.