Bug #9499 equal condition could allow substitution of columns for better key utilization
Submitted: 30 Mar 2005 20:17 Modified: 30 Mar 2005 22:19
Reporter: Timothy Smith Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S4 (Feature request)
Version:4.1 OS:Any (all)
Assigned to: CPU Architecture:Any

[30 Mar 2005 20:17] Timothy Smith
Description:

In the following two queries, the only difference is that the first uses "b.b like '0%'", and the second uses "a.b like '0%'".  Since "a.b = b.b", these two could work identically.  However, the first query uses "key_len: 61", while the second query uses "key_len: 102".  

In a real-world case, this can make a tremendous difference in performance.

mysql> explain select count(*) from a4921 a, b4921 b where a.a in ('a', 'aba') and a.b = b.b and b.b like '0%'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: a
         type: range
possible_keys: a
          key: a
      key_len: 61
          ref: NULL
         rows: 4
        Extra: Using where; Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: b
         type: ref
possible_keys: b
          key: b
      key_len: 41
          ref: test.a.b
         rows: 1
        Extra: Using where; Using index
2 rows in set (0.00 sec)

mysql> explain select count(*) from a4921 a, b4921 b where a.a in ('a', 'aba') and a.b = b.b and a.b like '0%'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: a
         type: range
possible_keys: a
          key: a
      key_len: 102
          ref: NULL
         rows: 2
        Extra: Using where; Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: b
         type: ref
possible_keys: b
          key: b
      key_len: 41
          ref: test.a.b
         rows: 1
        Extra: Using where; Using index
2 rows in set (0.00 sec)

How to repeat:

CREATE TABLE `a4921` (
  `a` varchar(60) default NULL,
  `b` varchar(40) default NULL,
  KEY `a` (`a`,`b`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

CREATE TABLE `b4921` (
  `b` varchar(40) default NULL,
  KEY `b` (`b`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

load data infile '/usr/share/dict/words' into table a4921 (a);

update a4921 set b = md5(a);

insert into b4921 select b from a4921 limit 500;

explain select count(*) from a4921 a, b4921 b where a.a in ('a', 'aba') and a.b = b.b and b.b like '0%'\G

explain select count(*) from a4921 a, b4921 b where a.a in ('a', 'aba') and a.b = b.b and a.b like '0%'\G

Suggested fix:

Workaround is simple, once you spot it (but this is easy to overlook).  Just change b.b to a.b in the query.
[30 Mar 2005 20:50] Timothy Smith
Just to be clear, the two queries are logically identical, because of the transitive relationship (if a.b = b.b and b.b like '0%' then a.b like '0%').  The two queries could have the same plan.

Timothy
[30 Mar 2005 22:19] Timothy Smith
This has been implemented as a new feature in 5.0.  It may be in the 5.0.4 release.
[30 Mar 2005 22:26] Dale Johnson
According to mysql documentation:
http://dev.mysql.com/doc/mysql/en/where-optimizations.html

 Some of the optimizations performed by MySQL are listed here:

    *

      Removal of unnecessary parentheses:

   ((a AND b) AND c OR (((a AND b) AND (c AND d))))
-> (a AND b AND c) OR (a AND b AND c AND d)

    *

      Constant folding:

   (a<b AND b=c) AND a=5
-> b>5 AND b=c AND a=5

This is done.  So this is clearly a bug and NOT an optimization.
[30 Mar 2005 23:45] Timothy Smith
Hi, Dale!

This is actually very different from the constant propogation that is referred to in the manual.  This new optimization has not yet been documented, and is new in MySQL 5.0.  It took several logged months of work to produce.