MySQL Bugs: #9499: equal condition could allow substitution of columns for better key utilization

Bug #9499	equal condition could allow substitution of columns for better key utilization
Submitted:	30 Mar 2005 20:17	Modified:	30 Mar 2005 22:19
Reporter:	Timothy Smith	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S4 (Feature request)
Version:	4.1	OS:	Any (all)
Assigned to:		CPU Architecture:	Any

Description:

In the following two queries, the only difference is that the first uses "b.b like '0%'", and the second uses "a.b like '0%'".  Since "a.b = b.b", these two could work identically.  However, the first query uses "key_len: 61", while the second query uses "key_len: 102".  

In a real-world case, this can make a tremendous difference in performance.

mysql> explain select count(*) from a4921 a, b4921 b where a.a in ('a', 'aba') and a.b = b.b and b.b like '0%'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: a
         type: range
possible_keys: a
          key: a
      key_len: 61
          ref: NULL
         rows: 4
        Extra: Using where; Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: b
         type: ref
possible_keys: b
          key: b
      key_len: 41
          ref: test.a.b
         rows: 1
        Extra: Using where; Using index
2 rows in set (0.00 sec)

mysql> explain select count(*) from a4921 a, b4921 b where a.a in ('a', 'aba') and a.b = b.b and a.b like '0%'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: a
         type: range
possible_keys: a
          key: a
      key_len: 102
          ref: NULL
         rows: 2
        Extra: Using where; Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: b
         type: ref
possible_keys: b
          key: b
      key_len: 41
          ref: test.a.b
         rows: 1
        Extra: Using where; Using index
2 rows in set (0.00 sec)

How to repeat:

CREATE TABLE `a4921` (
  `a` varchar(60) default NULL,
  `b` varchar(40) default NULL,
  KEY `a` (`a`,`b`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

CREATE TABLE `b4921` (
  `b` varchar(40) default NULL,
  KEY `b` (`b`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

load data infile '/usr/share/dict/words' into table a4921 (a);

update a4921 set b = md5(a);

insert into b4921 select b from a4921 limit 500;

explain select count(*) from a4921 a, b4921 b where a.a in ('a', 'aba') and a.b = b.b and b.b like '0%'\G

explain select count(*) from a4921 a, b4921 b where a.a in ('a', 'aba') and a.b = b.b and a.b like '0%'\G

Suggested fix:

Workaround is simple, once you spot it (but this is easy to overlook).  Just change b.b to a.b in the query.

Just to be clear, the two queries are logically identical, because of the transitive relationship (if a.b = b.b and b.b like '0%' then a.b like '0%').  The two queries could have the same plan.

Timothy

This has been implemented as a new feature in 5.0.  It may be in the 5.0.4 release.

According to mysql documentation:
http://dev.mysql.com/doc/mysql/en/where-optimizations.html

 Some of the optimizations performed by MySQL are listed here:

    *

      Removal of unnecessary parentheses:

   ((a AND b) AND c OR (((a AND b) AND (c AND d))))
-> (a AND b AND c) OR (a AND b AND c AND d)

    *

      Constant folding:

   (a<b AND b=c) AND a=5
-> b>5 AND b=c AND a=5

This is done.  So this is clearly a bug and NOT an optimization.

Hi, Dale!

This is actually very different from the constant propogation that is referred to in the manual.  This new optimization has not yet been documented, and is new in MySQL 5.0.  It took several logged months of work to produce.