Bug #15907 Binary comparison ignores indexes
Submitted: 21 Dec 2005 15:23 Modified: 15 Dec 2009 17:37
Reporter: Jon Bright Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:5.0.19-BK, 5.0.13, 6.0.14-bzr OS:Linux (Linux)
Assigned to: CPU Architecture:Any
Triage: Triaged: D3 (Medium)

[21 Dec 2005 15:23] Jon Bright
Description:
This seems to be related to bug #6045, but despite the resolution of that bug, it doesn't seem to be fixed.  Doing a join on binary index_in_a_table = binary index_in_the_other_table results in MySQL not using the indexes.  The SQL below reproduces it for me in 5.0.13.

Execute the SQL in the reproduction instructions.  
The first explain gives:
+----+-------------+-------+-------+---------------+---------+---------+-----------------------+------+-------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref                   | rows | Extra                   |
+----+-------------+-------+-------+---------------+---------+---------+-----------------------+------+-------------------------+
|  1 | SIMPLE      | b     | index | NULL          | PRIMARY | 12      | NULL                  |    2 | Using index             |
|  1 | SIMPLE      | a     | ref   | PRIMARY       | PRIMARY | 12      | testquotedb.b.somekey |    1 | Using where; Not exists |
+----+-------------+-------+-------+---------------+---------+---------+-----------------------+------+-------------------------+

All is well, it's using b.somekey.  The second gives:

+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra                   |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------------------+
|  1 | SIMPLE      | b     | index | NULL          | PRIMARY | 12      | NULL |    2 | Using index             |
|  1 | SIMPLE      | a     | ALL   | NULL          | NULL    | NULL    | NULL |    1 | Using where; Not exists |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------------------+

Not good, no indexes.  Particularly painful when we ran this on our equivalents of a and b which had 300,000 and 60,000 rows respectively.  The first query on equivalent tables completes in <2s, the second was still going after 2 hours.

How to repeat:
create table a (somekey varchar(10) not null primary key, foo varchar(12), bar varchar(12));
create table b (somekey varchar(10) not null primary key, foo varchar(12), bar varchar(12));
insert a values ('ABCD','bdhgdfg','fswrewr');
insert b values ('ABCD','gertiuh','dfgowen');
insert b values ('WXYZ','dfjgrew','oinewrw');
explain select a.somekey,a.foo from a right join b on a.somekey=b.somekey where a.somekey is null;
explain select a.somekey,a.foo from a right join b on binary a.somekey=binary b.somekey where a.somekey is null;

In case it's relevant, the create table above results in:

+------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+----------
-----------+---------------------+------------+-------------------+----------+----------------+---------+
| Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_ti
me         | Update_time         | Check_time | Collation         | Checksum | Create_options | Comment |
+------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+----------
-----------+---------------------+------------+-------------------+----------+----------------+---------+
| a    | MyISAM |      10 | Dynamic    |    1 |             28 |          28 | 281474976710655 |         2048 |         0 |           NULL | 2005-12-2
1 16:09:55 | 2005-12-21 16:10:12 | NULL       | latin1_swedish_ci |     NULL |                |         |
+------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+----------
-----------+---------------------+------------+-------------------+----------+----------------+---------+

b is the same.

Suggested fix:
We're using the following workaround:

explain select a.somekey,a.foo from a right join b on a.somekey=b.somekey and binary a.somekey=binary b.somekey where a.somekey is null;
+----+-------------+-------+-------+---------------+---------+---------+-----------------------+------+-------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref                   | rows | Extra                   |
+----+-------------+-------+-------+---------------+---------+---------+-----------------------+------+-------------------------+
|  1 | SIMPLE      | b     | index | NULL          | PRIMARY | 12      | NULL                  |    2 | Using index             |
|  1 | SIMPLE      | a     | ref   | PRIMARY       | PRIMARY | 12      | testquotedb.b.somekey |    1 | Using where; Not exists |
+----+-------------+-------+-------+---------------+---------+---------+-----------------------+------+-------------------------+

This uses the index and runs in much the same time as the first query explain-ed above, but has the effect of the second query.  Clearly, now we know about this, this isn't a big issue for us.  But it seems like MySQL should work out for itself that this is the quicker way to run the query.
[21 Dec 2005 16:05] Jon Bright
I've just checked, this behaviour is the same on 5.0.16.
[21 Dec 2005 16:07] Jon Bright
We originally experienced the behaviour on 4.1.12 - I've only done simple 2-row reproductions with 5.x.  I've checked on 4.1.12, though, and the explain output isn't noticeably different.
[22 Dec 2005 17:51] Valeriy Kravchuk
Thank you for a problem report. Verified just as described with 5.0.19-BK () build on Linux:

mysql> explain select a.somekey,a.foo from a right join b on a.somekey=b.somekey where
    -> a.somekey is null;
+----+-------------+-------+-------+---------------+---------+---------+----------------+------+-------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref        | rows | Extra                   |
+----+-------------+-------+-------+---------------+---------+---------+----------------+------+-------------------------+
|  1 | SIMPLE      | b     | index | NULL          | PRIMARY | 12      | NULL        |    2 | Using index             |
x|  1 | SIMPLE      | a     | ref   | PRIMARY       | PRIMARY | 12      | test.b.somekey |    1 | Using where; Not exists |
+----+-------------+-------+-------+---------------+---------+---------+----------------+------+-------------------------+
2 rows in set (0.00 sec)

mysql> explain select a.somekey,a.foo from a right join b on binary a.somekey=binary
    -> b.somekey where a.somekey is null;
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  |
rows | Extra                   |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------------------+
|  1 | SIMPLE      | b     | index | NULL          | PRIMARY | 12      | NULL |
   2 | Using index             |
|  1 | SIMPLE      | a     | ALL   | NULL          | NULL    | NULL    | NULL |
   1 | Using where; Not exists |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------------------+
2 rows in set (0.00 sec)

Your workaround from the older bug works (that bug is fixed in 5.0.19, I had checked):

mysql> explain select a.somekey,a.foo from a right join b on a.somekey=b.somekey and
    -> binary a.somekey=binary b.somekey where a.somekey is null;
+----+-------------+-------+-------+---------------+---------+---------+----------------+------+-------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref        | rows | Extra                   |
+----+-------------+-------+-------+---------------+---------+---------+----------------+------+-------------------------+
|  1 | SIMPLE      | b     | index | NULL          | PRIMARY | 12      | NULL        |    2 | Using index             |
|  1 | SIMPLE      | a     | ref   | PRIMARY       | PRIMARY | 12      | test.b.somekey |    1 | Using where; Not exists |
+----+-------------+-------+-------+---------------+---------+---------+----------------+------+-------------------------+
2 rows in set (0.00 sec)

But I do not think it is a real problem. This query is slightly weird, I believe. For all other similar, but more reasonable queries there are no problems:

mysql> explain select a.somekey,a.foo from a right join b on a.somekey=b.somekey and
    -> binary a.somekey=binary b.somekey where a.somekey is not null;
+----+-------------+-------+--------+---------------+---------+---------+-------+------+--------------------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref| rows | Extra                    |
+----+-------------+-------+--------+---------------+---------+---------+-------+------+--------------------------+
|  1 | SIMPLE      | a     | system | PRIMARY       | NULL    | NULL    | NULL|    1 |                          |
|  1 | SIMPLE      | b     | ref    | PRIMARY       | PRIMARY | 12      | const |    1 | Using where; Using index |
+----+-------------+-------+--------+---------------+---------+---------+-------+------+--------------------------+
2 rows in set (0.00 sec)

mysql> explain select a.somekey,a.foo from a right join b on a.somekey=b.somekey and binary a.somekey=binary b.somekey where a.somekey = 'ABCD';
+----+-------------+-------+--------+---------------+---------+---------+-------+------+--------------------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref | rows | Extra                    |
+----+-------------+-------+--------+---------------+---------+---------+-------+------+--------------------------+
|  1 | SIMPLE      | a     | system | PRIMARY       | NULL    | NULL    | NULL |    1 |                          |
|  1 | SIMPLE      | b     | ref    | PRIMARY       | PRIMARY | 12      | const |    1 | Using where; Using index |
+----+-------------+-------+--------+---------------+---------+---------+-------+------+--------------------------+
2 rows in set (0.01 sec)
[31 Aug 2006 16:22] Igor Babaev
This performance problem can be also demonstrated with the following
simple example:

mysql> CREATE TABLE t1 (a char(10) PRIMARY KEY);
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO t1 VALUES ('aaaa'), ('bbbb'), ('dddd'), ('cccc');
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> CREATE TABLE t2 (a char(10) PRIMARY KEY);
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO t2 VALUES ('aaaa'), ('bbbb'), ('cccc'), ('eeee'), ('gggg'), ('hhhh');
Query OK, 6 rows affected (0.00 sec)

Records: 6  Duplicates: 0  Warnings: 0
mysql> EXPLAIN SELECT t1.a FROM t1,t2 WHERE t1.a=t2.a;
+----+-------------+-------+--------+---------------+---------+---------+-----------+------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref       | rows | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+-----------+------+-------------+
|  1 | SIMPLE      | t1    | index  | PRIMARY       | PRIMARY | 10      | NULL      |    4 | Using index |
|  1 | SIMPLE      | t2    | eq_ref | PRIMARY       | PRIMARY | 10      | test.t1.a |    1 | Using index |
+----+-------------+-------+--------+---------------+---------+---------+-----------+------+-------------+
2 rows in set (0.00 sec)

mysql> EXPLAIN SELECT t1.a FROM t1,t2 WHERE binary t1.a = binary t2.a;
+----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra                    |
+----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
|  1 | SIMPLE      | t1    | index | NULL          | PRIMARY | 10      | NULL |    4 | Using index              |
|  1 | SIMPLE      | t2    | index | NULL          | PRIMARY | 10      | NULL |    6 | Using where; Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
2 rows in set (0.01 sec)

mysql> EXPLAIN SELECT t1.a FROM t1,t2 WHERE t1.a=t2.a AND binary t1.a = binary t2.a;
+----+-------------+-------+--------+---------------+---------+---------+-----------+------+--------------------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref       | rows | Extra                    |
+----+-------------+-------+--------+---------------+---------+---------+-----------+------+--------------------------+
|  1 | SIMPLE      | t1    | index  | PRIMARY       | PRIMARY | 10      | NULL      |    4 | Using index              |
|  1 | SIMPLE      | t2    | eq_ref | PRIMARY       | PRIMARY | 10      | test.t1.a |    1 | Using where; Using index |
+----+-------------+-------+--------+---------------+---------+---------+-----------+------+--------------------------+
2 rows in set (0.01 sec)

To fix this problem we have to be able to detect the cases
when we have in the WHERE condition a conjunctive predicate
of the form f1(t1.a)=f2(t2.a) such that
f1(t1.a)=f2(t2.a) => t1.a = t2.a and there is an
index that allow us to get the value t2.a from the value of
t1.a or vice versa.
In a general case this would require not a trivial check
procedure. That's why I mark this problem as 'To be fixed later'.
[28 Dec 2012 21:10] Hartmut Holzgraefe
Just to show that this is not specific to the BINARY comparison operator:
The same is happening with simple integer lookups in the form of

  indexed_column_value + 0 = some_constant

mysql> create table t1(id int primary key, val int);
Query OK, 0 rows affected (0.41 sec)

mysql> insert into t1 values (1,1),(2,2),(23,42);
Query OK, 3 rows affected (0.06 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> explain select * from t1 where id=23;
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref   | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+
|  1 | SIMPLE      | t1    | const | PRIMARY       | PRIMARY | 4       | const |    1 |       |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+
1 row in set (0.01 sec)

mysql> explain select * from t1 where id + 0 = 23;
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL |    3 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)