Bug #72501 joining NULLs + different collation is not efficient
Submitted: 1 May 2014 17:31
Reporter: Arnaud Adant Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.6.17 OS:Any
Assigned to: CPU Architecture:Any

[1 May 2014 17:31] Arnaud Adant
Description:
When joining NULL values using collate to align the joined column collation, the performance is dramatically slower causing a lot of handler_read_next (index scans).

This can cause big issues in updates or sub-selects if collate is used.

For example joining t1.c1 utf8_general_ci  and t2.c1 utf8_bin

using  

t1.c1 = t2.c1 collate utf8_general_ci

no null values :

| Handler_read_next          | 10    |

null values in t1.c1 :

| Handler_read_next          | 10    |

null values on t1.c1 and t2.c1 and :

| Handler_read_next          | 5242900 |

How to repeat:
drop table if exists t1, t2;
create table t1(id int auto_increment primary key, c1 varchar(255) collate utf8_general_ci, key(c1)) charset=utf8 engine=InnoDB;
create table t2(id int auto_increment primary key, t1_id int, c1 varchar(255) collate utf8_bin) charset=utf8 engine=InnoDB;

truncate table t1;
insert into t1(c1) values(rpad(rand()*100000,255,'0'));
insert into t1(c1) values(rpad(rand()*100000,255,'0'));
replace into t1(c1) select rpad(rand()*100000,255,'0') from t1 t1, t1 t2, t1 t3, t1 t4, t1 t5, t1 t6, t1 t7, t1 t8, t1 t9, t1 t10, t1 t11, t1 t12, t1 t13, t1 t14, t1 t15, t1 t16, t1 t17, t1 t18, t1 t19;

truncate table t2;

insert into t2(c1) select c1 from t1;

explain select (select count(*) from t1  where t1.c1 = t2.c1 collate utf8_general_ci) cnt from t2 limit 10;
flush status;
select (select count(*) from t1  where t1.c1 = t2.c1 collate utf8_general_ci) cnt from t2 limit 10;
show session status like 'Handler%';
update t2 set c1 = null;
flush status;
select (select count(*) from t1  where t1.c1 = t2.c1 collate utf8_general_ci) cnt from t2 limit 10;
show session status like 'Handler%';
update t1 set c1 = null;
flush status;
select (select count(*) from t1  where t1.c1 = t2.c1 collate utf8_general_ci) cnt from t2 limit 10;
show session status like 'Handler%';
alter table t2 convert to character set utf8;
flush status;
select (select count(*) from t1  where t1.c1 = t2.c1) cnt from t2 limit 10;
show session status like 'Handler%';

Suggested fix:
Fix the issue when collate is used.
[1 May 2014 18:12] Arnaud Adant
Confusing since the plan does not show anything problematic :

mysql> explain select (select count(*) from t1  where t1.c1 = t2.c1 collate utf8_general_ci) cnt from t2 limit 10;
+----+--------------------+-------+------+---------------+------+---------+------+--------+--------------------------+
| id | select_type        | table | type | possible_keys | key  | key_len | ref  | rows   | Extra                    |
+----+--------------------+-------+------+---------------+------+---------+------+--------+--------------------------+
|  1 | PRIMARY            | t2    | ALL  | NULL          | NULL | NULL    | NULL | 557166 | NULL                     |
|  2 | DEPENDENT SUBQUERY | t1    | ref  | c1            | c1   | 768     | func |      1 | Using where; Using index |
+----+--------------------+-------+------+---------------+------+---------+------+--------+--------------------------+
2 rows in set (0.01 sec)