| Bug #41350 | String expression collation determined only by the first equality operand | ||
|---|---|---|---|
| Submitted: | 10 Dec 2008 8:00 | Modified: | 10 Dec 2008 8:30 |
| Reporter: | Ilan Hazan | Email Updates: | |
| Status: | Verified | Impact on me: | |
| Category: | MySQL Server: Optimizer | Severity: | S3 (Non-critical) |
| Version: | 5.0.46-log | OS: | Linux |
| Assigned to: | CPU Architecture: | Any | |
| Tags: | casting, type conversion | ||
[10 Dec 2008 8:30]
Valeriy Kravchuk
Thank you for a problem report. I do not see anything in the manual (including http://dev.mysql.com/doc/refman/5.0/en/charset-collate-tricky.html) that explains how collation is determined in cases like this (utf8_*_ci vs. utf8_bin), so this is at least a valid documentation request.
[19 Dec 2008 7:07]
Alexander Barkov
A full script repeating the same problem:
DROP TABLE IF EXISTS t1, t2;
CREATE TABLE t1 (
entry_id int(11) NOT NULL auto_increment PRIMARY KEY,
outsource_key varchar(255) character set utf8 collate utf8_bin NOT NULL,
UNIQUE KEY outsource_key (outsource_key)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
INSERT INTO t1 VALUES (1,'key1'),(2,'key2'),(10006,'key10006');
CREATE TABLE t2 (
p_id char(10) NOT NULL PRIMARY KEY,
NAME char(89) default NULL,
KEY NAME (NAME)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
INSERT INTO t2 VALUES ('key1','name1'),('key2','name2'),('key10006','name10006');
EXPLAIN
SELECT
t1.entry_id,
t1.outsource_key,
t2.*
FROM
t1 JOIN t2
WHERE
(t1.outsource_key = t2.p_id ) AND t1.entry_id =10006;
EXPLAIN
SELECT
t1.entry_id,
t1.outsource_key,
t2.*
FROM
t1 JOIN t2
WHERE
(t2.p_id = t1.outsource_key ) AND t1.entry_id =10006;
And the output is:
+----+-------------+-------+-------+-----------------------+---------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+-----------------------+---------+---------+-------+------+-------------+
| 1 | SIMPLE | t1 | const | PRIMARY,outsource_key | PRIMARY | 4 | const | 1 | |
| 1 | SIMPLE | t2 | ALL | PRIMARY | NULL | NULL | NULL | 3 | Using where |
+----+-------------+-------+-------+-----------------------+---------+---------+-------+------+-------------+
+----+-------------+-------+-------+-----------------------+---------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+-----------------------+---------+---------+-------+------+-------------+
| 1 | SIMPLE | t1 | const | PRIMARY,outsource_key | PRIMARY | 4 | const | 1 | |
| 1 | SIMPLE | t2 | range | PRIMARY | PRIMARY | 30 | NULL | 1 | Using where |
+----+-------------+-------+-------+-----------------------+---------+---------+-------+------+-------------+
[19 Dec 2008 7:38]
Alexander Barkov
The order of arguments does not affect the way how the collation for the operation is determined. It is always utf8_bin in this case. I think there are two problems here: 1. This part seems to be missing in the manual: When mixing a _ci collation and a _bin collation, and coercibility of the operands are the same, the _bin collation wins. 2. Possibly it is a bug in the optimizer. The plan depends on the order of the arguments. The plan for the first query is not optimal. I think that in both cases the optimizer should choose the second plan.
[5 Jan 2009 19:44]
Paul DuBois
I have added Bar's information about mixing _bin/_ci collations to the tricky-collation section.
[13 Jan 2009 16:15]
Paul DuBois
I am unassigning myself from this report because it has been recategorized as an optimizer report.
[26 Feb 2009 22:09]
Omer Barnir
triage: setting tag to CHECKED (3144) - triaging as an optimization bug

Description: String expression collation determined only by the first equality expression operand. As a result the way MySQL evaluates the expression is depends by the order of the operands. How to repeat: I have two tables: CREATE TABLE `TABLEY` ( `MAIN` char(1) default NULL, `P_ID` char(10) NOT NULL, `NAME` char(89) default NULL, ... PRIMARY KEY (`P_ID`), KEY `NAME` (`NAME`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8 CREATE TABLE `TABLEX` ( `ENTRY_ID` int(11) NOT NULL auto_increment, `OUTSOURCE_KEY` varchar(255) character set utf8 collate utf8_bin NOT NULL, ...... PRIMARY KEY (`ENTRY_ID`), UNIQUE KEY `OUTSOURCE_KEY` (`OUTSOURCE_KEY`) ) ENGINE=MyISAM AUTO_INCREMENT=70489 DEFAULT CHARSET=utf8 I have the query: SELECT x.ENTRY_ID, x.OUTSOURCE_KEY, y.* FROM TABLEX x JOIN TABLEY y WHERE (x.OUTSOURCE_KEY = y.P_ID ) AND x.ENTRY_ID =10006 mysql> EXPLAIN SELECT x.ENTRY_ID, x.OUTSOURCE_KEY, y.* FROM TABLEX x JOIN TABLEY y WHERE (x.OUTSOURCE_KEY = y.P_ID ) AND x.ENTRY_ID =10006\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: x type: const possible_keys: PRIMARY,OUTSOURCE_KEY key: PRIMARY key_len: 4 ref: const rows: 1 Extra: *************************** 2. row *************************** id: 1 select_type: SIMPLE table: y type: ALL possible_keys: PRIMARY key: NULL key_len: NULL ref: NULL rows: 1048582 Extra: Using where 2 rows in set (0.00 sec) In this case MySQL determine that the collation of the expression (x.OUTSOURCE_KEY = y.P_ID ) is utf8_bin hence it needs to convert the TABLEY.P_ID column to utf8_bin. This means full table scan. mysql> mysql> mysql> mysql> EXPLAIN SELECT x.ENTRY_ID, x.OUTSOURCE_KEY, y.* FROM TABLEX x JOIN TABLEY y WHERE (y.P_ID = x.OUTSOURCE_KEY ) AND x.ENTRY_ID =10006\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: ds type: const possible_keys: PRIMARY,OUTSOURCE_KEY key: PRIMARY key_len: 4 ref: const rows: 1 Extra: *************************** 2. row *************************** id: 1 select_type: SIMPLE table: da type: range possible_keys: PRIMARY key: PRIMARY key_len: 30 ref: NULL rows: 1 Extra: Using where 2 rows in set (0.01 sec) In this case MySQL determine that the collation of the expression (y.P_ID = x.OUTSOURCE_KEY ) is utf8 (not bin) hence it needs to convert only the given TABLEX.OUTSOURCE_KEY value to utf8. Suggested fix: Need to determine a rule for String comparison involving different collation and force it.