Bug #28768 strings with different collations can't be JOIN'ed
Submitted: 30 May 2007 9:00 Modified: 5 Jun 2007 11:53
Reporter: Peter Laursen (Basic Quality Contributor) Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: General Severity:S3 (Non-critical)
Version:5.0.37/41 - , 5.1, 4.1 OS:Any
Assigned to: CPU Architecture:Any

[30 May 2007 9:00] Peter Laursen
Description:
a SELECT query with a JOIN on two string columns with diffrent collations raises the 'illegal mix of collations' error.  However no such eror occurs if columns are different charset or even type!

I realize that there are more bugs reported (and confirmed) with the error ''illegal mix of collations'.  But I believe this is a new 'incarnation' or 'materialisation'.

How to repeat:
CREATE TABLE `parent` (`id_char` varchar(50) CHARACTER SET utf8 COLLATE utf8_danish_ci DEFAULT NULL  
        ) ENGINE=InnoDB DEFAULT CHARSET=latin1;

CREATE TABLE `parent_2` (`id_char` varchar(50) CHARACTER SET latin1 COLLATE latin1_danish_ci DEFAULT NULL  
        ) ENGINE=InnoDB DEFAULT CHARSET=latin1

CREATE TABLE `child` (                                           
          `parent_id_no` bigint(20) DEFAULT NULL,                        
          `parent_id_char` varchar(50) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL     
        ) ENGINE=InnoDB DEFAULT CHARSET=latin1;

/* same charset - different collation */
SELECT
    child.id
    , parent.id_char
FROM
    test.child
    INNER JOIN test.parent 
        ON (child.parent_id_char = parent.id_char);
/*
Error Code : 1267
Illegal mix of collations (utf8_general_ci,IMPLICIT) and (utf8_danish_ci,IMPLICIT) for operation '='
*/

/* different charset - different collation */
SELECT
    child.id
    ,parent_2.id_char
FROM
    test.child
    INNER JOIN test.parent_2 
        ON (child.parent_id_char = parent_2.id_char);
/* no error */

/* different datatypes (number and string) */
SELECT
    child.id
    , parent.id_char
FROM
    test.child
    INNER JOIN test.parent 
        ON (child.parent_id_no = parent.id_char);
/* no error! */ 

Suggested fix:
allow for JOIN'ing strings with different collations
[30 May 2007 10:10] Sveta Smirnova
Thank you for the report.

Verified as described.

All versions are affected.
[4 Jun 2007 10:54] Sergei Golubchik
It's not a bug. You cannot compare strings that use different collations. The rules of comparison are defined by the collation.

In the second example both queries are compared using Danish rules.

In the third example values are compared as floating point numbers.
[5 Jun 2007 11:00] Sveta Smirnova
Additional description of this behaviour at http://dev.mysql.com/doc/refman/5.1/en/charset-metadata.html:

Although automatic conversion is not in the SQL standard, the SQL standard document does say that every character set is (in terms of supported characters) a “subset” of Unicode. Because it is a well-known principle that “what applies to a superset can apply to a subset,” we believe that a collation for Unicode can apply for comparisons with non-Unicode strings.

For example, following code will fails:

CREATE TABLE `parent_2` (`id_char` varchar(50) CHARACTER SET latin1 COLLATE
latin1_spanish_ci DEFAULT NULL  
        ) ENGINE=InnoDB DEFAULT CHARSET=latin1;

CREATE TABLE `child` (                                           
          `parent_id_no` bigint(20) DEFAULT NULL,                        
          `parent_id_char` varchar(50) CHARACTER SET cp1251 COLLATE
cp1251_general_ci  DEFAULT NULL     
        ) ENGINE=InnoDB DEFAULT CHARSET=latin1;

		
SELECT
    child.parent_id_no
    ,parent_2.id_char
FROM
    test.child
    INNER JOIN test.parent_2 
        ON (child.parent_id_char = parent_2.id_char);
[5 Jun 2007 11:53] Peter Laursen
This way of arguing just reduces to 'this is the way it is because that is the way it is' in my opinion.

CREATE TABLE `parent` (                                   
          `parent` varchar(20) character set latin5 default NULL  
        ) ENGINE=InnoDB DEFAULT CHARSET=latin1   

CREATE TABLE `child`                                                         
          `child` varchar(1120) character set latin1 collate latin1_german1_ci default NULL ) ENGINE=InnoDB DEFAULT CHARSET=latin1   

SELECT
    parent.parent
FROM
    peter.child
    INNER JOIN peter.parent 
        ON (child.child = parent.parent);

/* Error Code : 1267
Illegal mix of collations (latin1_german1_ci,IMPLICIT) and (latin5_turkish_ci,IMPLICIT) for operation '=' */

also after changing the Turkish table to latin1_spanish_ci

*/Error Code : 1267
Illegal mix of collations (latin1_german1_ci,IMPLICIT) and (latin1_spanish_ci,IMPLICIT) for operation '='*/

You can join a unicode and a non-unicode table.  You cannot join two unicode tables with different collations or two non-unicode tables with different charset or collation.

I still think it is inconsistent (and the above attempt to make it consistent is very abstract) but more important an unnessary limitation in usability. A very practical example: a procut catalogue where the name and details of the product is available in different languages in different tables or columns.  You would of course want to use a collation for each that is 'natural' for each language.  But then there are unnessary limitations on what queries can be done in practice.

I cannot understand what could possible go wrong with the query when collations are different (though it may affect the efficiency of that query if index performance is affected).  Actually I could better understand if different charsets could not be joined.  The binary/hex pattern of characters' mapping is not affected by collations - unlike charsets.  But I do not know how relevant that is!

I think I would be able to find at least 50 situations where MySQL is more 'relaxed' than standard SQL when 'nothing would go wrong with it'.
[26 Jun 2010 7:41] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100626073921-t0e6q2c9nkhuemnj) (version source revid:alik@sun.com-20100626073921-t0e6q2c9nkhuemnj) (pib:16)
[4 Aug 2010 8:10] Bugs System
Pushed into mysql-trunk 5.6.1-m4 (revid:alik@ibmvm-20100804080001-bny5271e65xo34ig) (version source revid:alik@sun.com-20100626073921-t0e6q2c9nkhuemnj) (merge vers: 5.6.99-m4) (pib:18)
[4 Aug 2010 8:13] Bugs System
Pushed into mysql-trunk 5.6.1-m4 (revid:alik@ibmvm-20100804080001-bny5271e65xo34ig) (version source revid:alik@sun.com-20100626073921-t0e6q2c9nkhuemnj) (merge vers: 5.6.99-m4) (pib:18)
[4 Aug 2010 8:14] Bugs System
Pushed into mysql-trunk 5.6.1-m4 (revid:alik@ibmvm-20100804080001-bny5271e65xo34ig) (version source revid:alik@sun.com-20100626073921-t0e6q2c9nkhuemnj) (merge vers: 5.6.99-m4) (pib:18)
[4 Aug 2010 8:15] Bugs System
Pushed into mysql-trunk 5.6.1-m4 (revid:alik@ibmvm-20100804080001-bny5271e65xo34ig) (version source revid:alik@sun.com-20100626073921-t0e6q2c9nkhuemnj) (merge vers: 5.6.99-m4) (pib:18)
[4 Aug 2010 8:16] Bugs System
Pushed into mysql-trunk 5.6.1-m4 (revid:alik@ibmvm-20100804080001-bny5271e65xo34ig) (version source revid:alik@sun.com-20100626073921-t0e6q2c9nkhuemnj) (merge vers: 5.6.99-m4) (pib:18)
[4 Aug 2010 8:17] Bugs System
Pushed into mysql-trunk 5.6.1-m4 (revid:alik@ibmvm-20100804080001-bny5271e65xo34ig) (version source revid:alik@sun.com-20100626073921-t0e6q2c9nkhuemnj) (merge vers: 5.6.99-m4) (pib:18)
[4 Aug 2010 8:26] Bugs System
Pushed into mysql-trunk 5.6.1-m4 (revid:alik@ibmvm-20100804081533-c1d3rbipo9e8rt1s) (version source revid:alik@sun.com-20100626073921-t0e6q2c9nkhuemnj) (merge vers: 5.6.99-m4) (pib:18)
[4 Aug 2010 9:00] Bugs System
Pushed into mysql-trunk 5.6.1-m4 (revid:alik@ibmvm-20100804081533-c1d3rbipo9e8rt1s) (version source revid:alik@sun.com-20100626073921-t0e6q2c9nkhuemnj) (merge vers: 5.6.99-m4) (pib:20)