MySQL Bugs: #28872: strings with different collation cannot be joined!

Bug #28872	strings with different collation cannot be joined!
Submitted:	4 Jun 2007 11:55	Modified:	19 Jun 2007 20:15
Reporter:	Peter Laursen (Basic Quality Contributor)	Email Updates:
Status:	Duplicate	Impact on me:	None
Category:	MySQL Server	Severity:	S3 (Non-critical)
Version:	any	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
I am definitely not satified with the silly and robotized reply here:
http://bugs.mysql.com/bug.php?id=28768

If strings with different character sets can be joined is is NO LESS THAN SILLY that strings with differenct collations cannot! 

It may not be a bug in the code, but it is a (design) bug in the implementation of collations then!

I am definitely NOT a quality contributor here any more, if I continue to get such replies!

How to repeat:
see old post

Please do not submit the same bug more than once. An existing bug report already describes this very problem. Even if you feel that your issue is somewhat different, the resolution is likely
to be the same. Because of this, we hope you add your comments to the original bug instead.

Thank you for your interest in MySQL.

Additional information why current behaviour is correct is in the comment to bug #28768

Ok, let me try to explain why joining strings with different character sets is ok, while strings with different collations is not.

Example: two strings. One is "Maß", latin1_german2_ci, the other "Mass", utf8_unicode_ci.

When you compare them, the first string (in latin1) is converted to utf8 - it is safe, as latin1 is a subset of utf8. Then they are compared using utf8_unicode_ci rules. Both in latin1_german2_ci and in utf8_unicode_ci "ss" == "ß". So using utf8_unicode_ci is correct.

Similarly, in your example it was correct to convert a string from latin1 (latin1_danish_ci) to utf8, and use utf8_danish_ci for both strings.

A different example - same character set, different collations: one string is "Maß", latin1_german2_ci; the other "Mass", latin1_danish_ci.

They're both latin1, but if you compare them using danish rules the strings will be different, if you compare them using german rules they will be equal. The example is symmetrical, there's no reason to prefer danish collation over german or vice versa. Thus, MySQL has no other choice but to return an error.

To summarize: it is an error to compare strings in different collations. It is possible to compare strings in different character sets, if one is a superset of the other and both strings use the same (or compatible) collation.

I think I have understood what you think!
I think it is a silly usablility restriction!

motto: smart coders find smart ways :-)