Bug #28872 | strings with different collation cannot be joined! | ||
---|---|---|---|
Submitted: | 4 Jun 2007 11:55 | Modified: | 19 Jun 2007 20:15 |
Reporter: | Peter Laursen (Basic Quality Contributor) | Email Updates: | |
Status: | Duplicate | Impact on me: | |
Category: | MySQL Server | Severity: | S3 (Non-critical) |
Version: | any | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[4 Jun 2007 11:55]
Peter Laursen
[5 Jun 2007 11:02]
Sveta Smirnova
Please do not submit the same bug more than once. An existing bug report already describes this very problem. Even if you feel that your issue is somewhat different, the resolution is likely to be the same. Because of this, we hope you add your comments to the original bug instead. Thank you for your interest in MySQL. Additional information why current behaviour is correct is in the comment to bug #28768
[19 Jun 2007 17:33]
Sergei Golubchik
Ok, let me try to explain why joining strings with different character sets is ok, while strings with different collations is not. Example: two strings. One is "Maß", latin1_german2_ci, the other "Mass", utf8_unicode_ci. When you compare them, the first string (in latin1) is converted to utf8 - it is safe, as latin1 is a subset of utf8. Then they are compared using utf8_unicode_ci rules. Both in latin1_german2_ci and in utf8_unicode_ci "ss" == "ß". So using utf8_unicode_ci is correct. Similarly, in your example it was correct to convert a string from latin1 (latin1_danish_ci) to utf8, and use utf8_danish_ci for both strings. A different example - same character set, different collations: one string is "Maß", latin1_german2_ci; the other "Mass", latin1_danish_ci. They're both latin1, but if you compare them using danish rules the strings will be different, if you compare them using german rules they will be equal. The example is symmetrical, there's no reason to prefer danish collation over german or vice versa. Thus, MySQL has no other choice but to return an error. To summarize: it is an error to compare strings in different collations. It is possible to compare strings in different character sets, if one is a superset of the other and both strings use the same (or compatible) collation.
[19 Jun 2007 20:15]
Peter Laursen
I think I have understood what you think! I think it is a silly usablility restriction! motto: smart coders find smart ways :-)