Bug #64578 | UTF8 is not UTF8 ? | ||
---|---|---|---|
Submitted: | 7 Mar 2012 7:59 | Modified: | 6 Jul 2012 15:34 |
Reporter: | Miran Cvenkel | | |
Status: | Not a Bug | | |
Category: | MySQL Server: Charsets | Severity: | S3 (Non-critical) |
Version: | 5.1.59-community-log, 5.1.63, 5.5.23, 5.6.6 | OS: | Any |
Assigned to: | | CPU Architecture: | Any |
Tags: | utf8 |
[7 Mar 2012 7:59]
Miran Cvenkel
[7 Mar 2012 12:42]
Peter Laursen
Try this instead: `SELECT * FROM _tmp WHERE term = 'Meloe' COLLATE utf8_bin;` and read about collations: http://dev.mysql.com/doc/refman/5.1/en/charset.html Peter (not a MySQL person)
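Peter's suggestion works because a binary collation such as utf8_bin compares characters by their encoded values rather than by linguistic rules, so an accented and an unaccented spelling can never be equal. The Python sketch below emulates that by comparing UTF-8 bytes. It is not MySQL code: the rows are hypothetical (the reporter's data is not shown here), the `binary_equal()` helper is made up for illustration, and it ignores details such as MySQL's treatment of trailing spaces.

```python
# Hypothetical rows; the reporter's actual table contents are not shown here.
rows = ["Meloe", "Meloé", "MELOE"]

def binary_equal(a: str, b: str) -> bool:
    """Emulate a binary (utf8_bin-style) equality test: the strings match
    only if their UTF-8 byte sequences are identical. Simplification:
    MySQL's handling of trailing spaces is not modelled."""
    return a.encode("utf-8") == b.encode("utf-8")

# Only the exact spelling survives; neither the accent nor case is ignored.
matches = [r for r in rows if binary_equal(r, "Meloe")]
print(matches)  # ['Meloe']
```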
[7 Mar 2012 12:53]
Peter Laursen
.. but I am not able to decide if the demonstrated behaviour of the Slovenian collation is correct or not. What are common alphabetization rules in Slovenian (in dictionaries, phone books etc)? You probably know better than I!
[7 Mar 2012 16:18]
Sveta Smirnova
Thank you for the report. Please send us the output of SHOW VARIABLES LIKE 'col%': the collation utf8_slovenian_ci does not have the letter é. See http://www.collation-charts.org/mysql60/mysql604.utf8_slovenian_ci.html
[10 Mar 2012 1:49]
Miran Cvenkel
Here it is: collation_connection = utf8_general_ci; collation_database = utf8_slovenian_ci; collation_server = utf8_slovenian_ci
[10 Mar 2012 1:56]
Miran Cvenkel
I must say that I found one extra, unexpected record, and someone could delete unexpected records this way (for example, with a DELETE using the same WHERE condition). Some kind of warning would be appropriate, if possible.
[10 Mar 2012 8:59]
Sveta Smirnova
Thank you for the report. Verified as described.
[10 Mar 2012 8:59]
Sveta Smirnova
Test case for MTR (mysql-test-run):
Attachment: bug64578.test (application/octet-stream, text), 410 bytes.
[6 Jul 2012 15:33]
Alexander Barkov
This is not a bug. MySQL `utf8_<language>_ci` collations are accent-insensitive: they treat accented letters as equal to their non-accented counterparts unless the language's rules say otherwise. "LATIN LETTER E WITH ACUTE" has no special rule in Slovenian (it is not even part of the Slovenian alphabet), so it follows the default rules and compares equal to the non-accented "LATIN LETTER E".
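A rough Python emulation of the behaviour described above, not MySQL's actual collation algorithm. The hypothetical `fold()` helper builds a comparison key by decomposing each character (NFD), dropping combining accents and case-folding, except for letters passed in `keep`, which the language treats as separate letters. `SLOVENIAN_DISTINCT` is an assumption for illustration: Slovenian has č, š and ž as letters of its own, whereas é has no rule and therefore falls back to the default, accent-insensitive comparison.

```python
import unicodedata

# Hypothetical set of letters the language treats as separate letters
# (Slovenian: č, š, ž). é is deliberately absent: it has no special rule.
SLOVENIAN_DISTINCT = frozenset("čšž")

def fold(s: str, keep: frozenset = frozenset()) -> str:
    """Build an accent- and case-insensitive comparison key.

    Rough emulation only, NOT MySQL's collation algorithm: each character
    is decomposed (NFD) and its combining marks are dropped, unless its
    lowercase form is in `keep`, in which case the accent is preserved.
    """
    out = []
    # Normalize to precomposed form first so that e.g. "č" is one character.
    for ch in unicodedata.normalize("NFC", s):
        if ch.lower() in keep:
            out.append(ch.casefold())  # language-specific letter: keep accent
        else:
            base = unicodedata.normalize("NFD", ch)
            stripped = "".join(c for c in base if not unicodedata.combining(c))
            out.append(stripped.casefold())  # default rule: ignore accent and case
    return "".join(out)

# Hypothetical rows, not the reporter's data.
rows = ["Meloe", "Meloé", "MELOE", "Melon"]
key = fold("Meloe", SLOVENIAN_DISTINCT)
matches = [r for r in rows if fold(r, SLOVENIAN_DISTINCT) == key]
print(matches)  # ['Meloe', 'Meloé', 'MELOE']

# č stays distinct from c, while é folds to e.
print(fold("čas", SLOVENIAN_DISTINCT) == fold("cas", SLOVENIAN_DISTINCT))  # False
```

With an empty `keep`, the sketch shows the purely accent-insensitive default, so `fold("čas") == fold("cas")` is then True. This is also why a DELETE with the same condition would remove every row whose key matches, not only the exact spelling.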