Bug #71625 | Lack of Unicode normalization also affects string comparison. | | |
---|---|---|---|
Submitted: | 7 Feb 2014 13:21 | Modified: | 18 Jan 2018 13:19 |
Reporter: | Peter Laursen (Basic Quality Contributor) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Charsets | Severity: | S4 (Feature request) |
Version: | any | OS: | Any |
Assigned to: | | CPU Architecture: | Any |
[7 Feb 2014 13:21]
Peter Laursen
[7 Feb 2014 13:24]
Peter Laursen
Oops, the first sentence came out garbled! What I meant: this is a follow-up to two bug reports.
[7 Feb 2014 18:21]
Sveta Smirnova
Thank you for the report. According to http://collation-charts.org/mysql60/mysql604.utf8_general_ci.european.html this is not a bug: C3AB (U+00EB, "ë") is not equal to 65CC88 ("e" followed by U+0308 COMBINING DIAERESIS).
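Decoding the two byte sequences shows what each one contains. The sketch below uses Python's standard `unicodedata` module as an illustrative choice (nothing MySQL-specific is involved), and `describe` is a hypothetical helper name. It shows that C3AB is a single precomposed code point while 65CC88 is a base letter plus a combining mark, so any comparison that looks at bytes or code points one by one treats them as different.

```python
import unicodedata

# The two UTF-8 byte sequences quoted in the comment.
precomposed = bytes.fromhex("c3ab").decode("utf-8")   # one code point
decomposed = bytes.fromhex("65cc88").decode("utf-8")  # base letter + combining mark


def describe(s: str) -> list[str]:
    """Return 'U+XXXX NAME' for every code point in s (hypothetical helper)."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in s]


print(describe(precomposed))  # ['U+00EB LATIN SMALL LETTER E WITH DIAERESIS']
print(describe(decomposed))   # ['U+0065 LATIN SMALL LETTER E', 'U+0308 COMBINING DIAERESIS']

# A plain code-point comparison sees two different sequences.
print(precomposed == decomposed)  # False
```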
[7 Feb 2014 18:39]
Peter Laursen
If it is that simple, why were http://bugs.mysql.com/bug.php?id=71563 and http://bugs.mysql.com/bug.php?id=71564 not also closed as 'not a bug' several days ago? Besides, I find it completely ridiculous to reject this bug report by referring to a documentation page that does nothing but document current behavior. That makes no sense, because I am complaining about current behavior, i.e. the lack of any option (an sql_mode, specific collations, whatever) that makes it possible to *compare Unicode characters as equal* when they also *print as equal*. This could matter when data are imported into the database from different sources that use different ways to encode accented characters.

At least please verify this as *BOTH* a documentation request ("MySQL charsets and collations do not treat Unicode combining characters, i.e. those printed as a_basic_character + a_backspace + an_accent, as a single character") *AND* a feature request ("there should be an option to compare characters that print the same also to compare as equal in string comparisons as well as deliver the same metadata such as string length").
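The requested behavior can be illustrated outside the server. A minimal sketch, assuming Unicode Normalization Form C (NFC) is an acceptable notion of "prints the same": without normalization the two encodings of ë differ in both equality and length, while normalizing first makes them agree. The helpers `nfc_equal` and `nfc_length` are hypothetical names for illustration, not MySQL functions or an existing server option.

```python
import unicodedata

precomposed = "\u00eb"  # ë as a single code point (UTF-8: C3 AB)
decomposed = "e\u0308"  # e + COMBINING DIAERESIS (UTF-8: 65 CC 88)

# Without normalization, equality and length disagree.
print(precomposed == decomposed)          # False
print(len(precomposed), len(decomposed))  # 1 2


def nfc_equal(a: str, b: str) -> bool:
    """Compare two strings after canonical composition (NFC). Hypothetical helper."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def nfc_length(s: str) -> int:
    """Count code points after NFC normalization. Hypothetical helper."""
    return len(unicodedata.normalize("NFC", s))


print(nfc_equal(precomposed, decomposed))               # True
print(nfc_length(precomposed), nfc_length(decomposed))  # 1 1
```

Note that `nfc_length` still counts code points, not user-perceived characters. A letter with no precomposed form in Unicode keeps its combining marks after NFC, so a length that always matches what is printed would need grapheme-cluster counting instead.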
[7 Feb 2014 18:57]
Sveta Smirnova
Thank you for the feedback. Bug #71563 and bug #71564 are about wrong results and wrong formatting, not about wrong sort order. But you are correct: technically they are still feature requests. I can verify this report as the feature request "there should be an option to compare characters that print the same also to compare as equal in string comparisons as well as deliver the same metadata such as string length". Please open a separate bug report about the lack of documentation.
[10 Feb 2014 9:20]
Peter Laursen
Thanks for the verification. Not being very proficient in server internals, I now think (after sleeping on it for a few days) that this is simply a request for collations that treat different byte sequences resulting in the same character as identical (in string comparisons and in metadata). (BTW: Vietnamese will be a challenge, I think!)
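The Vietnamese remark is a fair one: Vietnamese letters can carry two diacritics, so the same printed character may be stored as one precomposed code point or as a base letter with combining marks in either order. A sketch, again using Python's `unicodedata` as an illustration, with "ệ" (U+1EC7) as an example chosen here: the four variants below have four distinct UTF-8 byte sequences, yet NFC, which also puts combining marks into canonical order, maps all of them to the same single code point.

```python
import unicodedata

# Four encodings of Vietnamese "ệ" (e with circumflex and dot below).
variants = [
    "\u1ec7",         # fully precomposed
    "e\u0323\u0302",  # e + dot below + circumflex
    "e\u0302\u0323",  # e + circumflex + dot below (marks in the other order)
    "\u00ea\u0323",   # ê + dot below
]

# The stored byte sequences all differ...
encoded = [v.encode("utf-8").hex() for v in variants]
print(encoded)
print(len(set(encoded)))  # 4

# ...but NFC reorders and composes every variant into the same code point.
normalized = {unicodedata.normalize("NFC", v) for v in variants}
print(normalized == {"\u1ec7"})  # True
```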
[10 Feb 2014 12:07]
Peter Laursen
Docs request posted at http://bugs.mysql.com/bug.php?id=71656
[18 Jan 2018 13:19]
Erlend Dahl
[17 Jan 2018 23:52] Xing Z Zhang: Actually, utf8_unicode_ci, added in 5.0.44, can compare these kinds of strings correctly.
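One way to see why a UCA-based collation such as utf8_unicode_ci can treat these strings as equal: in the Unicode Collation Algorithm, combining marks like U+0308 carry no primary weight, so a comparison at primary strength effectively ignores them. The sketch below is only a crude stand-in for that idea (decompose with NFD, drop nonspacing marks, casefold); `primary_key` is a hypothetical helper, and it is neither the real UCA nor MySQL's implementation.

```python
import unicodedata


def primary_key(s: str) -> str:
    """Very rough stand-in for a primary-strength, case-insensitive collation key.

    NOT the Unicode Collation Algorithm and NOT MySQL's implementation:
    it only decomposes (NFD), drops nonspacing combining marks, and casefolds.
    """
    decomposed = unicodedata.normalize("NFD", s)
    base_only = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return base_only.casefold()


precomposed = "\u00eb"  # ë (UTF-8 C3 AB)
decomposed = "e\u0308"  # e + COMBINING DIAERESIS (UTF-8 65 CC 88)

print(primary_key(precomposed) == primary_key(decomposed))  # True
# Side effect of ignoring marks entirely: accented and plain letters also match.
print(primary_key(precomposed) == primary_key("E"))  # True
```

As the second check shows, ignoring marks this way also makes "ë" equal to plain "e", so this style of comparison is accent-insensitive rather than merely normalization-aware; whether that is acceptable depends on the application.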