Bug #21581 | unique key not properly working with utf8_*_ci | ||
---|---|---|---|
Submitted: | 11 Aug 2006 12:22 | Modified: | 22 Sep 2006 8:10 |
Reporter: | Justinas Liubinskas | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server | Severity: | S4 (Feature request) |
Version: | 5.0.24 | OS: | FreeBSD (freebsd5.4) |
Assigned to: | CPU Architecture: | Any | |
Tags: | collation, lithuanian, unique key, utf8 |
[11 Aug 2006 12:22]
Justinas Liubinskas
[15 Aug 2006 16:36]
MySQL Verification Team
Thank you for the bug report. I consulted the developer responsible for the character stuff, below his comments: According to "The State Commition of the Lithianian Language" site: http://www.vlkk.lt/lithuanian-language/alphabet.htm only "C WITH CARON", "S WITH CARON" and "Z WITH CARON" are separate letters in Lithuanian. Other accented letters are a equal to their non-accented counterparts. Including the letter "U WITH OGONEK" which is equal to "U". utf8_lithuanian_ci is an accent insensitive collation. This is how it was designed. There are no bugs here. It looks like a feature request. The reporter seems to want a new collation: accent sensitive Lithuanian collation for utf8. Also this bug report might related to bug: http://bugs.mysql.com/bug.php?id=20247
[17 Aug 2006 8:53]
Algirdas Brazas
It isn't so indeed. Looking at the same link supplied by You, I can see that Lihuanian letters are sorted not equally. There isn't mentioned about equality of characters You mentuoned.
[17 Aug 2006 9:39]
Justinas Liubinskas
Miguel Solorzano is partly correct, saying that 'i' and 'į' or 'y' are pronounced similar, that is it would be true to say, that collation is accent-insensitive. However, I then would bring in a question, if collation and unique indexing should cover it, because: example 'siusti' and 'siųsti' are existing lithuanian words with different meaning, 'klausimas' and 'klausymas' also has different meaning, and main difference is that, 'ų' and 'y' are pronounced longer than their equivalents 'u' and 'i'.
[5 Sep 2006 11:08]
Domas Mituzas
This can be implemented only as utf8_lithuanian2_ci, it shouldn't modify utf8_lithuanian_ci behavior, which defines proper dictionary sorting, as defined by VLKK (national language authority)
[22 Sep 2006 8:10]
Domas Mituzas
Not a bug: According to LST 1285:1993 ("Pecularities of the lithuanian language") accented vowels in sorting are to be treated same as their non-accented counterparts. binary collations or data types, combined with canonical forms in database may allow to have intended effect - unique constraints based on non-standard ordering schemes.