Bug #55970 | incorrect implementation of sorting in utf8_slovak_ci | ||
---|---|---|---|
Submitted: | 13 Aug 2010 13:19 | Modified: | 18 Aug 2010 8:11 |
Reporter: | Stanislav LOFAJ | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server | Severity: | S4 (Feature request) |
Version: | 6.0.11-alpha, 5.1.49 etc. | OS: | Any |
Assigned to: | Assigned Account | CPU Architecture: | Any |
Tags: | collation, Contribution, server, slovak |
[13 Aug 2010 13:19]
Stanislav LOFAJ
[13 Aug 2010 18:22]
Sveta Smirnova
Thank you for the report. According to this table, http://www.collation-charts.org/mysql60/mysql604.utf8_slovak_ci.html, this is not a bug, but I'll ask our collation experts to look into this report.
[16 Aug 2010 8:36]
Alexander Barkov
CLDR collation description for Slovak
Attachment: sk.xml (text/xml), 1.43 KiB.
[16 Aug 2010 9:01]
Alexander Barkov
utf8_slovak_ci is implemented according to Unicode's Common Locale Data Repository (CLDR). See the definition file sk.xml in the "Files" section of this bug report.
sk.xml contains two versions of the collation. The first is marked as <collation type="standard"> ... </collation>. The second is marked as <collation type="standard" draft="true" alt="proposed"> ... </collation>.
MySQL implements the second version, which says that only the letters ä, č, ô, š, ž are separate letters, while the other accented letters keep their default Unicode sorting.
Oracle agrees: http://www.collation-charts.org/oracle10g/ora10g.EE8MSWIN1250.XSLOVAK.html
Microsoft agrees: http://www.collation-charts.org/vista/vista.041B.CP1250.Slovak_Slovakia.html
The first version from sk.xml additionally treats the letters đ, ł, ř, ż as separate from their non-accented counterparts d, l, r, z. However, neither of the two collations says that ď, ť, ň, ĺ, ľ, é, ŕ, ú must be separate letters.
So, as far as I can see, you need an accent-sensitive version of the Slovak collation. Accent-sensitive collations with good sorting are currently on our TODO list and require this task to be done first: http://forge.mysql.com/worklog/task.php?id=896
In the meantime, you can define your own collation in the Index.xml file, which also lets you redefine the order of the letters ď, ť, ň, ĺ, ľ, é, ŕ, ú.
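To make the difference concrete, here is a minimal Python sketch of primary-level sorting under a simplified model (an assumption, not MySQL's actual UCA code): letters in a "separate" set sort right after their base letter, and every other accented letter compares as its unaccented base letter, as an accent-insensitive (_ci) collation does. The names base_letter, primary_key, MYSQL_SLOVAK_SEPARATE and ACCENT_SENSITIVE_SEPARATE are hypothetical. The model ignores contractions and the secondary/tertiary levels, and it would mishandle letters such as đ and ł, which have no Unicode decomposition.

```python
import unicodedata

# Hypothetical model of the set described above: only these letters
# get their own slot after the base letter (input assumed NFC, lowercase set).
MYSQL_SLOVAK_SEPARATE = frozenset("äčôšž")

# Hypothetical accent-sensitive variant the reporter appears to want:
# ď, ť, ň, ĺ, ľ, é, ŕ, ú are treated as separate letters as well.
ACCENT_SENSITIVE_SEPARATE = MYSQL_SLOVAK_SEPARATE | frozenset("ďťňĺľéŕú")


def base_letter(ch):
    """Strip combining marks after NFD decomposition, e.g. 'ď' -> 'd'."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def primary_key(word, separate):
    """Build a simplified primary sort key for `word`.

    Each character maps to (base letter, flag). The flag is 1 for letters in
    `separate`, so e.g. 'č' sorts after every 'c' but before 'd'; all other
    accented letters get flag 0 and compare equal to their base letter.
    """
    key = []
    for ch in word.lower():
        flag = 1 if ch in separate else 0
        key.append((base_letter(ch), flag))
    return tuple(key)


words = ["dom", "ďateľ", "čaj", "cukor"]
print(sorted(words, key=lambda w: primary_key(w, MYSQL_SLOVAK_SEPARATE)))
# prints ['cukor', 'čaj', 'ďateľ', 'dom']
print(sorted(words, key=lambda w: primary_key(w, ACCENT_SENSITIVE_SEPARATE)))
# prints ['cukor', 'čaj', 'dom', 'ďateľ']
```

Under the first set, "ďateľ" sorts before "dom" because ď compares as d and the decision falls to a < o; only by adding ď to the separate set (the change the reporter is asking for) does it move after every word starting with plain d.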
[16 Aug 2010 9:12]
Alexander Barkov
I just noticed that the latest copy of sk.xml defines only a single collation version: http://unicode.org/cldr/trac/browser/trunk/common/collation/sk.xml, which is exactly what MySQL implements.
[16 Aug 2010 15:45]
Sveta Smirnova
Thank you for the report. This is a feature request; verifying it as such.