Bug #57731 | Latvian collation is wrong | ||
---|---|---|---|
Submitted: | 26 Oct 2010 8:08 | Modified: | 18 Jan 2018 13:16 |
Reporter: | Kaspars Foigts | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Charsets | Severity: | S4 (Feature request) |
Version: | 5.1.52 | OS: | Any |
Assigned to: | Assigned Account | CPU Architecture: | Any |
Tags: | character set, collation |
[26 Oct 2010 8:08]
Kaspars Foigts
[27 Oct 2010 12:01]
Valeriy Kravchuk
This is what I've got: macbook-pro:5.1 openxs$ bin/mysql -uroot test Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 3 Server version: 5.1.52-debug Source distribution Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved. This software comes with ABSOLUTELY NO WARRANTY. This is free software, and you are welcome to modify and redistribute it under the GPL v2 license Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> SET NAMES 'utf8' COLLATE 'utf8_latvian_ci'; Query OK, 0 rows affected (0.00 sec) mysql> mysql> DROP TABLE IF EXISTS `ctest`; Query OK, 0 rows affected, 1 warning (0.00 sec) mysql> CREATE TABLE `ctest` ( -> `keyword` varchar(150) CHARACTER SET utf8 COLLATE utf8_latvian_ci DEFAULT NULL -> ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_latvian_ci; Query OK, 0 rows affected (0.07 sec) mysql> INSERT INTO `ctest` VALUES ('a'), ('ā'), ('a'), ('ā'), ('e'), ('ē'), ('e'), ('ē'), ('i'), ('ī'), ('i'), ('ī'), ('ņ'), ('n'), ('ņ'), ('o'), ('ō'), ('o'), ('ō'), ('u'), ('ū'), ('u'), ('ū'); Query OK, 23 rows affected (0.00 sec) Records: 23 Duplicates: 0 Warnings: 0 mysql> SELECT * FROM `ctest` ORDER BY `keyword`; +---------+ | keyword | +---------+ | a | | ā | | a | | ā | | e | | ē | | e | | ē | | i | | ī | | i | | ī | | n | | ņ | | ņ | | o | | ō | | o | | ō | | u | | ū | | u | | ū | +---------+ 23 rows in set (0.00 sec)
[27 Oct 2010 18:18]
Peter Gulutzan
This appears to be about a secondary (for ORDER BY) weighting, and MySQL does only primary weighting (for WHERE etc.). See also http://forge.mysql.com/worklog/task.php?id=896. We know that others have had difficulties with Latvian collation, for example Microsoft http://blogs.msdn.com/b/michkap/archive/2010/07/13/10037608.aspx Knowledge of LVS 24:1993? would perhaps be of interest, but we prefer the Unicode Collation Algorithm and the Common Locale Data Repository. The relevant lv.xml file http://www.unicode.org/repos/cldr/tags/release-1-7-1-2/common/collation/lv.xml doesn't show a primary difference. I'll mark this as a feature request.
[28 Oct 2010 11:21]
Alexander Barkov
Multi-level collations are on our TODO: http://forge.mysql.com/worklog/task.php?id=896 In the meanwhile please use this workaround: mysql> SELECT * FROM `ctest` ORDER BY `keyword`, BINARY keyword; +---------+ | keyword | +---------+ | a | | a | | ā | | ā | | e | | e | | ē | | ē | | i | | i | | ī | | ī | | n | | ņ | | ņ | | o | | o | | ō | | ō | | u | | u | | ū | | ū | +---------+ 23 rows in set (0.00 sec)
[28 Oct 2010 11:56]
Kaspars Foigts
You should note, that Alexander's solution: *) does not work on variable byte encodings (as utf8); *) does not use indexes (because of "BINARY `eyword`")
[28 Oct 2010 12:04]
Kaspars Foigts
This one works well: select keyword kw from ctest order by CONVERT(keyword USING ucs2) COLLATE ucs2_bin ASC;
[28 Oct 2010 13:46]
Alexander Barkov
> You should note, that Alexander's solution: > *) does not work on variable byte encodings (as utf8); It does work with utf8. I tested with utf8. > *) does not use indexes (because of "BINARY `eyword`") That's true, index is not used for ORDER BY optimization in this case. > This one works well: > select keyword kw > from ctest > order by CONVERT(keyword USING ucs2) COLLATE ucs2_bin ASC; There is no need for CONVERT(). It gives exactly the same order with: select keyword kw from ctest order by keyword COLLATE utf8_bin ASC; or select keyword kw from ctest order by BINARY keyword ASC; But this will sort small and capital letters separately. The correct query is: select keyword kw from ctest order by keyword, BINARY keyword;
[3 Feb 2016 14:10]
Arnis Juraga
Sorting binary, does not help for variable length strings: ~~~ DROP TABLE IF EXISTS `ctest`; CREATE TABLE `ctest` (`keyword` varchar(150) CHARACTER SET utf8 COLLATE utf8_latvian_ci DEFAULT NULL) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_latvian_ci; INSERT INTO `ctest` (`keyword`) VALUES ('Media'), ('Mobile'), ('Mā'), ('MVAAA'), ('Mēbeļu'), ('Māa'), ('Mās'), ('Marl'), ('Myre'); SELECT * FROM `ctest` ORDER BY `keyword` asc, binary keyword; ~~~ Expected result: ~~~ Marl Mā Māa Mās Media Mēbeļu Mobile MVAAA Myre ~~~ Current result: ~~~ Mā Māa Marl Mās Mēbeļu Media Myre Mobile MVAAA ~~~
[18 Jan 2018 13:16]
Erlend Dahl
[8 Jan 2018 19:41] Xing Z Zhang Fixed in 8.0.1 with the introduction of utf8mb4_lv_0900_as_cs.
[24 Jun 2019 19:00]
Arnis Juraga
Does it require some more specific configuration for MySQL 8? It does not seem to work. I have tested on MySQL 8.0.12 (Test Online Fiddle https://www.db-fiddle.com/f/9phZ7EpRUS4FNRBZRdSYZa/2 ). Using the same test case except added new LV collationi and utf8mb4 sharacters encoding: ~~~ drop table if exists `ctest`; CREATE TABLE `ctest` ( `keyword` varchar(150) CHARACTER SET utf8mb4 COLLATE utf8mb4_lv_0900_as_cs ) DEFAULT CHARSET = utf8mb4 COLLATE = utf8mb4_lv_0900_as_cs; INSERT INTO `ctest` (`keyword`) VALUES ('Media'), ('Mobile'), ('Mā'), ('MVAAA'), ('Mēbeļu'), ('Māa'), ('Mās'), ('Marl'), ('Myre'); ~~~ Expected result (As Generated by Microsoft Excel 2019 sorting ;) ): ~~~ Marl Mā Māa Mās Media Mēbeļu Mobile MVAAA Myre ~~~ Actual result: ~~~ Mā Māa Marl Mās Mēbeļu Media Myre Mobile MVAAA ~~~
[26 Sep 7:52]
Mārtiņš Ručevskis
Are there any workarounds for this? Provided solutions still aren't sorting correct.