| Bug #57731 | Latvian collation is wrong | ||
|---|---|---|---|
| Submitted: | 26 Oct 2010 8:08 | Modified: | 18 Jan 2018 13:16 | 
| Reporter: | Kaspars Foigts | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: Charsets | Severity: | S4 (Feature request) | 
| Version: | 5.1.52 | OS: | Any | 
| Assigned to: | Assigned Account | CPU Architecture: | Any | 
| Tags: | character set, collation | ||
   [27 Oct 2010 12:01]
   Valeriy Kravchuk        
  This is what I've got:
macbook-pro:5.1 openxs$ bin/mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 5.1.52-debug Source distribution
Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved.
This software comes with ABSOLUTELY NO WARRANTY. This is free software,
and you are welcome to modify and redistribute it under the GPL v2 license
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> SET NAMES 'utf8' COLLATE 'utf8_latvian_ci';
Query OK, 0 rows affected (0.00 sec)
mysql> 
mysql> DROP TABLE IF EXISTS `ctest`;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> CREATE TABLE `ctest` (
    ->   `keyword` varchar(150) CHARACTER SET utf8 COLLATE utf8_latvian_ci DEFAULT NULL
    -> ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_latvian_ci;
Query OK, 0 rows affected (0.07 sec)
mysql> INSERT INTO `ctest` VALUES ('a'), ('ā'), ('a'), ('ā'),  ('e'), ('ē'), ('e'), ('ē'),  ('i'), ('ī'), ('i'), ('ī'),  ('ņ'), ('n'), ('ņ'), ('o'), ('ō'), ('o'), ('ō'),  ('u'), ('ū'), ('u'), ('ū');
Query OK, 23 rows affected (0.00 sec)
Records: 23  Duplicates: 0  Warnings: 0
mysql> SELECT * FROM `ctest` ORDER BY `keyword`;
+---------+
| keyword |
+---------+
| a       |
| ā      |
| a       |
| ā      |
| e       |
| ē      |
| e       |
| ē      |
| i       |
| ī      |
| i       |
| ī      |
| n       |
| ņ      |
| ņ      |
| o       |
| ō      |
| o       |
| ō      |
| u       |
| ū      |
| u       |
| ū      |
+---------+
23 rows in set (0.00 sec)
 
   [27 Oct 2010 18:18]
   Peter Gulutzan        
  This appears to be about a secondary (for ORDER BY) weighting, and MySQL does only primary weighting (for WHERE etc.). See also http://forge.mysql.com/worklog/task.php?id=896. We know that others have had difficulties with Latvian collation, for example Microsoft http://blogs.msdn.com/b/michkap/archive/2010/07/13/10037608.aspx Knowledge of LVS 24:1993? would perhaps be of interest, but we prefer the Unicode Collation Algorithm and the Common Locale Data Repository. The relevant lv.xml file http://www.unicode.org/repos/cldr/tags/release-1-7-1-2/common/collation/lv.xml doesn't show a primary difference. I'll mark this as a feature request.
   [28 Oct 2010 11:21]
   Alexander Barkov        
  Multi-level collations are on our TODO: http://forge.mysql.com/worklog/task.php?id=896 In the meanwhile please use this workaround: mysql> SELECT * FROM `ctest` ORDER BY `keyword`, BINARY keyword; +---------+ | keyword | +---------+ | a | | a | | ā | | ā | | e | | e | | ē | | ē | | i | | i | | ī | | ī | | n | | ņ | | ņ | | o | | o | | ō | | ō | | u | | u | | ū | | ū | +---------+ 23 rows in set (0.00 sec)
   [28 Oct 2010 11:56]
   Kaspars Foigts        
  You should note, that Alexander's solution: *) does not work on variable byte encodings (as utf8); *) does not use indexes (because of "BINARY `eyword`")
   [28 Oct 2010 12:04]
   Kaspars Foigts        
  This one works well: select keyword kw from ctest order by CONVERT(keyword USING ucs2) COLLATE ucs2_bin ASC;
   [28 Oct 2010 13:46]
   Alexander Barkov        
  > You should note, that Alexander's solution: > *) does not work on variable byte encodings (as utf8); It does work with utf8. I tested with utf8. > *) does not use indexes (because of "BINARY `eyword`") That's true, index is not used for ORDER BY optimization in this case. > This one works well: > select keyword kw > from ctest > order by CONVERT(keyword USING ucs2) COLLATE ucs2_bin ASC; There is no need for CONVERT(). It gives exactly the same order with: select keyword kw from ctest order by keyword COLLATE utf8_bin ASC; or select keyword kw from ctest order by BINARY keyword ASC; But this will sort small and capital letters separately. The correct query is: select keyword kw from ctest order by keyword, BINARY keyword;
   [3 Feb 2016 14:10]
   Arnis Juraga        
  Sorting binary, does not help for variable length strings:
~~~
DROP TABLE IF EXISTS `ctest`;
CREATE TABLE `ctest` (`keyword` varchar(150) CHARACTER SET utf8 COLLATE utf8_latvian_ci DEFAULT NULL) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_latvian_ci;
INSERT INTO `ctest` (`keyword`) VALUES ('Media'), ('Mobile'), ('Mā'), ('MVAAA'), ('Mēbeļu'), ('Māa'), ('Mās'), ('Marl'), ('Myre');
SELECT * FROM `ctest` ORDER BY `keyword` asc, binary keyword;
~~~
Expected result: 
~~~
Marl
Mā
Māa
Mās
Media
Mēbeļu
Mobile
MVAAA
Myre
~~~
Current result:
~~~
Mā
Māa
Marl
Mās
Mēbeļu
Media
Myre
Mobile
MVAAA
~~~
 
   [18 Jan 2018 13:16]
   Erlend Dahl        
  [8 Jan 2018 19:41] Xing Z Zhang Fixed in 8.0.1 with the introduction of utf8mb4_lv_0900_as_cs.
   [24 Jun 2019 19:00]
   Arnis Juraga        
  Does it require some more specific configuration for MySQL 8? It does not seem to work. I have tested on MySQL 8.0.12 (Test Online Fiddle https://www.db-fiddle.com/f/9phZ7EpRUS4FNRBZRdSYZa/2 ). Using the same test case except added new LV collationi and utf8mb4 sharacters encoding: ~~~ drop table if exists `ctest`; CREATE TABLE `ctest` ( `keyword` varchar(150) CHARACTER SET utf8mb4 COLLATE utf8mb4_lv_0900_as_cs ) DEFAULT CHARSET = utf8mb4 COLLATE = utf8mb4_lv_0900_as_cs; INSERT INTO `ctest` (`keyword`) VALUES ('Media'), ('Mobile'), ('Mā'), ('MVAAA'), ('Mēbeļu'), ('Māa'), ('Mās'), ('Marl'), ('Myre'); ~~~ Expected result (As Generated by Microsoft Excel 2019 sorting ;) ): ~~~ Marl Mā Māa Mās Media Mēbeļu Mobile MVAAA Myre ~~~ Actual result: ~~~ Mā Māa Marl Mās Mēbeļu Media Myre Mobile MVAAA ~~~
   [26 Sep 2024 7:52]
   Mārtiņš Ručevskis        
  Are there any workarounds for this? Provided solutions still aren't sorting correct.


Description: This sorting-related problem has bugged me (and other fellow Latvian developers) for quite some time and there is no explanation on why this should not work. In Latvian language there is specific order of characters. For example, 'a', 'ā', 'b', 'c', 'č', etc. MySQL does not get this sorting order right. For example, 'ā' should follow 'a', when sorted ASCending, but MySQL displays them in arbitrary order, as if they were identical (similar) characters. This inconsistency applies only to vowels - a, ā, e, ē, i, ī, o, ō, u, ū. How to repeat: SET NAMES 'utf8' COLLATE 'utf8_latvian_ci'; DROP TABLE IF EXISTS `ctest`; CREATE TABLE `ctest` ( `keyword` varchar(150) CHARACTER SET utf8 COLLATE utf8_latvian_ci DEFAULT NULL ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_latvian_ci; INSERT INTO `ctest` VALUES ('a'), ('ā'), ('a'), ('ā'), -- ('b'), ('c'), ('č'), ('c'), ('č'), ('d'), ('e'), ('ē'), ('e'), ('ē'), -- ('f'), ('g'), ('ģ'),('g'), ('ģ'), ('h'), ('i'), ('ī'), ('i'), ('ī'), -- ('j'), ('k'), ('ķ'), ('k'), ('ķ'), ('l'), ('ļ'), ('l'), ('ļ'), ('m'), ('n'), ('ņ'), ('n'), ('ņ'), ('o'), ('ō'), ('o'), ('ō'), -- ('p'), ('r'), ('ŗ'), ('r'), ('ŗ'), ('s'), ('š'), ('s'), ('š'), ('t'), ('u'), ('ū'), ('u'), ('ū'), -- ('v'), ('z'), ('ž'), ('z'), ('ž') ; SELECT * FROM `ctest` ORDER BY `keyword`; Actual output: a,ā,a,ā,e,ē,e,ē,i,ī,i,ī,o,ō,o,ō,u,ū,u,ū Expected output: a,a,ā,ā,e,e,ē,ē,i,i,ī,ī,o,o,ō,ō,u,u,ū,ū Suggested fix: Fix collation tables for Latvian?