Bug #32404 | Cannot obtain accent sensitive czech collation | ||
---|---|---|---|
Submitted: | 15 Nov 2007 11:34 | Modified: | 18 Jan 2018 13:09 |
Reporter: | Petr Slavik | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Charsets | Severity: | S4 (Feature request) |
Version: | 5.0, 5.1, 6.0 | OS: | Linux (CentOS 4.3) |
Assigned to: | Assigned Account | CPU Architecture: | Any |
Tags: | accent sensitive, accent sensitivity, collation, cp1250, czech, latin2, utf, utf8 |
[15 Nov 2007 11:34]
Petr Slavik
[15 Nov 2007 11:46]
Sveta Smirnova
Thank you for the reasonable feature request.
[29 Sep 2008 8:13]
Susanne Ebrecht
Many thanks for writing a feature request. Please, can you look for a link where the rules are given.
[29 Sep 2008 9:14]
Susanne Ebrecht
I found rules here: http://cs.wikipedia.org/wiki/Abecední_řazení and at some other places. I will get deeper into this topic an analyze if there is a bug or not.
[29 Sep 2008 11:46]
Susanne Ebrecht
Ok, this comment will get long. The main rules first: Sorting rule: a < á < ä all three treated as 'a' b c č d < ď both treated as d e < é < ě all three treated as e f g h ch i < í both treated as i j k l < ĺ < ľ all three treated as l m n < ň both treated as n o < ó < ô < ö all four treated as o p q r < ŕ both treated as r ř s š t < ť both treated as t u < ú < ů < ü all for treated as u v w x y < ý both treated as y z ž Capitalisation that is different to English: á Á ä Ä č Č ď Ď é É ě Ě í Í ĺ Ĺ ... not used in Czech alphabet... this letter is for Slovak alphabet compatibility ľ Ľ ... not used in Czech alphabet... this letter is for Slovak alphabet compatibility ň Ň ó Ó ö Ö ř Ř š Š ť Ť ú Ú ů Ů ü Ü ý Ý ž Ž í == i is true, but in second phase when string are equal then í > i. See example: 'as' < 'ás' < 'aš' < 'áš' 'ea' < 'ěa' < 'er' < 'ěr'
[29 Sep 2008 12:00]
Susanne Ebrecht
Now the test: Use Terminal environment UTF8: SET NAMES UTF8; CREATE TABLE czechtestlatin2(v VARCHAR(10) CHARACTER SET latin2 COLLATE latin2_czech_cs); CREATE TABLE czechtestutf8(v VARCHAR(10) CHARACTER SET utf8 COLLATE utf8_czech_ci); INSERT INTO czechtestlatin2 VALUES ('a'),('b'),('c'),('d'),('e'),('f'),('g'),('h'),('i'),('j'),('k'),('l'),('m'),('n'),('o'),('p'),('q'),('r'),('x'),('y'),('z'),('s'),('t'),('u'),('v'),('w'), ('á'),('ä'),('č'),('ď'),('é'),('ě'),('ch'),('í'),('ĺ'),('ľ'),('ň'),('ó'),('ô'),('ö'),('ŕ'),('ř'),('š'),('ť'),('ú'),('ü'),('ů'),('ý'),('ž'),('as'),('ás'),('aš'),('áš'),('ea'),('ěa'),('er'),('ěr'); INSERT INTO czechtestutf8 VALUES ('a'),('b'),('c'),('d'),('e'),('f'),('g'),('h'),('i'),('j'),('k'),('l'),('m'),('n'),('o'),('p'),('q'),('r'),('x'),('y'),('z'),('s'),('t'),('u'),('v'),('w'), ('á'),('ä'),('č'),('ď'),('é'),('ě'),('ch'),('í'),('ĺ'),('ľ'),('ň'),('ó'),('ô'),('ö'),('ŕ'),('ř'),('š'),('ť'),('ú'),('ü'),('ů'),('ý'),('ž'),('as'),('ás'),('aš'),('áš'),('ea'),('ěa'),('er'),('ěr'); SELECT * FROM czechtestlatin2 ORDER BY v; SELECT * FROM czechtestutf8 ORDER BY v; Result: correct sorting SELECT * FROM czechtestlatin2 WHERE v LIKE 'i'; +------+ | v | +------+ | i | +------+ SELECT * FROM czechtestutf8 WHERE v LIKE 'i'; +------+ | v | +------+ | i | | í | +------+ SELECT * FROM czechtestlatin2 WHERE v='i'; +------+ | v | +------+ | i | +------+ SELECT * FROM czechtestutf8 WHERE v='i'; +------+ | v | +------+ | i | | í | +------+ SELECT * FROM czechtestlatin2 WHERE v LIKE 'í'; +------+ | v | +------+ | í | +------+ SELECT * FROM czechtestutf8 WHERE v LIKE 'í'; +------+ | v | +------+ | i | | í | +------+ SELECT * FROM czechtestlatin2 WHERE v='í'; +------+ | v | +------+ | í | +------+ SELECT * FROM czechtestutf8 WHERE v='í'; +------+ | v | +------+ | i | | í | +------+ Ok, Czech sorting rule says that in first instance i == í. Also capital for i and í is different (I and Í). Means the different behaviour in latin2_czech_cs and utf8_czech_ci is not because of same capital. The behaviour in SELECT LIKE or '=' should be the same for latin2_czech_cs and utf8_czech_ci. It should not be different and inconsistent. Czech folk is more familiar with behaviour of latin2_czech_cs so utf8_czech_ci should be fixed.
[23 Jun 2011 18:19]
Valeriy Kravchuk
Bug #61615 was marked as a duplicate of this one.
[18 Jan 2018 13:09]
Erlend Dahl
[29 Dec 2017 0:22] Xing Z Zhang From 8.0.1, we have utf8mb4_cs_0900_as_cs which is accent sensitive.