| Bug #16373 | problem with sorting croatian letters | ||
|---|---|---|---|
| Submitted: | 11 Jan 2006 13:56 | Modified: | 19 May 2006 11:28 |
| Reporter: | Tomislav Rajaković | Email Updates: | |
| Status: | Not a Bug | Impact on me: | |
| Category: | MySQL Server: Charsets | Severity: | S3 (Non-critical) |
| Version: | 5.0.19-BK, 5.0.16 | OS: | Linux (Linux, Windows) |
| Assigned to: | Alexander Barkov | CPU Architecture: | Any |
[11 Jan 2006 17:11]
Valeriy Kravchuk
Thank you for the bug report. Verified just as described on latest 5.0.19-BK on Linux:
mysql> CREATE TABLE words (Word VARCHAR(40) NOT NULL, UNIQUE INDEX(Word(40))) ENGINE=MYISAM CHECKSUM=1 CHARACTER SET latin2 COLLATE latin2_croatian_ci;
Query OK, 0 rows affected (0.03 sec)
mysql> INSERT INTO words (Word) VALUES ('abc');
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO words (Word) VALUES ('bbc');
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO words (Word) VALUES ('čbc');
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO words (Word) VALUES ('ćbc');
Query OK, 1 row affected, 1 warning (0.00 sec)
mysql> INSERT INTO words (Word) VALUES ('zzz');
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO words (Word) VALUES ('žzz');
Query OK, 1 row affected, 1 warning (0.01 sec)
mysql> show warnings;
+---------+------+-------------------------------------------+
| Level | Code | Message |
+---------+------+-------------------------------------------+
| Warning | 1265 | Data truncated for column 'Word' at row 1 |
+---------+------+-------------------------------------------+
1 row in set (0.00 sec)
mysql> select word from words order by word;
+------+
| word |
+------+
| ??zz |
| �?bc |
| abc |
| čbc |
| bbc |
| zzz |
+------+
6 rows in set (0.00 sec)
mysql> select version();
+-----------+
| version() |
+-----------+
| 5.0.19 |
+-----------+
1 row in set (0.00 sec)
So, there are obvious problems with this collation.
[21 Jan 2006 17:15]
Tomislav Rajaković
Yeah, I forgot something...
The non-Croatian letters are "Q", "X", "Y" and "W"; I forgot "Q".
[15 Mar 2006 16:27]
Vlatko Šurlan
I have found some info on this, perhaps even a workaround, but I haven't tested it: http://www.ambra.rs.ba/
[19 May 2006 11:28]
Alexander Barkov
Dear Tomislav,
That's true, latin2_croatian_ci does not support double letters (known as "contractions"). It is a simplified version, intentionally written this way, which sorts faster than a version with contractions would. However, I do agree that it would be nice to have "real" Croatian collations, so that one can choose between the correct sort order (which is a bit slower) and the faster version (which does not support contractions). So I added a "Create real Croatian collations" task to our TODO. Thanks for requesting this feature!
About the other letters, I don't agree that it sorts most of the letters in the wrong order. It does sort all accented letters in their proper places, exactly as you describe:
A,B,C,Č,Ć,D,Đ,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,Š,T,U,V,W,X,Y,Z,Ž
Please see the collation chart here: http://myoffice.izhnet.ru/bar/~bar/charts/latin2_croatian_ci.html
If you get letters in a different order, most likely you have misconfigured character set settings. Please start checking with "show variables like 'character_set%'".
I'm closing this report as not a bug.
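The character set check suggested in this comment could look roughly like the session below. This is only a sketch: it assumes the `words` table from this report and a client terminal that really sends latin2 (ISO 8859-2) bytes. If the terminal uses another encoding, the value passed to `SET NAMES` would have to name that encoding instead.

```sql
-- Show which character sets the server, client, connection and results use.
SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';

-- Tell the server which encoding the client sends and expects back.
-- Assumption: the terminal is configured for latin2.
SET NAMES latin2;

-- Look at the bytes that were actually stored. A 3F byte is a literal '?',
-- which would suggest the character was lost when the row was inserted.
SELECT Word, HEX(Word) FROM words ORDER BY Word;
```

Rows that were inserted while the client and connection character sets disagreed would need to be re-inserted after the settings are corrected, since changing the settings does not repair data that is already stored.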

Description:
MySQL does not correctly handle Croatian letters with diacritics ("č","ć","đ","š","ž") when sorting. Almost all letters are in the wrong places.
How to repeat:
CREATE TABLE words (
  Word VARCHAR(40) NOT NULL,
  UNIQUE INDEX(Word(40))
) ENGINE=MYISAM CHECKSUM=1 CHARACTER SET latin2 COLLATE latin2_croatian_ci;
INSERT INTO words (Word) VALUES ('abc');
INSERT INTO words (Word) VALUES ('bbc');
INSERT INTO words (Word) VALUES ('čbc');
INSERT INTO words (Word) VALUES ('ćbc');
INSERT INTO words (Word) VALUES ('zzz');
INSERT INTO words (Word) VALUES ('žzz');
SELECT Word FROM words ORDER BY Word;
Suggested fix:
The Croatian alphabet has 30 letters, plus the non-Croatian letters ("X","Y","W"), which we use in foreign words and phrases, so they are in effect part of the alphabet and still need their places. The order should be:
A,B,C,Č,Ć,D,DŽ,Đ,E,F,G,H,I,J,K,L,LJ,M,N,NJ,O,P,Q,R,S,Š,T,U,V,W,X,Y,Z,Ž
Note that "DŽ" is one letter (one sound), as are "LJ" and "NJ", even though each is written with two characters ("D" and "Ž" in the case of "DŽ"). Whenever "DŽ", "LJ" or "NJ" occurs, we treat it as one letter, without exception.
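To separate the single-letter ordering from the DŽ/LJ/NJ question, the letters with diacritics can be tested on their own. A rough sketch follows, assuming a hypothetical scratch table `letters` (not part of the original report) and a client that really sends latin2 bytes:

```sql
-- Assumption: the client terminal uses latin2 (ISO 8859-2).
SET NAMES latin2;

-- Hypothetical scratch table holding one letter per row.
CREATE TABLE letters (
  L CHAR(1) NOT NULL
) ENGINE=MYISAM CHARACTER SET latin2 COLLATE latin2_croatian_ci;

-- Base letters together with their accented counterparts.
INSERT INTO letters (L) VALUES
  ('z'), ('ž'), ('s'), ('š'), ('d'), ('đ'), ('c'), ('č'), ('ć');

-- HEX() shows whether each letter was stored as a single latin2 byte.
SELECT L, HEX(L) FROM letters ORDER BY L;
```

If the result follows the single-letter part of the requested order (C, Č, Ć, D, Đ, ..., S, Š, ..., Z, Ž), the remaining difference from that order would come down to the two-character letters.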