Bug #16107 utf8_turkish_ci collation sorts in wrong order
Submitted: 30 Dec 2005 21:17 Modified: 19 Jan 2006 12:09
Reporter: Koray Atalag Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server Severity:S1 (Critical)
Version:MySQL 5.0.17-nt via TCP/IP OS:Windows (WinXP)
Assigned to: Hakan Küçükyılmaz CPU Architecture:Any

[30 Dec 2005 21:17] Koray Atalag
Description:
When using utf8_turkish_ci collation with Turkish characters, the sort order is incorrect...

This is a critical issue for my application; and I believe anybody using MySQL with Turkish characters...

I have also tried Latin5 and it also sorts wrong...

Best regards,

Dr. Koray Atalag

How to repeat:
Both with Query Browser and via my application which uses ADO with OLEDB Provider for ODBC with ODBC 3.5 connector, the resultsets are in wrong order.

Special characters of the Turkish Alphabet is not sorted right. When doing a query with ORDER BY clause, the letters ŞİĞÜÇÖ şığüçö appears after the letter "Z".

Suggested fix:
Correct order should be:
A B C Ç D E F G H I İ J K L M N O Ö P R S Ş T U Ü V Y Z
a b c d e f g h ı i j k l m n o ö p r s ş t u ü v y z

Total of 29 letters in TR....
[7 Jan 2006 7:48] Koray Atalag
Hi, I see that the issue was assigned to a Turkish person (at least from the name I can infer this)...When I search the bug DB before posting my bug, I had seen that similar issues (with wrong resolution info!) were filed with some Russian guys...In the future I recommend that you do consult a Turkish person, or better with an organization for getting the correct answers. In this case I would direct you either to Turkish Standardisation Organisation (www.tse.gov.tr) or Turkish Informatics Society (www.tbd.org.tr) or practically to Turkish National Language Institution (www.tdk.gov.tr)

Best regards,

Dr. Koray Atalag
[7 Jan 2006 7:50] Koray Atalag
I just discovered that I had an error in alphabet order:

A B C Ç D E F G Ğ H I İ J K L M N O Ö P R S Ş T U Ü V Y Z 

a b c d e f g ğ h ı i j k l m n o ö p r s ş t u ü v y z

Total of 29 Letters....
[17 Jan 2006 23:48] Hakan Küçükyılmaz
Hello Koray and merhaba!

Sorry for replying so late but somehow our bugsystem did not alert me about this bug. Could you please provide us with an example which shows the wrong sorting with your given collation?

Thanks, Hakan
[18 Jan 2006 15:04] Hakan Küçükyılmaz
Sorting does work for me:

CREATE TABLE t1(
  a varchar(4) CHARACTER SET utf8 COLLATE utf8_turkish_ci
);

[12:40] hakan@test>show variables like '%char%';
+--------------------------+--------------------------------------------------+
| Variable_name            | Value                                            |
+--------------------------+--------------------------------------------------+
| character_set_client     | latin1                                           |
| character_set_connection | latin1                                           |
| character_set_database   | latin1                                           |
| character_set_results    | latin1                                           |
| character_set_server     | latin1                                           |
| character_set_system     | utf8                                             |
| character_sets_dir       | /usr/local/mysql-5.0-debug/share/mysql/charsets/ |
+--------------------------+--------------------------------------------------+
7 rows in set (0.02 sec)

5.0.19-debug-log
[12:41] hakan@test>set names utf8;
Query OK, 0 rows affected (0.00 sec)

5.0.19-debug-log
[12:41] hakan@test>show variables like '%char%';
+--------------------------+--------------------------------------------------+
| Variable_name            | Value                                            |
+--------------------------+--------------------------------------------------+
| character_set_client     | utf8                                             |
| character_set_connection | utf8                                             |
| character_set_database   | latin1                                           |
| character_set_results    | utf8                                             |
| character_set_server     | latin1                                           |
| character_set_system     | utf8                                             |
| character_sets_dir       | /usr/local/mysql-5.0-debug/share/mysql/charsets/ |
+--------------------------+--------------------------------------------------+
7 rows in set (0.03 sec)

5.0.19-debug-log

[13:02] hakan@test>insert into t1 values('a'), ('A'), ('b'), ('c'), ('C'), ('B'), ('Ç'), ('ç'), 
('y'), ('Z'), ('ğ');
Query OK, 11 rows affected (0.03 sec)
Records: 11  Duplicates: 0  Warnings: 0

[13:02] hakan@test>select * from t1;
+------+
| a    |
+------+
| a    |
| A    |
| b    |
| c    |
| C    |
| B    |
| Ç   |
| ç   |
| y    |
| Z    |
| ğ   |
+------+
11 rows in set (0.01 sec)

5.0.19-debug-log
[13:03] hakan@test>select * from t1 order by a;
+------+
| a    |
+------+
| a    |
| A    |
| B    |
| b    |
| c    |
| C    |
| Ç   |
| ç   |
| ğ   |
| y    |
| Z    |
+------+
[18 Jan 2006 20:34] Koray Atalag
Merhabalar Hakan,

When I do exactly you did I get the following resultsets:

Query show variables like '%char%', Wed Jan 18 22:29:40 2006
Variable_name	Value
character_set_client	utf8
character_set_connection	utf8
character_set_database	utf8
character_set_results	utf8
character_set_server	utf8
character_set_system	utf8
character_sets_dir	C:\Program Files\MySQL\MySQL Server 5.0\share\charsets\

Query select * from t1, Wed Jan 18 22:31:02 2006
a
a
A
b
c
C
B
Ç
ç
y
Z
ğ

Pretty strange ha? Maybe you do not have the version and platform I am using...
[18 Jan 2006 21:05] Hakan Küçükyılmaz
Sayin Dr. Atalag,

as you can see on my output, I used ORDER BY <column> in my statement. Your query has no ORDER BY clause. So it can't be in any order.

Please provide a full reproducable example. In the current situation it is very hard for us to reconstruct the behaviour you described.

However I will try to get a MySQL 5.0.17 in WinXP and try it again.
[19 Jan 2006 7:27] Koray Atalag
Merhabalar Hakan,

It really works fine on my system if I enter the data via INSERT INTO method....But when I looked at my data which I had imported from M$ SQL Server and Access they seem to have "codepage" problem...SO the original data that I had was not unicode at all....But a very bad thing is I use VB6 with ADO and use OLE DB Provider for ODBC and use ODBC driver of MySQL....When I enter new data from the application same thing happens again! So it looks the problem is on my side with non-proper handling of codepages and unicode...

Sorry for the mess and thanks for your support...I think you all are doing a great job there...

Sevgiler,

Dr. Koray Atalag
[19 Jan 2006 11:30] Hakan Küçükyılmaz
Hi,

do you think that we can close this bug. It looks like the problem is in your application stack and not in the MySQL server.
[19 Jan 2006 12:09] Koray Atalag
Thanks...
[17 Jul 2008 0:57] irfan bahadir
you should use

mysql_query("SET NAMES 'utf8'");
mysql_query("SET CHARACTER SET utf8");
mysql_query("SET COLLATION_CONNECTION = 'utf8_turkish_ci'");

if you use this you will get right order