Bug #55797 Collation utf8_polish_ci seems to not work when sorting
Submitted: 6 Aug 2010 8:43
Modified: 6 Aug 2010 12:27
Reporter: Łukasz Jarochowski
Status: Not a Bug
Category: MySQL Server: Charsets
Severity: S2 (Serious)
Version: 5.1.41
OS: Any
Assigned to:
CPU Architecture: Any

[6 Aug 2010 8:43] Łukasz Jarochowski
Description:
I created a new database and inserted some records into it, and it seems that sorting doesn't work. This is strange, because it's a vanilla install on Ubuntu 8.10.

It should return the rows properly sorted, but it doesn't.

How to repeat:
lukasz@lukasz-desktop:~$ locale
LANG=pl_PL.utf8
LANGUAGE=pl_PL:pl:en_GB:en
LC_CTYPE="pl_PL.utf8"
LC_NUMERIC="pl_PL.utf8"
LC_TIME="pl_PL.utf8"
LC_COLLATE="pl_PL.utf8"
LC_MONETARY="pl_PL.utf8"
LC_MESSAGES="pl_PL.utf8"
LC_PAPER="pl_PL.utf8"
LC_NAME="pl_PL.utf8"
LC_ADDRESS="pl_PL.utf8"
LC_TELEPHONE="pl_PL.utf8"
LC_MEASUREMENT="pl_PL.utf8"
LC_IDENTIFICATION="pl_PL.utf8"
LC_ALL=

lukasz@lukasz-desktop:~$ mysql -V
mysql  Ver 14.14 Distrib 5.1.41, for debian-linux-gnu (i486) using readline 6.1

lukasz@lukasz-desktop:~$ mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 47
Server version: 5.1.41-3ubuntu12.6 (Ubuntu)

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> \s
--------------
mysql  Ver 14.14 Distrib 5.1.41, for debian-linux-gnu (i486) using readline 6.1

Connection id:		46
SSL:			Not in use
Current pager:		stdout
Using outfile:		''
Using delimiter:	;
Server version:		5.1.41-3ubuntu12.6 (Ubuntu)
Protocol version:	10
Connection:		Localhost via UNIX socket
Client characterset:	latin1
Server characterset:	latin1
UNIX socket:		/var/run/mysqld/mysqld.sock
Uptime:			1 hour 43 sec

Threads: 1  Questions: 253  Slow queries: 0  Opens: 807  Flush tables: 1  Open tables: 64  Queries per second avg: 0.69
--------------

mysql> set names utf8 collate utf8_polish_ci;
Query OK, 0 rows affected (0.00 sec)

mysql> create database test default character set utf8 default collate utf8_polish_ci;
Query OK, 1 row affected (0.00 sec)

mysql> show create database test;
CREATE DATABASE `test` /*!40100 DEFAULT CHARACTER SET utf8 COLLATE utf8_polish_ci */
1 row in set (0.00 sec)

mysql> create table test ( x varchar(255) character set utf8 collate utf8_polish_ci ) engine=myisam charset utf8 collate utf8_polish_ci;
Query OK, 0 rows affected (0.04 sec)

mysql> show create table test;
CREATE TABLE `test` (
  `x` varchar(255) COLLATE utf8_polish_ci DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_polish_ci
1 row in set (0.00 sec)

mysql> insert into test values ('zarządzenie a'), ('zarzadzenie a'), ('zarządzenie b'), ('zarzadzenie b');
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> select * from test order by x;
+----------------+
| x              |
+----------------+
| zarzadzenie a  |
| zarzadzenie b  |
| zarządzenie a  |
| zarządzenie b  |
+----------------+
4 rows in set (0.00 sec)

mysql> select * from test order by x collate utf8_polish_ci;
+----------------+
| x              |
+----------------+
| zarzadzenie a  |
| zarzadzenie b  |
| zarządzenie a  |
| zarządzenie b  |
+----------------+
4 rows in set (0.00 sec)
[6 Aug 2010 8:44] Łukasz Jarochowski
Correction: I'm running Ubuntu 10.04.1 LTS, not 8.10.
[6 Aug 2010 10:16] Susanne Ebrecht
Hello Łukasz,

| zarzadzenie a  |
| zarzadzenie b  |
| zarządzenie a  |
| zarządzenie b  |

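I am not able to see what is wrong here. With utf8_polish_ci, each accented letter sorts directly after its base letter, as this table of single letters shows:

mysql> select * from test order by x;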
+------+
| x    |
+------+
| a    |
| ą    |
| b    |
| c    |
| k    |
| l    |
| ł    |
| o    |
| ó    |
| z    |
+------+

So 'zarza...' comes before 'zarzą...', and the sorting is totally correct.

To make this clearer, in my next example I simply used 'a' and 'b' (in place of 'a' and 'ą'):

mysql> select * from test order by x;
+---------+
| x       |
+---------+
| abcad a |
| abcad b |
| abcbd a |
| abcbd b |
+---------+

This is totally correct: first come all rows starting with 'abca', then all rows starting with 'abcb'.

The same applies to your example: first come all rows starting with 'zarza', then all rows starting with 'zarzą'.
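The left-to-right comparison described above can be sketched in Python. This is a minimal sketch under a simplified model of Polish alphabetical order: lowercase letters only, with a space sorting before every letter. It is not MySQL's actual utf8_polish_ci implementation, which also handles case, trailing-space padding and many other characters. `POLISH_ALPHABET`, `WEIGHTS` and `polish_key` are hypothetical names for illustration.

```python
# Simplified model of Polish alphabetical order (lowercase only).
# Assumption: this is NOT MySQL's utf8_polish_ci implementation; it only
# illustrates that accented letters such as "ą" are separate letters that
# sort directly after their base letter.
POLISH_ALPHABET = "aąbcćdeęfghijklłmnńoópqrsśtuvwxyzźż"

# Hypothetical weights: space sorts before every letter, and letters are
# weighted by their position in the alphabet above. Characters outside this
# map raise KeyError, since the model does not cover them.
WEIGHTS = {" ": 0, **{ch: i + 1 for i, ch in enumerate(POLISH_ALPHABET)}}


def polish_key(s: str) -> list[int]:
    """Return a sort key with one weight per character, compared left to right."""
    return [WEIGHTS[ch] for ch in s.lower()]


rows = ["zarządzenie a", "zarzadzenie a", "zarządzenie b", "zarzadzenie b"]
for row in sorted(rows, key=polish_key):
    print(row)
```

The first four characters of every row are equal, so the fifth decides: 'a' has a lower weight than 'ą', which puts both 'zarza...' rows first. That is the same order as the ORDER BY output above. Plain code-point sorting happens to give the same result for these four rows, but it differs elsewhere: `sorted(["b", "ą", "a"])` yields `['a', 'b', 'ą']`, while `sorted(["b", "ą", "a"], key=polish_key)` yields `['a', 'ą', 'b']`.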

I am really not able to see the bug here.
[6 Aug 2010 12:27] Łukasz Jarochowski
My bad, it is of course perfectly OK. I went by what our client reported and assumed he was right without checking, lol. Sorry :)