Bug #65899 "order by" clause returns wrong order result data (BIG5 COLLATE big5_chinese_ci)
Submitted: 14 Jul 2012 15:19 Modified: 18 Jul 2012 5:22
Reporter: Michael Lee Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Charsets Severity:S2 (Serious)
Version:5.x, 5.5.26 OS:Any
Assigned to: CPU Architecture:Any
Tags: BIG5, big5_chinese_ci, order by, sorting

[14 Jul 2012 15:19] Michael Lee
Description:
This bug will only occur on BIG5 COLLATE big5_chinese_ci.
If I change the charset/collate to "ucs2_unicode_ci" or "utf8_unicode_ci", the same SQL statement will return a dataset with correct order (under the same server data).

However, there is no workaround if I don't change the charset.

How to repeat:
DROP TABLE IF EXISTS `test_sort`;

CREATE TABLE `test_sort` (
`id` INT NOT NULL,
`p_grp` VARCHAR( 30 ) CHARACTER SET big5 COLLATE big5_chinese_ci NOT NULL default '',
 PRIMARY KEY (id)
) ENGINE = InnoDB DEFAULT CHARSET=big5;

insert into test_sort (id,p_grp) values 
(1,'上午好'),
(2,'下午好'),
(3,'上午好'),
(4,'下午好'),
(5,'上午好'),
(6,'下午好');

select * from test_sort order by p_grp;

Suggested fix:
Storage engine does not affect the result (MyISAM tested)

It seems that '上' and '下' have the same order in "big5_chinese_ci".
[14 Jul 2012 15:52] Valeriy Kravchuk
Thank you for the problem report. Verified with 5.5.26 on Mac OS X:

macbook-pro:5.5 openxs$ bin/mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 6
Server version: 5.5.26-debug Source distribution

Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> CREATE TABLE `test_sort` (
    -> `id` INT NOT NULL,
    -> `p_grp` VARCHAR( 30 ) CHARACTER SET big5 COLLATE big5_chinese_ci NOT NULL default '',
    ->  PRIMARY KEY (id)
    -> ) ENGINE = InnoDB DEFAULT CHARSET=big5;
Query OK, 0 rows affected (0.86 sec)

mysql> 
mysql> insert into test_sort (id,p_grp) values 
    -> (1,'上午好'),
    -> (2,'下午好'),
    -> (3,'上午好'),
    -> (4,'下午好'),
    -> (5,'上午好'),
    -> (6,'下午好');
Query OK, 6 rows affected (1.86 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql> 
mysql> select * from test_sort order by p_grp;
+----+-----------+
| id | p_grp     |
+----+-----------+
|  1 | 上午好    |
|  2 | 下午好    |
|  3 | 上午好    |
|  4 | 下午好    |
|  5 | 上午好    |
|  6 | 下午好    |
+----+-----------+
6 rows in set (0.12 sec)

mysql> explain select * from test_sort order by p_grp;+----+-------------+-----------+------+---------------+------+---------+------+------+----------------+
| id | select_type | table     | type | possible_keys | key  | key_len | ref  | rows | Extra          |
+----+-------------+-----------+------+---------------+------+---------+------+------+----------------+
|  1 | SIMPLE      | test_sort | ALL  | NULL          | NULL | NULL    | NULL |    6 | Using filesort |
+----+-------------+-----------+------+---------------+------+---------+------+------+----------------+
1 row in set (0.04 sec)
[17 Jul 2012 13:44] Arnaud Adant
>However, there is no workaround if I don't change the charset.

You can still use this query as a workaround :

select * from test_sort order by p_grp collate big5_bin;
[18 Jul 2012 5:22] Michael Lee
The statement "select * from test_sort order by p_grp collate big5_bin" only works for pure Chinese characters(the example data).

I need the collate "big5_chinese_ci" to get case insensitive result.

Anyway, thanks for your suggestion.