Bug #33073 Character sets: ordering fails with utf32
Submitted: 7 Dec 2007 19:38 Modified: 28 Jul 2008 20:45
Reporter: Peter Gulutzan Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:6.0.5-alpha-debug OS:Linux (SUSE 10 64-bit)
Assigned to: Alexander Barkov CPU Architecture:Any

[7 Dec 2007 19:38] Peter Gulutzan
Description:
I create a table with a utf32 column.
I insert four rows.
I select with an ORDER BY clause.
The results aren't in order.

I see no similar problem with other character sets.

How to repeat:
mysql> create table t (s1 varchar(2) character set utf32) engine=myisam;
Query OK, 0 rows affected (0.01 sec)

mysql> insert into t values ('ab'),('AE'),('ab'),('AE');
Query OK, 4 rows affected (0.01 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> select * from t order by s1;
+------+
| s1   |
+------+
| ab   |
| AE   |
| ab   |
| AE   |
+------+
4 rows in set (0.00 sec)
[8 Dec 2007 19:25] Sveta Smirnova
Thank you for the report.

Verified as described.
[23 Jul 2008 5:18] Alexander Barkov
This demo script repeats the same problem with utf16:

mysql> create table t1 (s1 varchar(2) character set utf16 not null);
Query OK, 0 rows affected (0.01 sec)

mysql> insert into t1 values ('ab'),('AE'),('ab'),('AE');
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> set max_sort_length=4;select * from t1 order by s1;
Query OK, 0 rows affected (0.00 sec)

+----+
| s1 |
+----+
| ab | 
| AE | 
| ab | 
| AE | 
+----+
4 rows in set (0.00 sec)

Note, 4 bytes should be enough to sort 2-character-long utf16 strings.
[23 Jul 2008 5:39] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/50251

2737 Alexander Barkov	2008-07-23
      Bug#33073 Character sets: ordering fails with utf32
      Problem: un-indexed ORDER BY didn't work on short UTF32 columns,
      and didn't work with UTF16 columns with short max_sort_length value.
      Fix: condition to break the loop in my_strnxfrm_utf16/_utf32 was
      incorrect - the very last character didn't get into the output sort string.
      
      Adding tests:
        mysql-test/r/ctype_utf16.result
        mysql-test/r/ctype_utf32.result
        mysql-test/t/ctype_utf16.test
        mysql-test/t/ctype_utf32.test
      
      Fixing loop break condition:
        strings/ctype-ucs2.c
[24 Jul 2008 12:21] Alexander Barkov
A fix for "Bug#33791 Wrong ORDER BY with latin2_czech_cs"
fixed this problem as well.
Will recommit a new patch shortly, consisting of the tests only.
No code change required anymore.
[24 Jul 2008 12:23] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/50401

2744 Alexander Barkov	2008-07-24
      Bug#33073 Character sets: ordering fails with utf32
      
      A fix for Bug # 33791 "Wrong ORDER BY with latin2_czech_cs"
      fixed this problem as well.
      This patch consists of test cases only.
      
      No code change required anymore.
      
      Adding tests:
        mysql-test/r/ctype_utf16.result
        mysql-test/r/ctype_utf32.result
        mysql-test/t/ctype_utf16.test
        mysql-test/t/ctype_utf32.test
[24 Jul 2008 12:24] Alexander Barkov
Pushed into mysql-6.0.7-bugteam.
[28 Jul 2008 14:45] Bugs System
Pushed into 6.0.7-alpha  (revid:alik@mysql.com-20080725172155-fnc73o50e4tgl23k) (version source revid:alik@mysql.com-20080725172155-fnc73o50e4tgl23k) (pib:3)
[28 Jul 2008 20:45] Paul DuBois
Noted in 6.0.7 changelog.

Unindexed ORDER BY did not work on short utf32 columns, or on utf16
columns with a short max_sort_length value.
[14 Sep 2008 5:13] Bugs System
Pushed into 6.0.7-alpha  (revid:bar@mysql.com-20080724121524-1kibpsdxiit19l9j) (version source revid:john.embretsen@sun.com-20080724122511-9c0oudz1xrdrs6y6) (pib:3)