Bug #33791 Wrong ORDER BY with latin2_czech_cs
Submitted: 10 Jan 2008 11:23 Modified: 28 Jul 2008 20:41
Reporter: Alexander Barkov Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:6.0 OS:Any
Assigned to: Alexander Barkov CPU Architecture:Any

[10 Jan 2008 11:23] Alexander Barkov
Description:
Secondary level weights seem to be ignored in latin2_czech_cs
when using ORDER BY with filesort. 

At the same time, indexed ORDER BY seems to work fine.

How to repeat:
set names utf8;
create table t1 (
ch varchar(1),
name varchar(64)
) character set latin2 collate latin2_czech_cs;
insert into t1 values (0x4F,'LATIN CAPITAL LETTER O');
insert into t1 values (0xD3,'LATIN CAPITAL LETTER O WITH ACUTE');
insert into t1 values (0xD4,'LATIN CAPITAL LETTER O WITH CURCUMFLEX');
insert into t1 values (0xD6,'LATIN CAPITAL LETTER O WITH DIAERESIS');
insert into t1 values (0xD5,'LATIN CAPITAL LETTER O WITH DOUBLE ACUTE');
insert into t1 values (0x75,'LATIN _SMALL_ LETTER U');
insert into t1 values (0xFA,'LATIN _SMALL_ LETTER U WITH ACUTE');
insert into t1 values (0xF9,'LATIN _SMALL_ LETTER U WITH RING ABOVE');
insert into t1 values (0xFC,'LATIN _SMALL_ LETTER U WITH DIAERESIS');
insert into t1 values (0xFB,'LATIN _SMALL_ LETTER U WITH DOUBLE ACUTE');
select ch,
hex(weight_string(ch level 1)) l1,
hex(weight_string(ch level 2)) l2,
hex(weight_string(ch level 3)) l3,
hex(weight_string(ch level 4)) l4,
name from t1 order by ch;

And the result is:

ch     l1      l2      l3      l4      name
O      9301    2001    0501    4F00    LATIN CAPITAL LETTER O
Ó      9301    2201    0501    D300    LATIN CAPITAL LETTER O WITH ACUTE
Ô      9301    2401    0501    D400    LATIN CAPITAL LETTER O WITH CURCUMFLEX
Ö      9301    2701    0501    D600    LATIN CAPITAL LETTER O WITH DIAERESIS
Ő      9301    2801    0501    D500    LATIN CAPITAL LETTER O WITH DOUBLE ACUTE
ü      9B01    2701    0301    FC00    LATIN _SMALL_ LETTER U WITH DIAERESIS
ů      9B01    2601    0301    F900    LATIN _SMALL_ LETTER U WITH RING ABOVE
ú      9B01    2201    0301    FA00    LATIN _SMALL_ LETTER U WITH ACUTE
u      9B01    2001    0301    7500    LATIN _SMALL_ LETTER U
ű      9B01    2801    0301    FB00    LATIN _SMALL_ LETTER U WITH DOUBLE ACUTE

The letters "u" appear in a wrong order.
The correct order should be according to the "l2" value,
which represents secondary level weight.

Now let's add a key:

alter table t1 add key (ch, name);

And run the "ORDER BY" query again.
And the result is:

ch     l1      l2      l3      l4      name
O      9301    2001    0501    4F00    LATIN CAPITAL LETTER O
Ó      9301    2201    0501    D300    LATIN CAPITAL LETTER O WITH ACUTE
Ô      9301    2401    0501    D400    LATIN CAPITAL LETTER O WITH CURCUMFLEX
Ö      9301    2701    0501    D600    LATIN CAPITAL LETTER O WITH DIAERESIS
Ő      9301    2801    0501    D500    LATIN CAPITAL LETTER O WITH DOUBLE ACUTE
u      9B01    2001    0301    7500    LATIN _SMALL_ LETTER U
ú      9B01    2201    0301    FA00    LATIN _SMALL_ LETTER U WITH ACUTE
ů      9B01    2601    0301    F900    LATIN _SMALL_ LETTER U WITH RING ABOVE
ü      9B01    2701    0301    FC00    LATIN _SMALL_ LETTER U WITH DIAERESIS
ű      9B01    2801    0301    FB00    LATIN _SMALL_ LETTER U WITH DOUBLE ACUTE

All letters are in correct order.

The difference between the two "ORDER BY" queries is that
the former is executed with "Using filesort" method, and the latter
is executed with "Using index" method.

Suggested fix:
Fix the "Using filesort" method to return in the correct order,
like the "Using index" method does.
[18 Jan 2008 20:38] Patrick Crews
cp1250_czech_cs is failing mysql-test for all engines as well.  These two collations are causing the test failures.
[17 Mar 2008 13:14] Alexander Barkov
The same happens with CHAR column as well.
[17 Mar 2008 13:38] Alexander Barkov
When committing, don't forget to activate main.ctype_latin2_ch
which is currently disabled.
[26 Mar 2008 11:10] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/44443

ChangeSet@1.2606, 2008-03-26 15:03:51+04:00, bar@mysql.com +55 -0
  Bug#33791 Wrong ORDER BY with latin2_czech_cs
  Problem: ORDER BY didn't take into account accents and cases
  in multi-level collations (latin2_czech_cs and cp1250_czech_cs).
  This happened because my_strnxfrm_xxx() family functions were
  executed with wrong "nweights" argument, so only primary
  level weights padded by primary weights for SPACE character
  were written into the result. Bug was introduced in 6.0
  with WL#3664 "strnxfrm() changes for prefix keys and NOPAD" (for Falcon).
  Fix: pass correct "nweights" argument, pad each level to "nweights"
  separetely, and additionally pad the result string to generate
  fixed length records.
  Additional changes:
  - New syntax was added:
  WEIGHT_STRING(expr, result_length, nweights, flags)
  which is a great help for testing purposes.
  - my_stnxfrm_xxx() family functions now write partial weights
  if the full weight doesn't fit. This gives a little bit better
  ORDER BY for strings longer than @@max_sort_length.
  - Removing duplicate functions and introducing my_strnxfrm_unicode()
  which is compatible with utf8, utf8mb4, ucs2, utf16, utf32.
[26 Mar 2008 11:33] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/44444

ChangeSet@1.2606, 2008-03-26 15:26:49+04:00, bar@mysql.com +55 -0
  Bug#33791 Wrong ORDER BY with latin2_czech_cs
  Problem: ORDER BY didn't take into account accents and cases
  in multi-level collations (latin2_czech_cs and cp1250_czech_cs).
  This happened because my_strnxfrm_xxx() family functions were
  executed with wrong "nweights" argument, so only primary
  level weights padded by primary weights for SPACE character
  were written into the result. Bug was introduced in 6.0
  with WL#3664 "strnxfrm() changes for prefix keys and NOPAD" (for Falcon).
  Fix: pass correct "nweights" argument, pad each level to "nweights"
  separetely, and additionally pad the result string to generate
  fixed length records.
  Additional changes:
  - New syntax was added:
  WEIGHT_STRING(expr, result_length, nweights, flags)
  which is a great help for testing purposes.
  - my_stnxfrm_xxx() family functions now write partial weights
  if the full weight doesn't fit. This gives a little bit better
  ORDER BY for strings longer than @@max_sort_length.
  - Removing duplicate functions and introducing my_strnxfrm_unicode()
  which is compatible with utf8, utf8mb4, ucs2, utf16, utf32.
[7 May 2008 9:56] Sergey Vojtovich
Ok to push.
[23 Jul 2008 13:18] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/50318

2739 Georgi Kodinov	2008-07-23
      fixed 5.1-bugteam -> 6.0-bugteam merge problems
[24 Jul 2008 12:19] Alexander Barkov
Pushed into mysql-6.0.7-bugteam.
[24 Jul 2008 14:54] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/50423

2747 Kristofer Pettersson	2008-07-24
      Added files previously roaming the trees with the file extension OTHER causing issues with the conflict resolution system.
[28 Jul 2008 14:45] Bugs System
Pushed into 6.0.7-alpha  (revid:alik@mysql.com-20080725172155-fnc73o50e4tgl23k) (version source revid:alik@mysql.com-20080725172155-fnc73o50e4tgl23k) (pib:3)
[28 Jul 2008 20:41] Paul DuBois
Noted in 6.0.7 changelog.

ORDER BY failed to take into account accents and lettercases in
multi-level collations (latin2_czech_cs and cp1250_czech_cs).
[14 Sep 2008 3:44] Bugs System
Pushed into 6.0.7-alpha  (revid:bar@mysql.com-20080723094350-6s6xru573ie0aqku) (version source revid:v.narayanan@sun.com-20080820070709-nx09bk6qx81osd5s) (pib:3)