Bug #41895 utf8_unicode_ci inconsistent sorting
Submitted: 6 Jan 2009 13:58 Modified: 14 Jan 2009 9:07
Reporter: Hrvoje Novosel Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:5.1.30 OS:Any
Assigned to: CPU Architecture:Any
Tags: sorting, utf8_unicode_ci

[6 Jan 2009 13:58] Hrvoje Novosel
Description:
When sorting a varchar field with utf8_unicode_ci collation the results are different when filesort is used and when index is used.

How to repeat:
DROP TABLE IF EXISTS abcd;
CREATE TABLE IF NOT EXISTS abcd (
  letter varchar(1) COLLATE utf8_unicode_ci NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

INSERT INTO abcd (letter) VALUES
('c'),
('C'),
('č'),
('Č'),
('ć'),
('Ć'),
('d'),
('D'),
('đ'),
('Đ'),
('e'),
('E'),
('s'),
('S'),
('š'),
('Š'),
('t'),
('T'),
('z'),
('Z'),
('ž'),
('Ž');

executing: select * from abcd order by letter;
will give:
+--------+
| letter |
+--------+
| c      |
| Ć      |
| ć      |
| Č      |
| č      |
| C      |
| d      |
| D      |
| đ      |
| Đ      |
| E      |
| e      |
| s      |
| S      |
| š      |
| Š      |
| T      |
| t      |
| z      |
| Z      |
| ž      |
| Ž      |
+--------+

now add an index:  ALTER TABLE abcd ADD INDEX ( letter );
and select again with: select * from abcd order by letter;
will give:
+--------+
| letter |
+--------+
| c      |
| C      |
| č      |
| Č      |
| ć      |
| Ć      |
| d      |
| D      |
| đ      |
| Đ      |
| e      |
| E      |
| s      |
| S      |
| š      |
| Š      |
| t      |
| T      |
| z      |
| Z      |
| ž      |
| Ž      |
+--------+

(Notice first 6 letters on both)
[14 Jan 2009 9:07] Sveta Smirnova
Thank you for the report.

According to http://www.collation-charts.org/mysql60/mysql604.utf8_unicode_ci.european.html the letters ('c'),('C'),('č'),('Č'),('ć'),('Ć') are equal in utf8_unicode_ci. Same for ('e'), ('E') and ('t'), ('T') This means order correct in both cases. So I close the report as "Not a Bug"