Bug #44131 Binary-mode "order by" returns records in incorrect order for UTF-8 strings
Submitted: 7 Apr 2009 11:15 Modified: 13 Mar 2010 18:06
Reporter: Pin Zhang Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Charsets Severity:S2 (Serious)
Version:5.0.27-community-nt, 5.0.80, 5.1.33, 4.1, 5.0, 5.1, 6.0 bzr OS:Any (MS Windows, Linux)
Assigned to: Alexander Barkov CPU Architecture:Any

[7 Apr 2009 11:15] Pin Zhang
Description:
When sorting varchar(n) columns, the binary-mode "order by" (i.e. "order by binary(columnName)") takes only first n bytes into consideration when deciding the order of the results.
In the case the column being sorted contains multi-bytes UTF-8 characters, the actual length in bytes of the varchar(n) column can be greater than n. Under that kind of situation, the sorting result is incorrect since the sub-string starts from index n+1 to the end of the string is ignored when doing comparison during the sorting.

How to repeat:
1. configure the database to use UTF-8 character set and utf8-general-ci collation;

2. create a table as:
CREATE TABLE `sorttest` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `name` varchar(10) NOT NULL default '',
  PRIMARY KEY  (`id`),
  UNIQUE KEY `Index_name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

3. insert data into the table (each record contains 3 chinese characters, each character is 3-byte long):
INSERT INTO `sorttest` (`id`,`name`) VALUES 
 (2,'一二三01'),
 (3,'一二三09'),
 (4,'一二三02'),
 (5,'一二三08'),
 (6,'一二三11'),
 (7,'一二三91'),
 (8,'一二三21'),
 (9,'一二三81');

4. do query:
SELECT * FROM `sorttest` order by binary(name)

5. the following order of query result will be shown:
"id","name"
2,"一二三01"
3,"一二三09"
4,"一二三02"
5,"一二三08"
6,"一二三11"
8,"一二三21"
9,"一二三81"
7,"一二三91"

where the order:
"id","name"
2,"一二三01"
4,"一二三02"
5,"一二三08"
3,"一二三09"
6,"一二三11"
8,"一二三21"
9,"一二三81"
7,"一二三91"
is expected.

Suggested fix:
Make binary-mode sorting (order by binary(columnName)) for UTF-8 strings behaves as non-binary sorting (order by columnName) does.
[7 Apr 2009 13:37] Peter Laursen
Same for me on server 5.1.33
[7 Apr 2009 14:15] Valeriy Kravchuk
Thank you for the problem report. Please, send the results of:

show variables like 'char%';
show variables like 'coll%';

for the session where you get these results. They are repeatable for me on Mac OS X, but I want to be sure we have the same settings.
[8 Apr 2009 2:59] Pin Zhang
show variables like 'char%';
+--------------------------+---------------------------------------------------------+
| Variable_name            | Value                                                   |
+--------------------------+---------------------------------------------------------+
| character_set_client     | utf8                                                    | 
| character_set_connection | utf8                                                    | 
| character_set_database   | utf8                                                    | 
| character_set_filesystem | binary                                                  | 
| character_set_results    | utf8                                                    | 
| character_set_server     | utf8                                                    | 
| character_set_system     | utf8                                                    | 
| character_sets_dir       | C:\Program Files\MySQL\MySQL Server 5.0\share\charsets\ | 
+--------------------------+---------------------------------------------------------+
8 rows in set (0.00 sec)

show variables like 'coll%';
+----------------------+-----------------+
| Variable_name        | Value           |
+----------------------+-----------------+
| collation_connection | utf8_general_ci | 
| collation_database   | utf8_general_ci | 
| collation_server     | utf8_general_ci | 
+----------------------+-----------------+
3 rows in set (0.00 sec)
[13 Apr 2009 7:03] Sveta Smirnova
Thank you for the report.

Verified as described.

Workaround: use varchar(15) or bigger.
[14 Apr 2009 6:58] Alexander Barkov
The problem happens because Item_char_typecast() representing
binary(name) reports itself as BINARY(10) instead of BINARY(30).

The same problem can be demonstrated using a simpler test case:

drop table if exists t1;
create table t1 as select binary(_utf8'123') as name;
show create table t1;

And the result is:

CREATE TABLE `t1` (
  `name` varbinary(3) NOT NULL DEFAULT ''
) ENGINE=MyISAM DEFAULT CHARSET=latin1 |

The expected result is varbinary(9) in 5.0/5.1,
and varbinary(12) in 6.0 and higher.
[14 Apr 2009 12:10] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/71998
[3 Dec 2009 9:28] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/92613
[3 Dec 2009 9:42] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/92618
[19 Dec 2009 8:28] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20091219082307-f3i4fn0tm8trb3c0) (version source revid:alik@sun.com-20091216180721-eoa754i79j4ssd3m) (merge vers: 6.0.14-alpha) (pib:15)
[19 Dec 2009 8:32] Bugs System
Pushed into 5.5.1-m2 (revid:alik@sun.com-20091219082021-f34nq4jytwamozz0) (version source revid:alexey.kopytov@sun.com-20091211164058-ycpe0f20d1c4h1gl) (merge vers: 5.5.0-beta) (pib:15)
[19 Dec 2009 8:36] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20091219082213-nhjjgmphote4ntxj) (version source revid:alik@sun.com-20091216180221-a5ps59gajad3pip9) (pib:15)
[15 Jan 2010 8:59] Bugs System
Pushed into 5.1.43 (revid:joro@sun.com-20100115085139-qkh0i0fpohd9u9p5) (version source revid:bar@mysql.com-20091203092234-dlnrwqzqy06yp4ft) (merge vers: 5.1.42) (pib:16)
[12 Mar 2010 14:12] Bugs System
Pushed into 5.1.44-ndb-7.0.14 (revid:jonas@mysql.com-20100312135944-t0z8s1da2orvl66x) (version source revid:jonas@mysql.com-20100312115609-woou0te4a6s4ae9y) (merge vers: 5.1.44-ndb-7.0.14) (pib:16)
[12 Mar 2010 14:28] Bugs System
Pushed into 5.1.44-ndb-6.2.19 (revid:jonas@mysql.com-20100312134846-tuqhd9w3tv4xgl3d) (version source revid:jonas@mysql.com-20100312060623-mx6407w2vx76h3by) (merge vers: 5.1.44-ndb-6.2.19) (pib:16)
[12 Mar 2010 14:43] Bugs System
Pushed into 5.1.44-ndb-6.3.33 (revid:jonas@mysql.com-20100312135724-xcw8vw2lu3mijrhn) (version source revid:jonas@mysql.com-20100312103652-snkltsd197l7q2yg) (merge vers: 5.1.44-ndb-6.3.33) (pib:16)
[13 Mar 2010 18:06] Paul DuBois
Noted in 5.1.43, 5.5.1, 6.0.14 changelogs.

For a VARCHAR(N) column, ORDER BY BINARY(col_name) sorted using only
the first N bytes of the column, even though column values could be
longer than N bytes if they contained multibyte characters.