Bug #36509 Wrong sorting of UTF8 columns
Submitted: 5 May 2008 15:46 Modified: 5 May 2008 19:45
Reporter: Victor Porton Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:5.0.45 OS:Any
Assigned to: CPU Architecture:Any

[5 May 2008 15:46] Victor Porton
Description:
The order of sorting utf8 columns is wrong.

How to repeat:
Use the below script (it should be encoded as UTF-8). It outputs (Russian) words in wrong order:

Италия
Бельгия
Германия

Should be:

Бельгия
Германия
Италия

#!/usr/bin/perl -w -T

# WARNING: This script REMOVES the table `test`!!!

# Replace this with your DB connect code:
BEGIN { $ENV{'DOCUMENT_ROOT'} =~ /^(.*)$/; require "$1/../data/perl/Base.pm"; }
use Common;
my $dbh = connect_to_db;

$dbh->do("DROP TABLE IF EXISTS `test`");
$dbh->do(qq{
CREATE TABLE `test` (
  `name` varchar(255) collate utf8_bin NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin
});

$dbh->do("INSERT test (name) VALUES('Бельгия'),('Германия'),('Италия')");

my $col = $dbh->selectcol_arrayref("SELECT name FROM test ORDER BY name");

print "Content-Type: text/plain; charset=utf8\n\n";
print join("\n", @$col);

Suggested fix:
"ORDER BY CONVERT(name USING latin1)" probably works (at least with the case of the three given words, not checked generally).
[5 May 2008 15:55] Victor Porton
The script which causes the bug

Attachment: test.pl (application/x-perl, text), 667 bytes.

[5 May 2008 18:35] Peter Laursen
in *pure SQL* and server 5.0.51b I cannot reproduce the problems.

CREATE TABLE `test` (
  `name` varchar(255) collate utf8_bin NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
INSERT into test VALUES ('Бельгия'),('Германия'),('Италия');
SELECT name FROM test ORDER BY name;

/* returns

name            
----------------
Бельгия  
Германия
Италия   
*/

Петер :-)
(not a MySQL person)
[5 May 2008 19:12] Sveta Smirnova
Thank you for the report.

Please add following code to the end of your script and provide us its output from your environment:

$col = $dbh->selectall_arrayref("show variables like 'char%'");

foreach my $c (@$col) {
	print join("---", @$c);
	print("\n");
}
[5 May 2008 19:38] Victor Porton
Modified script

Attachment: test.pl (application/x-perl, text), 830 bytes.

[5 May 2008 19:39] Victor Porton
The output of modified script (see "Files" section):

Италия
Бельгия
Германия
-----------
character_set_client---latin1
character_set_connection---latin1
character_set_database---utf8
character_set_filesystem---binary
character_set_results---latin1
character_set_server---latin1
character_set_system---utf8
character_sets_dir---/usr/share/mysql/charsets/
[5 May 2008 19:45] Sveta Smirnova
Thank you for the feedback.

character_set_client---latin1
character_set_connection---latin1
...
character_set_results---latin1

from the output shows you use latin1 client encoding with data in utf8. You should specify utf8 as character_set_client, character_set_connection, character_set_results for your application.

So I close the report as "Not a Bug".