| Bug #36509 | Wrong sorting of UTF8 columns | ||
|---|---|---|---|
| Submitted: | 5 May 2008 15:46 | Modified: | 5 May 2008 19:45 | 
| Reporter: | Victor Porton | Email Updates: | |
| Status: | Not a Bug | Impact on me: | |
| Category: | MySQL Server: Charsets | Severity: | S3 (Non-critical) | 
| Version: | 5.0.45 | OS: | Any | 
| Assigned to: | CPU Architecture: | Any | |
   [5 May 2008 15:55]
   Victor Porton        
  The script which causes the bug
Attachment: test.pl (application/x-perl, text), 667 bytes.
   [5 May 2008 18:35]
   Peter Laursen        
  in *pure SQL* and server 5.0.51b I cannot reproduce the problems.
CREATE TABLE `test` (
  `name` varchar(255) collate utf8_bin NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
INSERT into test VALUES ('Бельгия'),('Германия'),('Италия');
SELECT name FROM test ORDER BY name;
/* returns
name            
----------------
Бельгия  
Германия
Италия   
*/
Петер :-)
(not a MySQL person)
 
   [5 May 2008 19:12]
   Sveta Smirnova        
  Thank you for the report.
Please add following code to the end of your script and provide us its output from your environment:
$col = $dbh->selectall_arrayref("show variables like 'char%'");
foreach my $c (@$col) {
	print join("---", @$c);
	print("\n");
}
 
   [5 May 2008 19:38]
   Victor Porton        
  Modified script
Attachment: test.pl (application/x-perl, text), 830 bytes.
   [5 May 2008 19:39]
   Victor Porton        
  The output of modified script (see "Files" section): Италия Бельгия Германия ----------- character_set_client---latin1 character_set_connection---latin1 character_set_database---utf8 character_set_filesystem---binary character_set_results---latin1 character_set_server---latin1 character_set_system---utf8 character_sets_dir---/usr/share/mysql/charsets/
   [5 May 2008 19:45]
   Sveta Smirnova        
  Thank you for the feedback. character_set_client---latin1 character_set_connection---latin1 ... character_set_results---latin1 from the output shows you use latin1 client encoding with data in utf8. You should specify utf8 as character_set_client, character_set_connection, character_set_results for your application. So I close the report as "Not a Bug".


Description: The order of sorting utf8 columns is wrong. How to repeat: Use the below script (it should be encoded as UTF-8). It outputs (Russian) words in wrong order: Италия Бельгия Германия Should be: Бельгия Германия Италия #!/usr/bin/perl -w -T # WARNING: This script REMOVES the table `test`!!! # Replace this with your DB connect code: BEGIN { $ENV{'DOCUMENT_ROOT'} =~ /^(.*)$/; require "$1/../data/perl/Base.pm"; } use Common; my $dbh = connect_to_db; $dbh->do("DROP TABLE IF EXISTS `test`"); $dbh->do(qq{ CREATE TABLE `test` ( `name` varchar(255) collate utf8_bin NOT NULL ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin }); $dbh->do("INSERT test (name) VALUES('Бельгия'),('Германия'),('Италия')"); my $col = $dbh->selectcol_arrayref("SELECT name FROM test ORDER BY name"); print "Content-Type: text/plain; charset=utf8\n\n"; print join("\n", @$col); Suggested fix: "ORDER BY CONVERT(name USING latin1)" probably works (at least with the case of the three given words, not checked generally).