Bug #21347 | hebrew utf8 ORDER BY produces bad results in OpenBSD MySQL 5.0 | ||
---|---|---|---|
Submitted: | 30 Jul 2006 9:36 | Modified: | 29 Jun 2007 11:25 |
Reporter: | Jay Gay | | |
Status: | Not a Bug | | |
Category: | MySQL Server | Severity: | S1 (Critical) |
Version: | 5.0 | OS: | Any (Linux, OpenBSD) |
Assigned to: | Georgi Kodinov | CPU Architecture: | Any |
Tags: | hebrew, order by, utf8 |
[30 Jul 2006 9:36]
Jay Gay
[30 Jul 2006 9:56]
Jay Gay
Here is the table creation:

CREATE TABLE `aleppo1phrase` (
  `phrase` varchar(255) collate utf8_unicode_ci NOT NULL,
  `count` int(11) NOT NULL,
  PRIMARY KEY (`phrase`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

As you can see, the charset is utf8 and the collation is utf8_unicode_ci.
[2 Aug 2006 19:52]
Sveta Smirnova
Thank you for the report. Could you please provide a dump of about 10 rows that allows me to repeat the problem?
[3 Aug 2006 6:33]
Sveta Smirnova
Please also provide the output of the SHOW VARIABLES LIKE '%char%'; statement and the statement you used to insert values into the table.
[2 Sep 2006 23:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".
[8 May 2007 15:10]
liron wachs
Hello, I was searching for a solution to my bug through Google and found this bug, which was reported more than half a year ago and describes my problem exactly: ORDER BY does not sort Hebrew words that are encoded in utf8 in the correct order. I couldn't find any solution for it here. Do you have a solution for this bug? Thanks.
[8 May 2007 18:04]
Omer Barnir
The above list in the correct Hebrew order (added a column referencing the original one)
Attachment: correct order.pdf (application/pdf, text), 15.87 KiB.
[8 May 2007 18:13]
Sveta Smirnova
Thank you for the report. Verified as described.
[8 May 2007 18:13]
Sveta Smirnova
test case
Attachment: bug21347.test (application/octet-stream, text), 503 bytes.
[29 Jun 2007 10:37]
Sveta Smirnova
Test with corrected connection collation
Attachment: bug21347.test (application/octet-stream, text), 546 bytes.
[29 Jun 2007 11:25]
Georgi Kodinov
Thank you for taking the time to write to us, but this is not a bug. Please double-check the documentation available at http://dev.mysql.com/doc/ and the instructions on how to report a bug at http://bugs.mysql.com/how-to-report.php

There seem to be two problems with this report:

1. utf-8 literals are passed to the server while the character set the server uses to parse SQL statements (character_set_client) is not utf-8 but the default, latin1. This can be checked with: SHOW VARIABLES LIKE 'character_set_client' This returns latin1 (by default). As a result, the server interprets the utf-8 bytes as latin1 and encodes them again as utf-8, hence the strange codes that are seen through hex().

2. "ORDER BY HEX(string)" is not the same as "ORDER BY string". That is why, even when character_set_client is set to utf-8 (for example with SET NAMES utf8), the original Hebrew character sequences appear unsorted.
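The two effects described in this comment can be reproduced outside the server. The following is a minimal Python sketch, not taken from the attached test case. The helper names `double_encode` and `hex_upper` are made up for illustration. Python's `latin-1` codec (ISO-8859-1) is used as a stand-in for MySQL's latin1, which is based on cp1252, so the exact bytes may differ for some letters. Sorting by `str.casefold` is only a rough approximation of a case-insensitive collation, not of utf8_unicode_ci itself.

```python
def double_encode(text: str) -> bytes:
    """Mimic utf-8 bytes being read as latin1 and converted to utf-8 again."""
    raw = text.encode("utf-8")        # bytes the client actually sends
    misread = raw.decode("latin-1")   # assumption: server treats each byte as one latin1 character
    return misread.encode("utf-8")    # value as it would be stored in a utf8 column


def hex_upper(data: bytes) -> str:
    """Upper-case hex dump, similar in spirit to what HEX() shows."""
    return data.hex().upper()


alef = "\u05d0"  # Hebrew letter alef
print(hex_upper(alef.encode("utf-8")))   # D790     (correct utf-8)
print(hex_upper(double_encode(alef)))    # C397C290 (double-encoded)

# Ordering by the hex string is a byte-wise ordering, which is not
# the same as ordering the text itself under a case-insensitive rule.
words = ["b", "A", "a", "B"]
by_hex = sorted(words, key=lambda w: hex_upper(w.encode("utf-8")))
by_text = sorted(words, key=str.casefold)  # rough stand-in for a _ci collation
print(by_hex)    # ['A', 'B', 'a', 'b']
print(by_text)   # ['A', 'a', 'b', 'B']
```

A stored value whose hex dump is twice as long as expected and starts with C3 is a typical sign of the first problem. The ASCII words are used in the second part only because they make the difference between byte order and collation order easy to see.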