Bug #58174 | Client Encoding to Hebrew does not work | ||
---|---|---|---|
Submitted: | 13 Nov 2010 12:24 | Modified: | 22 Nov 2010 12:02 |
Reporter: | Ido Yana | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | Connector / NET | Severity: | S1 (Critical) |
Version: | MySQL Connector Net 6.3.5 | OS: | Windows |
Assigned to: | CPU Architecture: | Any |
[13 Nov 2010 12:24]
Ido Yana
[13 Nov 2010 12:30]
Peter Laursen
Why do you "SET NAMES 'latin1';" with hebrew? Why not "SET NAMES 'hebrew';" ?? It looks like you have stored hebrew-encoded strings in a latin1 table and it is decoded client side (what is a complete mess, totally incorrect but unfortunately very common with PHP applications) Peter (not a MySQL person)
[13 Nov 2010 12:32]
Peter Laursen
correction: It looks like you have stored hebrew-encoded strings in a utf8 table - but still the same comments!
[13 Nov 2010 12:37]
Ido Yana
while i tried this command with Hebrew, i got question marks instead of the correct string, in phpMyAdmin and also in the C# page i'm using an external code and i'm trying to get the text from the built-in tables why is there a different while using "SET NAMES 'latin1'" from phpMyAdmin to my C# page ? it is working totally fine in phpMyAdmin
[13 Nov 2010 12:38]
Peter Laursen
I believe that you will have to 1) execute "SELECT HEX(such column) FROM table LIMIT 1;" and paste the result. 2) tell how same value displays in the client (PhpMyAdmin and your C# application)
[13 Nov 2010 12:49]
Peter Laursen
I think that for other people to understand the issue you should 1) SHOW CREATE TABLE thetable; 2) SELECT HEX(such_column) FROM thetable LIMIT 1; 3) paste screenshots from phpMyAdmin and your C# application 4) provide simplified C# sample code to reproduce the problem But I still think that non-unicode hebrew-encoded strings are stored in utf8. Anyway I am not involved in neither Connector.Net nor phpMyAdmin - so I will not post here again. Only we (Webyog) have frequently replied to similar issues, refer: http://stats.webyog.com/faq/content/34/152/en/my-special-characters-display-as-_noisy_-lat... (and the statement "you have an application that 'uncritically' saves utf8 encoded strings in a non-utf8 database/table .. " is also reversly possible like "you have an application that 'uncritically' saves non-utf8 encoded strings in a utf8 database/table ..")
[13 Nov 2010 12:50]
Ido Yana
1. here is the result C397E28098C397E280A2C397E28098C397E284A2C397E28093C397C5BEC397C2A8202D20333120C397C593C397C290C397E280A2C397E28099C397E280A2C397C2A1C397CB9C2032303130 2. with SET NAMES 'latin1' phpMyAdmin התחלה C# pages התחלה with SET NAMES 'utf8' phpMyAdmin �?�?�?�?�? C# pages ×?×?×?×?×?
[13 Nov 2010 13:02]
Ido Yana
1. create table SQL CREATE TABLE tableName ( fieldName varchar(250) NOT NULL ) ENGINE=MyISAM DEFAULT CHARSET=utf8 2. pasted it above 3. in a moment 4. here is the C# code string connString = ConfigurationManager.ConnectionStrings["MySQLDBConnectionString"].ConnectionString; MySqlConnection connection = new MySqlConnection(connString); MySqlCommand command = connection.CreateCommand(); try { connection.Open(); command.CommandText = " SET NAMES 'latin1'; " + " SELECT fieldName FROM tableName"; MySqlDataReader reader = command.ExecuteReader(); while (reader.Read()) { reader[fieldName].ToString(); } } catch { } finally { connection.Close(); }
[13 Nov 2010 13:04]
Ido Yana
phpMyAdmin
Attachment: phpMyAdmin.png (image/png, text), 26.39 KiB.
[13 Nov 2010 13:05]
Ido Yana
Visual Studio - Quick Watch.png
Attachment: Visual Studio - Quick Watch.png (image/png, text), 60.75 KiB.
[13 Nov 2010 13:05]
Peter Laursen
I doubt the HEX string represents the same row/column combination as the others! The HEX string is much larger (150 characters = 75 bytes stored). But as 75 is an odd number it is definitely not a utf8-string with hebrew characters (what will be 2 bytes each in utf8 with hebrew characters). I believe phpMyAdmin handles it because it does 'client side decoding' (but it should not in my opinion - it should display what is stored in the database. I also believe your C# applications stores display what is stored in the database - but some stored byte-sequences are not valid utf8 characters) Anyway .. maybe you can handle it by "SET character_set_client = 'hebrew'" (or 'latin1') Also I have no clue about C# and MySQL. So let the MySQL people take over.
[13 Nov 2010 13:14]
Ido Yana
i think that SET NAMES change the following 3 variables character_set_client character_set_result character_set_connection so "SET character_set_client = 'hebrew'" (or 'latin1') won't help in this issue
[13 Nov 2010 13:40]
Peter Laursen
I think there is a chance (but I am not sure!) that SETTING character_set_client and *not* SETTING character_set_result and character_set_connection could work for the case. But it may also depend on the Locale setting of your machine and Visual Studio settings.
[13 Nov 2010 14:19]
Ido Yana
tried that, also the other variables (each one alone, and their combinations), and still garbage
[15 Nov 2010 3:25]
Ido Yana
does anyone have a solution ?? i tried even to use CAST to this field, and still nothing
[15 Nov 2010 6:37]
Tonci Grgin
Ido, of course it is NOT working. This is a classic case of double encoding. You should always know which charset is data in. The most important in charsets is field encoding and session encoding. Only if they mismatch, any conversions are done. So you were passing to and fro latin1 bytes into utf8 stream (ie. utf8 bytes are seen as latin1 chars and encoded into utf8 yet again) so, eventually, you see A?A?A?A?A? (? for characters that can not be encoded in latin1)... Now, what's to be done, I do not know. What you can try, if your whole table is affected this way, is to simply mark your fields as binary, then again as utf8: ALTER TABLE t1 MODIFY f1 VARCHAR(255) CHARACTER SET binary; ALTER TABLE t1 MODIFY f1 VARCHAR(255) CHARACTER SET utf8; If you want to extract particular data in such field, you can: SET NAMES single_byte_charset; SELECT data FROM table; That would return data in Unicode. Finally, do review Susanne's presentation at: http://forge.mysql.com/wiki/How_to_Use_Charsets_and_Collations_Properly I do not see a bug here, so closing.
[15 Nov 2010 6:55]
Ido Yana
isn't latin1 a single_byte_charset ? which exact single byte chartset should i use ? why the above coded that i wrote show the correct result in phpMyAdmin and garbage in my C# pages ? also, i do not want to modify the tables cause it is an external sourcing
[20 Nov 2010 18:05]
Ido Yana
anyone ??
[22 Nov 2010 6:47]
Tonci Grgin
Ido, I can not go on answering questions like: isn't latin1 a single_byte_charset ? which exact single byte chartset should i use ? This is something you should learn by yourself. Please see the presentation I pointed you to, answers you are looking for as well as explanations on how to read SELECT HEX( and why I suggested BINARY conversions are there. In short, you have managed to double-encode your data and need now to find the settings that will work in your environment.
[22 Nov 2010 11:51]
Ido Yana
i build my DB from the begining with the following variables character_set_client utf8 character_set_connection utf8 character_set_database utf8 character_set_filesystem binary character_set_results utf8 character_set_system utf8 the SELECT HEX(field) that i got is 30312E30312E32303130202D20C397E284A2C397E280A2C397C29D20C397C2A4C397C2A9C397E280A2C397CB9C20C397E284A2C397C2A4C397E2809D20262333333B i tried SET NAMES hebrew and SET NAMES utf8 but still didn't succeed to get hebrew as it should be
[22 Nov 2010 12:02]
Susanne Ebrecht
We're sorry, but the bug system is not the appropriate forum for asking help on using MySQL products. Your problem is not the result of a bug. Support on using our products is available both free in our forums at http://forums.mysql.com/ and for a reasonable fee direct from our skilled support engineers at http://www.mysql.com/support/ Thank you for your interest in MySQL.