Bug #67027 Could not display greek characters on c# project.
Submitted: 30 Sep 2012 21:41 Modified: 17 Oct 2012 17:31
Reporter: Apostolos Katranitsas
Status: Not a Bug
Category:Connector / NET Severity:S1 (Critical)
Version:6.5.4 OS:Windows (Windows 7 64bit)
Assigned to: Roberto Ezequiel Garcia Ballesteros CPU Architecture:Any
Tags: C#, greek characters, MySql Connector, visual studio 2010

[30 Sep 2012 21:41] Apostolos Katranitsas
Description:
I cannot display Greek characters correctly in a Visual Studio 2010 (SP1) C# project. I can see them correctly only if I log in with the mysql program (on the Linux machine where the MySQL server is) or from MySQL Workbench.
I have to set character_set_results=latin1 first; after that the data displays correctly (in the Linux mysql program and MySQL Workbench).
I have included charset=utf8 in the connection string of my project.

The table is created with the following command:

CREATE TABLE `call_attribute` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `id_call` int(10) unsigned NOT NULL,
  `columna` varchar(30) default NULL,
  `value` varchar(128) NOT NULL,
  `column_number` int(10) unsigned NOT NULL default '0',
  PRIMARY KEY  (`id`),
  KEY `id_call` (`id_call`),
  CONSTRAINT `call_attribute_ibfk_1` FOREIGN KEY (`id_call`) REFERENCES `calls` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=169 DEFAULT CHARSET=utf8

Data is imported from a text file (UTF-8 encoded) as below:
302,"Κατρανίτσας Απόστολος","Ότι τον φωτίσει ο Θεός","Ηράκλειο","Αττικής"

When I run the select query from my application, the displayed data is not in Greek (not in any language, actually, only random characters).

Wireshark shows the data sent from the MySQL server to my program as (Charset number: latin1 COLLATE latin1_swedish_ci (8)), which is as expected, and the data being sent is correct.

Am I doing something wrong, or is this a problem with the connector?

How to repeat:
Create a project that connects to mysql server.

Append "CHARSET=utf8" to the connection string.

Import the data somehow (I used a PHP page to upload it).

Query with "set session character_set_results=latin1".

Then query with "select * from call_attribute".

Watch the results and cry.
[2 Oct 2012 10:54] Apostolos Katranitsas
I tried with MySQL Entity Framework, but the results are the same.
[2 Oct 2012 15:29] Fernando Gonzalez.Sanchez
Hi,

Have you tried appending to the connection string 
CHARSET=greek
instead?

Assuming greek is a supported charset in your MySQL server (you can see the list with 'show character set').
[2 Oct 2012 17:54] Apostolos Katranitsas
Yes, I have tried greek, latin1 and utf8.

Working with MySQL Workbench and mysql (on the Linux server), the setting that gave me correct results was to set the session results character set to latin1 (query with 'set character_set_results=latin1'). Both programs fetch the data with correct characters. Only in Visual Studio (in my program) is the data displayed like a utf8 string.
[2 Oct 2012 18:43] Apostolos Katranitsas
Sample of the input data I upload to MySQL through a web page

Attachment: LocalCallList.csv (text/csv), 129 bytes.

[2 Oct 2012 19:09] Apostolos Katranitsas
I am curious about the character set I have to choose in mysql (the Linux program) and MySQL Workbench to fetch my data correctly: I use latin1 to see Greek characters.
Can it be a problem with how my data is being uploaded? I'll try to use OpenOffice Calc to see if I can handle it.

One more thing to keep in mind is that I DON'T enter the data directly into MySQL. I use a web page to do that.
[3 Oct 2012 22:06] Apostolos Katranitsas
After a lot of debugging I saw that Greek strings are coming from the connection as utf8 but are displayed as ASCII or Latin.

Even though I see this : Σάββατο4
if I get each character then it shows this:

         Dec  Hex     Which is :
Char: Î, 0206 00CE    \    
Char: £, 0163 00A3    /    Σ
Char: Î, 0206 00CE    \    
Char: ¬, 0172 00AC    /    ά
Char: Î, 0206 00CE    \    
Char: ², 0178 00B2    /    β
Char: Î, 0206 00CE    \    
Char: ², 0178 00B2    /    β
Char: Î, 0206 00CE    \    
Char: ±, 0177 00B1    /    α
Char: Ï, 0207 00CF    \    
Char: „, 8222 201E    /    this should be τ but gives an exception.
Char: Î, 0206 00CE    \    
Char: ¿, 0191 00BF    /    ο
Char: 4, 0052 0034        

So I see the utf8 characters of the word Σάββατο.

Any ideas???
[5 Oct 2012 20:56] Apostolos Katranitsas
Seems like a double encoding problem!!!
[7 Oct 2012 9:28] Apostolos Katranitsas
I finally solved my problem by setting:

skip-character-set-client-handshake

in /etc/my.cnf

I was lucky because this is a new server and the data was limited, so I managed to re-insert it.
[7 Oct 2012 9:29] Apostolos Katranitsas
I have also set the line below in /etc/my.cnf:
character-set-server = utf8

but I don't know if it has anything to do with my problem.
[8 Oct 2012 21:19] Roberto Ezequiel Garcia Ballesteros
Hi,
Greek chars should use UTF8 since it supports these characters. The table or column should have the UTF8 charset, and all statements must use the same charset. In this case your table has UTF8 support, and adding "CHARSET=utf8" to the connection string guarantees that you use the right encoding.
A possible problem is that the data was inserted into the table using a different charset (maybe the default, latin1). Even if you query the data using UTF8, it was stored using a different charset, and the result is a set of unknown characters.
Please try inserting a new test row into the table using Connector/NET with "CHARSET=utf8" in the connection string. Then run a query and check that the new row has the correct characters.
Connector/NET uses the "CHARSET" attribute to encode all statements into a specific character set for the current session.
Also, your server-side configuration is doing something similar, but it is also possible to do this from the client.
Please let me know if this works for you.
[9 Oct 2012 18:12] Apostolos Katranitsas
I couldn't make it work from the client side. I have used charset=utf8 and others (as mentioned on 2 Oct 17:54). No change made the Greek characters come through as they are supposed to. After a lot of searching I found that I had a double-encoding issue, so I managed to solve it by changing the my.cnf file. Nothing else gave the results I wanted.
I must point out that my program can ONLY GET the data from the database. The input was made through someone else's web page and from there into the database. I could not change anything on the web page (which is written in PHP) and, as far as I can tell, it uses UTF-8 encoding almost everywhere. But I'm not a PHP developer myself, so I couldn't go further.

Can you provide a decent explanation of what could possibly have gone wrong in the encoding steps?

If you are missing any of the input or the steps you need to provide such an answer, please ask me.
[17 Oct 2012 17:31] Roberto Ezequiel Garcia Ballesteros
This is not a bug, since Connector/NET uses the encoding specified by charset when it retrieves data. It is necessary to insert/update data using the same charset (it works as designed).