Bug #10870 MyODBC/MySql Connector/Net mangles unicode inserts
Submitted: 25 May 2005 21:25 Modified: 22 Jun 2005 19:45
Reporter: Mark Modrall Email Updates:
Status: Not a Bug Impact on me:
None 
Category:Connector / NET Severity:S2 (Serious)
Version:4.1.12/odbc 3.51/ .net 1.04 OS:Windows (W2K)
Assigned to: Reggie Burnett CPU Architecture:Any

[25 May 2005 21:25] Mark Modrall
Description:
I have a program written in C#/.Net that's processing string data and inserting bits into a MySql database.  Occasionally there will be chinese or russian or greek characters floating through.  C# works in ucs2 natively, so I know the strings are fine there, but when I insert them through either ODBC 3.51 or the MySql Connector/.Net, they get dumbed down to '?' in the table.

The table has been created using utf-8 and accented latin1 characters transfer just fine, but true double-byte unicode gets mangled.  I've seen the bug threads about the LOAD FILE function having this problem and that there's a workaround but I don't think that applies here.

How to repeat:
In C#, using .Net connector (has to be saved using unicode for the file):
MySqlConnection conn = new MySqlConnection ("server=svr;user id=user;password=pass;database=test;pooling=false" );
conn.Open();

MySqlCommand cmd = conn.CreateCommand();
cmd.CommandText = "create table if not exists temp3 (\n"+
		"query TEXT NOT NULL\n" +
		") CHARACTER SET UTF8;";
cmd.ExecuteNonQuery();

// works fine
cmd.CommandText = "insert into temp3 (query) VALUES (?query)";
cmd.Parameters.Clear();
cmd.Parameters.Add (new MySqlParameter ("?query", "abc"));
cmd.ExecuteNonQuery();

// doesn't work - all the chinese characters turn into ?
cmd.Parameters.Clear();
string qval = "澳大利亞";
char x;
x = qval[0]; // shows that each character is indeed the right unicode
x = qval[1]; // character codes
x = qval[2];
x = qval[3];
cmd.Parameters.Add (new MySqlParameter ("?query", qval));
cmd.ExecuteNonQuery();

Suggested fix:
don't have one.   If the problem is in the odbc client/ .net connector, changing the table encoding won't help.  If the problem is upstairs, using ucs2 for data that's going to be 98% 8-bit clean won't help.
[26 May 2005 13:52] Vasily Kishkin
Tested on Win 2000 Sp4, Microsoft Visual C# .NET , Connector .NET 1.0.4 Test case and result table are attached.
[26 May 2005 13:53] Vasily Kishkin
Test case

Attachment: 10870.zip (application/x-zip-compressed, text), 4.58 KiB.

[6 Jun 2005 23:11] Mark Matthews
Have you actually configured the _connection_ in either case (.Net, ODBC) to be UTF-8? It's not obvious from your bug report if you have. This will never work if you don't. The drivers are going to pick up the default from your server (which is usually latin1), so _some_ characters will work, but not ones that exist outside latin1.
[7 Jun 2005 14:57] Mark Modrall
Hi Mark...

No I haven't done anything different to change the encoding on the connector.  I've been searching through the documentation for both MyOdbc and the .Net connector (and everything else) and haven't found information on how to set that explicitly.

I found
http://dev.mysql.com/doc/mysql/en/charset-connection.html
which looks like the instructions are for using mysql on the local box, not over a connector.  Are the same non-query commands required to set the connection encoding on the .Net connector and/or MyOdbc?  

Thanks
[7 Jun 2005 21:31] Mark Modrall
Okay, I did try adding
cmd.CommandText = "SET NAMES 'utf8'";
cmd.ExecuteNonQuery();

before the INSERTs in the example for this case.  Since I couldn't find any documentation on how to change the connection character set explicitly for MyOdbc or the .Net Connector, I tried the closest thing I could find.

The command executed without error.  It did change what the database did but it was still way off the mark.  I also added
cmd.CommandText = "insert into temp3 (query) VALUES (?query)";
cmd.Parameters.Clear();
cmd.Parameters.Add (new MySqlParameter ("?query", "résumé"));
cmd.ExecuteNonQuery();

to test how latin1 would now be handled.  Here's what happened with the SET NAMES first:
the chinese string value still got converted to ????
the latin1 string ended up in the db as "r" (everything after the accented character was simply dropped - even worse than ? in some ways).
[8 Jun 2005 16:07] Mark Modrall
Okay, so "Set names utf8" over a MySqlConnection seems to create new problems.  After most of a day Googling user fora I finally got a clue that there's a ;charset= parameter supported on the connection string.

adding ;charset=utf8 to the connection string seems to ameliorate both this problem and the #42000 error with Connector .Net.
[22 Jun 2005 19:36] Reggie Burnett
Adding charset=utf8 to your connection string should fix this.  Please let me know how this turns out so I can close this bug report.
[22 Jun 2005 19:45] Reggie Burnett
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.mysql.com/documentation/ and the instructions on
how to report a bug at http://bugs.mysql.com/how-to-report.php

Additional info:

I am reporting !bug for the connector/net side only.  If you wish, you can repost with issues related to using Connector/ODBC with .NET but they should be posted in the Connector/ODBC group.
[22 Jun 2005 19:47] Mark Modrall
Hi Reggie...

Adding ;charset=utf8 worked for my Connector/Net implementation, though it wasn't really anywhere in the documentation.  Seems like the documentation should have more information on what options exist for the connection string.

I never could find anything that helped with MyOdbc.  Also, using the set names option I found in the documentation simply produced a different bug.

Thanks
_mark