Bug #5450 Prepared updates of Unicode strings are not transcoded
Submitted: 7 Sep 2004 13:44 Modified: 17 Sep 2004 1:03
Reporter: Benson Margulies
Status: Can't repeat
Category: Connector/J    Severity: S2 (Serious)
Version: 3.1.3-beta    OS: Windows (XP)
Assigned to: Mark Matthews    CPU Architecture: Any

[7 Sep 2004 13:44] Benson Margulies
Description:
If you prepare an INSERT and set a parameter to a string containing non-ASCII Unicode characters, where the target is a UTF-8 column, the results in the database are garbage. If you pre-transcode the string to UTF-8 in a byte array and set the parameter to those bytes, all is well. This test was performed with the 4.1.3b beta of the database.
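To make the symptom concrete, here is a minimal sketch that reproduces the two usual kinds of "garbage" without a database. It assumes, without confirmation from this report, that somewhere between setString and the UTF-8 column the text is encoded or decoded with a charset other than UTF-8; ISO-8859-1 is used purely as an example, and `TranscodingDemo` and `transcode` are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class TranscodingDemo {
    // Encode with one charset, decode the resulting bytes with another.
    // Hypothetical helper, only to illustrate a charset mismatch.
    static String transcode(String s, Charset encodeAs, Charset decodeAs) {
        return new String(s.getBytes(encodeAs), decodeAs);
    }

    public static void main(String[] args) {
        String greek = "\u03b1\u03b2\u03b3"; // Greek alpha, beta, gamma

        // Correct path: UTF-8 bytes read back as UTF-8.
        System.out.println(transcode(greek, StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(greek)); // true

        // Assumed failure 1: encoded as ISO-8859-1, which cannot represent Greek,
        // so getBytes substitutes '?' for each character.
        System.out.println(transcode(greek, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8)); // ???

        // Assumed failure 2: correct UTF-8 bytes interpreted as ISO-8859-1,
        // giving two Latin-1 characters per Greek letter ("mojibake").
        String mojibake = transcode(greek, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(mojibake.equals("\u00ce\u00b1\u00ce\u00b2\u00ce\u00b3")); // true
    }
}
```

Which of these, if either, matches the garbage seen in the database would have to be checked against the actual column contents.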

How to repeat:
I'll attach some useful materials for reproducing this.
[7 Sep 2004 13:45] Benson Margulies
A UTF-8 file to insert the test data, and a Java test program.

Attachment: impr.zip (application/x-zip-compressed, text), 5.01 KiB.

[7 Sep 2004 16:47] Benson Margulies
my.cnf

Attachment: my.cnf (application/octet-stream, text), 1.96 KiB.

[8 Sep 2004 0:47] Eric Herman
It looks like there is an issue here. I am working to improve the test case.
[8 Sep 2004 0:53] Benson Margulies
I apologize that my test case is not quite as crisp as it might be. It started out as a program to check that UTF-8 data could be read out via a String from the ResultSet. When that tested out OK, I added some code to test round-trip, and found the apparent defect. You could just chop out everything above the insert, and toss out the .sql file.
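When trimming a round-trip test like this, it can help to report where the value read back first diverges from the value written. A minimal sketch using a hypothetical helper, `firstMismatch`; in a real test, `written` would be the String passed to setString and `readBack` the value returned by getString. It compares code points rather than chars and needs Java 8 or later for `String.codePoints()`.

```java
class RoundTripCheck {
    // Hypothetical helper: index (in code points) of the first difference
    // between the string written and the string read back, or -1 if equal.
    static int firstMismatch(String written, String readBack) {
        int[] a = written.codePoints().toArray();
        int[] b = readBack.codePoints().toArray();
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        // Same prefix: equal if same length, otherwise they differ at n.
        return a.length == b.length ? -1 : n;
    }

    public static void main(String[] args) {
        String written = "abc\u03b1\u03b2"; // "abc" followed by Greek alpha, beta
        System.out.println(firstMismatch(written, written)); // -1
        System.out.println(firstMismatch(written, "abc??")); // 3
    }
}
```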
[17 Sep 2004 1:03] Mark Matthews
I was not able to repeat this bug when I used 3.1.4 beta and MySQL-4.1.4.
[20 Sep 2004 20:42] Michael Saulnier
I am able to reproduce this with 3.1.4-beta of Connector/J, and MySQL 4.1.4-gamma for Windows.

As Benson Margulies' test case indicates, I am not able to successfully use

	myPreparedStatement.setString(1, myString);

to insert/update character data, and must instead use, as a workaround,

	myPreparedStatement.setBytes(1, myString.getBytes("UTF-8"));

Everything is fine when I read back the data using

	myResultSet.getString(1);

The tables were created with explicit UTF-8 encoding on the relevant columns, rather than allowing the columns to inherit default encodings from the table, from the database, and/or from the server.

If I use the "setBytes" workaround throughout the code, all is fine, and I can write and read Greek letters successfully.
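For reference, a sketch of the byte-level step the setBytes workaround relies on: the String is encoded to UTF-8 in Java before it reaches the driver. `Utf8Workaround`, `toUtf8Legacy` and `toUtf8` are hypothetical names. `StandardCharsets` requires Java 7 or later, so on a JDK from this report's era only the `getBytes("UTF-8")` form, with its checked `UnsupportedEncodingException`, would be available.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8Workaround {
    // The form used in the workaround above; the charset-name overload
    // declares a checked UnsupportedEncodingException.
    static byte[] toUtf8Legacy(String s) throws UnsupportedEncodingException {
        return s.getBytes("UTF-8");
    }

    // Java 7+ equivalent that cannot throw UnsupportedEncodingException.
    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String alpha = "\u03b1"; // Greek small letter alpha, U+03B1
        byte[] legacy = toUtf8Legacy(alpha);
        byte[] modern = toUtf8(alpha);
        System.out.println(Arrays.equals(legacy, modern)); // true

        // Print the UTF-8 bytes as unsigned hex: ce b1
        for (byte x : modern) {
            System.out.printf("%02x ", x & 0xff);
        }
        System.out.println();
    }
}
```

These are the bytes that setBytes would hand to the driver for a single alpha.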