MySQL Bugs: #5450: Prepared updates of Unicode strings are not transcoded

Bug #5450	Prepared updates of Unicode strings are not transcoded
Submitted:	7 Sep 2004 13:44	Modified:	17 Sep 2004 1:03
Reporter:	Benson Margulies
Status:	Can't repeat
Category:	Connector / J	Severity:	S2 (Serious)
Version:	3.1.3-beta	OS:	Windows (XP)
Assigned to:	Mark Matthews	CPU Architecture:	Any

Description:
If you prepare an INSERT statement and set a parameter to a string containing non-ASCII Unicode characters, where the target is a UTF-8 column, the data stored in the database is garbage. If you first transcode the string to UTF-8 in a byte array and set the parameter to those bytes, all is well. This test was performed with the 4.1.3b beta of the database.
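
To make "transcode to UTF-8 in a byte array" concrete, here is a minimal sketch that needs no database. The class name `Utf8Transcode` and the sample string are hypothetical. The charset mismatch it demonstrates is only an assumption about how this kind of garbage can arise; this report does not establish the actual cause.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not part of the attached test program.
class Utf8Transcode {
    // Encode a Java string as UTF-8 bytes, which is what the workaround
    // does before handing the value to the driver.
    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decode UTF-8 bytes with the wrong charset (ISO-8859-1) to show one
    // way that non-ASCII text can turn into garbage. Assumed scenario only.
    static String misreadAsLatin1(byte[] utf8) {
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String greek = "\u03b1\u03b2\u03b3"; // alpha, beta, gamma
        byte[] utf8 = toUtf8(greek);
        // Each of these Greek letters takes two bytes in UTF-8.
        System.out.println("UTF-8 byte count: " + utf8.length);
        // Misreading the bytes yields one character per byte.
        System.out.println("Misread length:   " + misreadAsLatin1(utf8).length());
        // Decoding with the correct charset restores the original string.
        System.out.println("Round trip OK:    "
                + new String(utf8, StandardCharsets.UTF_8).equals(greek));
    }
}
```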

How to repeat:
I'll attach some useful materials for reproducing this.

utf-8 file to insert test data, Java test program.

Attachment: impr.zip (application/x-zip-compressed, text), 5.01 KiB.

my.cnf

Attachment: my.cnf (application/octet-stream, text), 1.96 KiB.

It looks like there is an issue here. I am working to improve the test case.

I apologize that my test case is not quite as crisp as it might be. It started out as a program to check that UTF-8 data could be read out via a String from the ResultSet. When that tested out OK, I added some code to test round-trip, and found the apparent defect. You could just chop out everything above the insert, and toss out the .sql file.

I was not able to repeat this bug when I used 3.1.4 beta and MySQL-4.1.4.

I am able to reproduce this with 3.1.4-beta of Connector/J, and MySQL 4.1.4-gamma for Windows.

As Benson Margulies' test case indicates, I am not able to successfully use

	myPreparedStatement.setString(1, myString);

to insert/update character data, and must instead use, as a workaround,

	myPreparedStatement.setBytes(1, myString.getBytes("UTF-8"));

Everything is fine when I read back the data using

	myResultSet.getString(1);

The tables were created with explicit UTF-8 encoding on the relevant columns, rather than allowing the columns to inherit default encodings from the table, from the database, and/or from the server.
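
For illustration, a column-level charset declaration of the kind described might look like the sketch below. The table and column names (`unicode_test`, `id`, `label`) and the class name `Utf8Schema` are hypothetical; the actual schema is not shown in this report.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical schema helper; all names here are assumptions for illustration.
class Utf8Schema {
    // The charset is declared on the column itself, so it does not depend
    // on the table, database, or server defaults.
    static final String CREATE_TABLE =
            "CREATE TABLE unicode_test ("
            + " id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,"
            + " label VARCHAR(255) CHARACTER SET utf8"
            + ")";

    // Runs the DDL on a connection supplied by the caller.
    static void createTable(Connection conn) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.executeUpdate(CREATE_TABLE);
        }
    }
}
```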

If I use the "setBytes" workaround throughout the code, all is fine, and I can write and read Greek letters successfully.
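
A round-trip check along these lines could be sketched as follows: write through the "setBytes" workaround, read back with getString, and compare. The class name `SetBytesRoundTrip`, the table `unicode_test`, and its columns `id` and `label` are hypothetical, and the caller is assumed to supply an open connection to a database where that table already exists.

```java
import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper; table and column names are assumptions for illustration.
class SetBytesRoundTrip {
    // The workaround: encode to UTF-8 ourselves and bind the raw bytes.
    static void setUtf8(PreparedStatement ps, int index, String value)
            throws SQLException {
        ps.setBytes(index, value.getBytes(StandardCharsets.UTF_8));
    }

    // Inserts the value via the workaround, reads the newest row back with
    // getString, and reports whether the text survived unchanged.
    static boolean roundTrips(Connection conn, String value) throws SQLException {
        try (PreparedStatement insert = conn.prepareStatement(
                "INSERT INTO unicode_test (label) VALUES (?)")) {
            setUtf8(insert, 1, value);
            insert.executeUpdate();
        }
        try (PreparedStatement select = conn.prepareStatement(
                "SELECT label FROM unicode_test ORDER BY id DESC LIMIT 1");
             ResultSet rs = select.executeQuery()) {
            return rs.next() && value.equals(rs.getString(1));
        }
    }
}
```

Keeping the encoding step in one helper such as `setUtf8` means the workaround can be removed in a single place once plain setString behaves correctly.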