Bug #61483 Connector/J not correctly handling Korean characters
Submitted: 10 Jun 2011 16:59
Modified: 10 Jun 2011 18:01
Reporter: Peter Turk
Status: Closed
Category: Connector / J
Severity: S2 (Serious)
Version: MySQL 5.5.13, Connector/J 5.1.16
OS: Any (Windows XP, Mac OS X)
Assigned to:
CPU Architecture: Any
Tags: connector, utf8

[10 Jun 2011 16:59] Peter Turk
Description:
Korean characters inserted into a MySQL database using Connector/J turn into question marks. Korean characters retrieved from a MySQL database using Connector/J also turn into question marks.

How to repeat:
Create a database named utf8 with "character set=utf8". In MySQL Workbench, execute this:

use utf8;
create table utftest (id integer, name_last varchar(20));
insert into utftest (id,name_last) values(6,'서비스');
select * from utftest;

{ For convenience, the Unicode code points (in hex) of the characters in the string above are
서	c11c
비	be44
스	c2a4
but the problem occurs with all Korean characters. These are just a random sample.}

The select statement, in either MySQL Workbench or the mysql command line, shows the correct string.

Now repeat the select through Connector/J 5.1.16, and the Korean characters have turned into question marks:
select * from utftest
6, ???

The attempts to debug this have been performed on Mac OS X 10.6.7, MySQL 5.5.13, Connector/J 5.1.16, but we first encountered the problem on a Windows XP machine and have observed it on several older combinations of MySQL and Connector/J.

When connecting with Connector/J, I execute SET NAMES 'utf8' to ensure that everything is in utf8. To confirm this, SHOW VARIABLES LIKE 'character_set%' produces this:

character_set_client=utf8
character_set_connection=utf8
character_set_database=utf8
character_set_filesystem=binary
character_set_results=utf8
character_set_server=latin1
character_set_system=utf8

Suggested fix:
Fully support Korean characters through Connector/J.

What is MySQL Workbench doing that my program cannot do through MySQL Connector?
[10 Jun 2011 17:25] Mark Matthews
Connector/J only sends characters in UTF-8 if either the characterEncoding property in your connection string has been set to "UTF-8", or if character_set_server on MySQLd is set to "UTF-8", which essentially triggers the "SET NAMES ..." call.

You are using "SET NAMES" in the mysql client, and Workbench sets the connection to UTF-8 by default.

Is either of the above conditions true in your test case? If not, does setting them as described fix this issue?
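
As a rough illustration of where question marks can come from: when a Java string is encoded with a single-byte charset such as latin1, every character the charset cannot represent is replaced by '?'. The sketch below shows only that JDK behaviour and does not reproduce Connector/J internals. The class and method names (EncodingSketch, roundTrip) are made up for this example, and ISO-8859-1 is an assumption standing in for whatever encoding the driver actually picks when characterEncoding is not set.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Connector/J.
final class EncodingSketch {

    // Encode the text to bytes with the given charset, then decode
    // those bytes again with the same charset.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // The sample string from the report, written as Unicode escapes
        // (U+C11C, U+BE44, U+C2A4) so the source file encoding does not matter.
        String sample = "\uC11C\uBE44\uC2A4";

        // ISO-8859-1 (assumed stand-in for latin1) cannot represent Hangul,
        // so String.getBytes substitutes '?' for each character.
        System.out.println(roundTrip(sample, StandardCharsets.ISO_8859_1));

        // UTF-8 can represent every character, so the round trip is lossless.
        System.out.println(roundTrip(sample, StandardCharsets.UTF_8).equals(sample));
    }
}
```

Running main prints ??? for the ISO-8859-1 round trip and true for the UTF-8 one.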
[10 Jun 2011 18:01] Peter Turk
MySQL staff sent me this message:

Connector/J only sends characters in UTF-8 if either the
characterEncoding property in your connection string has been set to
"UTF-8", or if character_set_server on MySQLd is set to "UTF-8", which
essentially triggers the "SET NAMES ..." call.

I added "characterEncoding=utf8" to my connection string, and the problem disappeared.

Thanks.
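
For reference, a minimal sketch of a connection set up the way the fix above describes. The host, port, and credentials are placeholders, and the class and method names (Utf8ConnectionSketch, buildUrl, connect) are hypothetical; the database name utf8 is taken from the report. characterEncoding and useUnicode are Connector/J connection properties. The reporter used the value utf8, while this sketch uses UTF-8 as written in Mark Matthews' reply.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper, not part of Connector/J.
final class Utf8ConnectionSketch {

    // Build a JDBC URL that asks Connector/J to use UTF-8 for the connection.
    static String buildUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?useUnicode=true&characterEncoding=UTF-8";
    }

    // Open a connection with the UTF-8 URL. This needs the Connector/J jar
    // on the classpath and a reachable server, so main() does not call it.
    // Host and port here are placeholder assumptions.
    static Connection connect(String user, String password) throws SQLException {
        return DriverManager.getConnection(buildUrl("localhost", 3306, "utf8"), user, password);
    }

    public static void main(String[] args) {
        // Only print the URL, so the sketch runs without a database.
        System.out.println(buildUrl("localhost", 3306, "utf8"));
    }
}
```

Running main prints jdbc:mysql://localhost:3306/utf8?useUnicode=true&characterEncoding=UTF-8. With the property set in the URL, a separate SET NAMES statement should not be needed, going by the explanation in the thread.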