Bug #3114 Unable to save utf8 data into MySql with JAVA Connector
Submitted: 9 Mar 2004 6:18 Modified: 24 Nov 2004 20:28
Reporter: nicolas nicolas Email Updates:
Status: Not a Bug Impact on me:
None 
Category:Connector / J Severity:S2 (Serious)
Version:4.1.1a OS:Solaris (Solaris / Windows)
Assigned to: Mark Matthews CPU Architecture:Any

[9 Mar 2004 6:18] nicolas nicolas
Description:
I use mysql-connector-java-3.0.11-stable-bin.jar

I was working on MySql 4.1.0a and it was working properly. 

I had done the following steps :
1./ jdbc:mysql://host:port/pmt?useUnicode=true&characterEncoding=utf-8

2./ In JSP, when I save a data formular I used :
String value = new String(myData.getBytes("iso-8859-1"),"utf8");

3./ When I read data from MysQl, I did					
SELECT value FROM .....

How to repeat:
With MySql 4.1.1a, if I set my jdbc connection like
---------------------------------------------------
1./ jdbc:mysql://host:port/appli?useUnicode=true&characterEncoding=utf-8

I have an error since the fist sql query in java 
SELECT * FROM title

The error is :
Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_
ci,COERCIBLE) for operation '='".

So I decide to update my application. I set the jdbc datasource like
1./ jdbc:mysql://host:port/appli

and to read data from MySql I use
SELECT CONVERT(Value USING utf8) FROM ....   => it works

THE PROBLEM IS THE FOLLOWING ONE : 
I am unable to save data properly in MySql ....

I try to use : 

1/ CREATE TABLE `test` (
  `col1` varchar(255) ,
) TYPE=InnoDB CHARACTER SET utf8 ( I try also TYPE=MyISAM DEFAULT CHARSET=utf8;)

2/ set names 'utf8'
3/ SET CHARACTER SET 'utf8'

3/ INSERT INTO test SET col1='Kay?r sesi',
                        col2=CONVERT('Kay?r sesi' USING utf8)

I try CAST .... but no way to save data in MySql, It puts ?????? for special characters ....

Suggested fix:
?
[17 Mar 2004 7:54] Mark Matthews
If you are going to store UTF-8 data in MySQL-4.1.1 or newer, your tables need to have a compatible collation as well (it appears that this was not checked in versions prior to 4.1.1).

See http://www.mysql.com/doc/en/Charset-Unicode-sets.html and http://www.mysql.com/doc/en/Charset-examples.html for more information, where the last link explains that the database default collation (latin1_swedish_ci) will be used if you don't specify one for your table.
[25 Mar 2004 2:20] nicolas nicolas
Thanks for your answer

I tried to add  COLLATION information to the table.

CREATE TABLE Translate (
  translationValue char(255),
  isMarked smallint(6) NOT NULL default '0',  
  PRIMARY KEY  (languageId,constrainId)
) DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;

When I tried a SELECT like 
select CONVERT(translationValue USING utf8) from translate
select translationValue  from translate

I have always ??? instead of special characters  ...

I become out of more ideas .
Thanks again. Help is welcome.
[24 Nov 2004 16:42] Hontvari Jozsef Levente
I think this is indeed a bug in the Java connector, it affected me too. It cannot be easily fixed if different client apps access the database: one with useUnicode etc., the other is without that. 

The same connection string and database worked well in a previous mysql version now it doesn't - so this is at least an incompatibility. Now the question is which one is responsible: mysql server or java connector. I believe the change in the mysql server didn't necessarily cause any incompatibility. Previously the useUnicode=true&characterEncoding=UTF8 was a hack to store utf-8 in a latin-1 database, which was needed because mysql didn't support UTF-8. Mysql server never knew about this parameter! The driver did all the dirty work. It seems to me that the meaning of the very same parameters are changed, and now (instead of doing the same old hack) the driver tells the server that he wants to communicate in UTF-8! The two thing is completely different and unrelated! Of course it won't work. 

The connector should have used another parameter name for that functionality, and it should have left the old behaviour unchanged.
[24 Nov 2004 20:28] Mark Matthews
The new behavior in the server doesn't let you have it both ways.

Consider the 'old hack' deprecated. If you want to use the new server, you'll have to fix your application to not use a 'hack'.
[18 Feb 2005 17:07] Javier Martinez
Hello all, 
I have the same problem as you (same version of Mysql and connector)
Do you have any update on it?

thanks in advance,
Javier
[7 Dec 2006 4:11] Adam SI
I have come to a solution for Chinese UTF-8, may it works for you :)
The context is
* ubuntu linux 6.10
* locale zh_cn.UTF-8
* mysql-server-5.0
* mysql connector/J 5.0.4

The essential part is adding "--character-set-server utf8 --collation-server=utf8_general_ci" to the startup command of mysqld_safe ( in my case it is in /etc/init.d/mysql ) and then it all works, no need to add characterEncoding to connection string, no need to "SET NAMES utf8", no need to do the "getBytes(xxx)" trick. 

But you still have to add CHARACTER SET and COLLATION clause to CREATE DATABASE and  CREATE TABLE

You can have a try and tell us if it works for your language.

Adam
[16 Feb 2007 14:35] Mike Salsbury
I've tried this until it worked.  I got it to work on 5.0 on Windows XP, and 4.1.15 on Red Hat Enterprise 3.

The 5.0 version was installed 'to support many languages'.  I assume this makes utf8 the default character set and collation.  The 4.1.15 has latin1 as the default character set and collation.  This configuration worked on both.

1.  The column in the database has to have a character set and collation for utf8.  I.e.
CHARACTER SET utf8 COLLATE utf8_general_ci
 
2.  The request before storing to the database has to be handled as UTF-8.  I.e. needs a 
request.setCharacterEncoding("UTF-8");
line before any submitted parameters are read.  This was the vital missing piece of the jigsaw for me.
 
3.  The response also needs to be handled as UTF-8.  I.e. needs a 
response.setContentType("text/html; charset=UTF-8");
line in all files handling the response.  Not totally sure if this is required as one file worked without it.
 
4.  The character set and encoding need to be set on the database URL in the web.xml.  E.g.
<url>jdbc:mysql://localhost:3306/DATABASENAME?useUnicode=true&amp;characterEncoding=UTF8</url>
[19 Jun 2007 6:00] Peter L
Adam SI's comment worked for me:

Context:
CentOS 4.4
MySQL 4.1
MSql Connector 5.0.4 for Java

Table and columns are all set to utf8 and utf8_unicode_ci.

After inserting the command line parameters to myssqld_safe to the /etc/init.d/mysqld file's start() section, it is all peachy and I can store Japanese characters. The JSPs are set to SJIS, but JAva converts internally to Unicode and the driver now converts to utf8, all is well.