Bug #43471 error when insert a 4bytes utf8 word(s)
Submitted: 7 Mar 2009 12:47 Modified: 8 Mar 2009 7:23
Reporter: li sx Email Updates:
Status: Not a Bug Impact on me:
None 
Category:Connector / J Severity:S1 (Critical)
Version:5.1.7 OS:Windows
Assigned to: CPU Architecture:Any
Tags: 4bytes utf8 insert

[7 Mar 2009 12:47] li sx
Description:
I use mysql-5.1.32GA and connector/J-5.1.7
before,all the configs are utf8.
then,insert some words,these words are in unicode extB,so,they are 4 bytes.
and if I use the client,I can insert these words,but if I use connector/J,will have some problems.
when I do select by connector/J,no problem.
the exceptions below:

java.sql.SQLException: Incorrect string value: '\xF0\xA7\x9D\xA3' for column 'name' at row 1
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1055)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:956)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3515)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3447)
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1951)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2101)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2554)
    at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1761)
    at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2046)
    at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:1964)
    at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:1949)
    at Unicode.execute(Unicode.java:12)
    at Unicode.<init>(Unicode.java:37)
    at Unicode.main(Unicode.java:46)

How to repeat:
CREATE TABLE `ucode1` (
  `Id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(6) DEFAULT '',
  PRIMARY KEY (`Id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

import java.sql.*;
import java.net.*;
public class Unicode {
	public void execute(String bytes)throws Exception{
		Class.forName("com.mysql.jdbc.Driver");
		Connection conn = DriverManager.getConnection(
				"jdbc:mysql://localhost:3306/testme", "root", "pwd");
		conn.setAutoCommit(false);
		PreparedStatement ps = conn.prepareStatement("insert into ucode1(name) values(?)");
		ps.setString(1,bytes);
		ps.executeUpdate();
		conn.commit();
		ps.close();
		conn.close();
	}
    public Unicode() throws Exception{
    	//after,it is a 3bytes word.no problem.
    	String s=URLDecoder.decode("%E4%B8%80","UTF-8");
    	execute(s);
    	//after,it is a 4bytes word.in unicode extB.
    	//when insert will have a problem.
    	//but I can use client to insert this word.and use jdbc to query it.
    	s=URLDecoder.decode("%F0%A7%9D%A3","UTF-8");
    	execute(s);
    }
    public static void main(String[] args){
    	try{
    		new Unicode();
    	}catch(Exception e){
    		e.printStackTrace();
    		System.exit(0);
    	}
    }
    
}
[7 Mar 2009 22:43] Mark Matthews
MySQL versions earlier than 6.0 don't support 4-byte UTF-8. The driver sets the sql_mode to 'strict_trans_tables', which means you'll get a truncation error when you try and use unicode characters that are represented with more than 3 bytes in utf-8. 

Your encodings *may* work from the command-line client, but it's just sheer luck. You might have other sequences of characters that do not "round-trip".
[8 Mar 2009 7:23] li sx
thx for your answers.so quick!
now I must use blob column to save the 4bytes.and then,I download the mysql-6.0.9-alpha-community,try the test(Unicode.java)
now I see no problem,but when the word insert into the table,it is not the word before,it is sub of before,see like "?".
and I use client insert two words,one is 3bytes the other is 4bytes,then,I use

select length(name) from ucode1;

type the length of the words,3bytes one return 3,and 4bytes is 6.I hope they will return the same number.
so I do not know is it a problem of mysql or the connector?

last,the configs of mysql-6xxx are utf8,is it right?or set utf8mb3 or other?
[20 Aug 2010 12:40] Mohammed Saleem
I have searched for a solution for this, found something might be interesting:

<a href="http://abusleem.net/2010/storing-4-bytes-utf8-characters-in-mysql/">storing-4-bytes-utf8-characters-in-mysql</a>