Bug #12400 Truncation in TEXT colums when using utf8
Submitted: 5 Aug 2005 14:59 Modified: 8 Aug 2005 10:52
Reporter: eric cournarie Email Updates:
Status: Duplicate Impact on me:
None 
Category:Connector / J Severity:S1 (Critical)
Version:3.10 OS:Windows (Windows)
Assigned to: CPU Architecture:Any

[5 Aug 2005 14:59] eric cournarie
Description:
 running on a MySQL server version 1.11-nt in utf8 mode
 
 in a column defined as a TEXT, if I insert a text which includes non-ascii characters, the text is truncated although its size should fit in the column.

 a TEXT size can contain 2^16 bytes, which according to the documentation, allows to store  2^16 / 3 utf8 characters, as utf8 characters are encoded into 3 bytes.

 however, if I insert a 2^16 / 3 length string, but which contains unicode characters, the string is truncated (without any SQL error or warnings)

 I'm using PreparedStatement.setCharacterStream for inserting the text.

 note that on the 3.1.6 version of the driver, everything works fine.

 but this bug happens also on 3.1.8 version.

How to repeat:

 try to insert a record with a string containing non-ascii characters (some japanese characters for instance). the string should be a 21845 character length string. when reading it back, you will notice that last characters are missing.

 ( for instance, if you have 10 non-ascii characters in your string, you will probably miss 10 characters at the end.)
[5 Aug 2005 15:12] eric cournarie
sorry , you should server version 4.1.11 , not 1.11

 here is also my mySQL variables values
 character_set_client            | utf8
 character_set_connection        | utf8
 character_set_database          | utf8
 character_set_results           | utf8
 character_set_server            | utf8
 character_set_system            | utf8
[5 Aug 2005 21:21] Aleksey Kishkin
eric, we need repeatable testcase in order to find the cause of this bug. Could you please give us example of such string?
[8 Aug 2005 8:56] eric cournarie
create a table like this:  create table testtable { a TEXT };

program to test;

String nonAsciiChars = "\u0167\u0167\u0167\u0167\u0167\u0167\u0167\u0167\u0167\u0167";
String endMarker = "This should appear at the end of this string.";

// max size of a string in utf8
int maxCount = (65536 / 3);

StringBuffer myData = new StringBuffer ();

for (int i =0; i < (maxCount - nonAsciiChars.length() - endMarker.length()); i++)
                myData.append ("*"); // just for filling the string
// add non ascii characters
myData.append(nonAsciiChars);
// add the end marker of the string
myData.append(endMarker);

String value = myData.toString();
// just for checking
System.out.println("Inserting a string of size " + value.length());

String sqlInsert = "INSERT INTO testtable VALUES ( ? )";
PreparedStatement ps = connexion.prepareStatement(sqlInsert);
ps.setCharacterStream (1, new StringReader (value), value.length());
ps.execute();

String sqlGet = "SELECT * FROM testtable";
ps = connexion.prepareStatement(sqlGet);
ResultSet rs = ps.executeQuery();
rs.next();
int dataColumnId = rs.findColumn("a");
Clob clob = rs.getClob (dataColumnId);
String v = clob.getSubString(1L, (int) clob.length());

System.out.println(" end read : " + v.substring(v.length() -50));

 the output gives me

 Inserting a string of size 21845
 end read : *****??????????This should appear at the end of th

 the string appear truncated . by doing a select * in MySQL directly, the same appear. so I think it's really at the time we write the data.

note also that when you add more non-ascii chars in nonAsciiChars , the truncation grows..
[8 Aug 2005 10:52] Aleksey Kishkin
Hi! it looks like a duplicate of http://bugs.mysql.com/bug.php?id=11614