Bug #43055 Connector/J causes data truncation error when inserting into latin1 text column
Submitted: 20 Feb 2009 11:50 Modified: 20 Feb 2009 17:04
Reporter: Till Kahle Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Charsets Severity:S2 (Serious)
Version:5.0.32 OS:Any
Assigned to: CPU Architecture:Any
Tags: clob, data truncation, Latin1, longtext, text, utf8

[20 Feb 2009 11:50] Till Kahle
Description:
When inserting latin1 characters into a latin1 table, I get different results using varchar columns and text columns.
varchar works as expected - character sets are translated the way they should.
text (or longtext) is broken - there is a "data truncation" error and no data is inserted as soon as there are non-ascii characters involved.

The database connection uses unicode / utf8 since the same application has some utf8 tables that need to be handled correctly.

The database is the current debian etch version:
5.0.32-Debian_7etch8

How to repeat:
The following java program produces the error:

-----

package weikatec.test.mysql5bug;

import java.sql.DriverManager;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class DataTruncationBugTest {

    public DataTruncationBugTest() {
        try {
            String url = "jdbc:mysql://servername:3306/dbname?useUnicode=true&characterEncoding=utf8";
            Class.forName("com.mysql.jdbc.Driver");
            Connection con = DriverManager.getConnection( url, "user", "pwd" );
            System.out.println( "Database driver " + con.getMetaData().getDriverVersion() +
                " with database " + con.getMetaData().getDatabaseProductVersion() );
            Statement stmt = con.createStatement();
            stmt.execute( "drop table if exists TillsEncodingTest" );
            stmt.execute( "create table TillsEncodingTest( name varchar(80) ) engine=myisam charset=latin1" );
            String testText = "This word is in french quotes: »E8300«";
            stmt.execute( "insert into TillsEncodingTest (name) values('" + testText + "')" );
            stmt.close();
            ResultSet rs = con.createStatement().executeQuery( "select name from TillsEncodingTest" );
            rs.next();
            String result = rs.getString( 1 );
            rs.close();
            System.out.println( "still alive; result from varchar is " + result );

            stmt = con.createStatement();
            stmt.execute( "drop table if exists TillsEncodingTest" );
            // now we use a text column (or longtext) instead of varchar:
            stmt.execute( "create table TillsEncodingTest( name longtext ) engine=myisam charset=latin1" );
            testText = "This word is in french quotes: »E8300«";
            // this line throws an exception:
            stmt.execute( "insert into TillsEncodingTest (name) values('" + testText + "')" );
            stmt.close();
            rs = con.createStatement().executeQuery( "select name from TillsEncodingTest" );
            rs.next();
            result = rs.getString( 1 );
            rs.close();
            System.out.println( "still alive; result from varchar is " + result );
            
        } catch( Exception e ) {
            e.printStackTrace();
        }
    }

    /**
     */
    public static void main(String[] args) {
        new DataTruncationBugTest();
    }

}

----

The program output shows that the varchar column is treated correctly:

Database driver mysql-connector-java-5.1.7 ( Revision: ${svn.Revision} ) with database 5.0.32-Debian_7etch8
still alive; result from varchar is This word is in french quotes: »E8300«
com.mysql.jdbc.MysqlDataTruncation: Data truncation: Data too long for column 'name' at row 1
	at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3513)
	at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3447)
	at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1951)
	at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2101)
	at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2548)
	at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2477)
	at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:741)
	at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:587)
	at weikatec.test.mysql5bug.DataTruncationBugTest.<init>(DataTruncationBugTest.java:39)
	at weikatec.test.mysql5bug.DataTruncationBugTest.main(DataTruncationBugTest.java:55)
[20 Feb 2009 11:57] Tonci Grgin
Hi Till and thanks for your report.

What happens if you turn off "jdbc compliant truncation"?
Further more, I do not think this is a bug as TEXT columns have utf8_general_ci as collation, you can check that by invoking mysql command line client with -T option and doing "SELECT * FROM testtable".
I also suggest you make a table with both TEXT and VARCHAR fields in it, insert some data and then try above in mysql cl client.
[20 Feb 2009 12:04] Tonci Grgin
Hmmm, an output of SHOW VARIABLES LIKE "%CHAR%" would be most welcome too.
[20 Feb 2009 12:18] Till Kahle
Hi Tonci,
thanks for your speedy reply!

> What happens if you turn off "jdbc compliant truncation"?

If I add &jdbcCompliantTruncation=false to the connection URL it works!
Data is inserted without any warning.
That would mean that I won't get an error when I insert 10 characters into a varchar(9) field, and mysql will silently strip 1 character, correct?

> Further more, I do not think this is a bug as TEXT columns have
> utf8_general_ci as collation, you can check that by invoking mysql
> command line client with -T option and doing "SELECT * FROM
> testtable".

No, not quite, mysql -T says:
> Collation:  latin1_swedish_ci (8)
This makes sense since I created the table explicitly with "charset=latin1".

> I also suggest you make a table with both TEXT and VARCHAR fields in
> it, insert some data and then try above in mysql cl client.

In fact there is no data (not even trucated data) inserted into the text column
when the exception occurs. If I use a table with 2 columns, varchar and text, I get the exception right away and no data is inserted at all.

The command line client (the one I have) works nicely since it _only_ uses latin1 on all sides, there is no need to convert utf8 to latin1 for the text column and no error or warning occurs.
My java program works just like the command line if I use latin1 for the mysql connection, but that is not a solution for two reasons:
- it can't handle real utf8 tables with russian or chinese text any more
- it can't handle latin1 characters 0x80 to 0x9f (that is a cp1252 extension to iso-8859-1) correctly, they get translated to question marks. Probably another bug, but not a big deal if utf8 works on all sides.
[20 Feb 2009 12:23] Tonci Grgin
Till, ok.
---
No, not quite, mysql -T says:
> Collation:  latin1_swedish_ci (8)
This makes sense since I created the table explicitly with "charset=latin1".
---

Now I really need to see output from show variables.
[20 Feb 2009 12:28] Till Kahle
Sorry, I missed your 2nd posting. Here's the show variables output:

show variables like '%CHAR%'
  character_set_client = utf8
  character_set_connection = utf8
  character_set_database = latin1
  character_set_filesystem = binary
  character_set_results = 
  character_set_server = latin1
  character_set_system = utf8
  character_sets_dir = /usr/share/mysql/charsets/
[20 Feb 2009 14:14] Tonci Grgin
Test case for our test framework.

Attachment: TestBug43055.java (text/x-java), 3.50 KiB.

[20 Feb 2009 14:31] Tonci Grgin
Ahh hmmm. I knew this will be a b.... as soon as I saw it.

Let's see. You probably have "STRICT_TRANS_TABLES' in your SQL_MODE, that's causing truncation to become error instead of warning.
Then, you have cset cli utf8, cset conn utf8, cset results NULL, but server/database/table/fields are all latin1. This leads to lossy conversion regarding encoding of certain utf8 chars to latin1. I suspect "french quote" is one of them.
All in all, environment you set up is not suitable for what you want to do...

Now for my test case. First I stumbled upon Bug#27475 and Bug#24886:
Server version: 5.1.31 MySQL Community Server (GPL)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> create table test43055 (vcfld VARCHAR (50) NOT NULL PRIMARY KEY, txtfld TEXT) Engine=MyISAM charset latin1;
Query OK, 0 rows affected (0.02 sec)

mysql> select * from test43055;
Field   1:  `vcfld`
Catalog:    `def`
Database:   `test`
Table:      `test43055`
Org_table:  `test43055`
Type:       VAR_STRING
Collation:  latin1_swedish_ci (8)
Length:     50
Max_length: 0
Decimals:   0
Flags:      NOT_NULL PRI_KEY NO_DEFAULT_VALUE PART_KEY

Field   2:  `txtfld`
Catalog:    `def`
Database:   `test`
Table:      `test43055`
Org_table:  `test43055`
Type:       BLOB       <<<<<<<<<<<<<<<<
Collation:  latin1_swedish_ci (8)
Length:     65535
Max_length: 0
Decimals:   0
Flags:      BLOB

0 rows in set (0.00 sec)

and BLOB has no charset...

Then I set up my environment as follows (so that *no* UTF8 -> Latin1 lossy conversion occurs):
Server version: 5.1.31 MySQL Community Server (GPL)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> SHOW VARIABLES LIKE "%char%";

+--------------------------+----------------------------------+
| Variable_name            | Value                            |
+--------------------------+----------------------------------+
| character_set_client     | latin1                           |
| character_set_connection | latin1                           |
| character_set_database   | latin1                           |
| character_set_filesystem | binary                           |
| character_set_results    | latin1                           |
| character_set_server     | latin1                           |
| character_set_system     | utf8                             |
| character_sets_dir       | /opt/mysql/mysql/share/charsets/ |
+--------------------------+----------------------------------+
8 rows in set (0.00 sec)

and my test case inserted few rows of data into table:
mysql> SELECT * FROM bug43055;
Field   1:  `vchar`
Catalog:    `def`
Database:   `test`
Table:      `bug43055`
Org_table:  `bug43055`
Type:       VAR_STRING
Collation:  latin1_swedish_ci (8)
Length:     50
Max_length: 38
Decimals:   0
Flags:      NOT_NULL PRI_KEY NO_DEFAULT_VALUE PART_KEY

Field   2:  `Class`
Catalog:    `def`
Database:   `test`
Table:      `bug43055`
Org_table:  `bug43055`
Type:       BLOB
Collation:  latin1_swedish_ci (8)
Length:     65535
Max_length: 38
Decimals:   0
Flags:      BLOB

+----------------------------------------+----------------------------------------+
| vchar                                  | Class				  |
+----------------------------------------+----------------------------------------+
| First line text                        | First line text			  |
| Second line text                       | Second line text			  |
| This word is in french quotes: »E8300« | This word is in french quotes: »E8300« |
+----------------------------------------+----------------------------------------+
3 rows in set (0.00 sec)

Finally, here's output of test case, completely correct:
Loading JDBC driver 'com.mysql.jdbc.Driver'
Done.

Done.

Connected to 5.1.31
java.vm.version         : 1.5.0_17-b04
java.vm.vendor          : Sun Microsystems Inc.
java.runtime.version    : 1.5.0_17-b04
os                      : Windows Server 2008, 6.0, x86
sun.management.compiler : HotSpot Client Compiler
-------------------------------------------------
1 .fld: vchar, MD: java.lang.String VARCHAR 12
2 .fld: Class, MD: java.lang.String VARCHAR -1
First line text		First line text	
Second line text	Second line text	
This word is in french quotes: »E8300«	This word is in french quotes: »E8300«	

So, be ware the difference between what you want and what you do. This is !Bg on the basis of suspicion that UTF8 » character can not be converted to latin1. Please change tables/columns to be UTF-8 or cset connection/client to be latin1.
[20 Feb 2009 14:57] Tonci Grgin
I stand partially corrected by Bar:
mysql> select hex(convert(_utf8'»«' using latin1));
+----------------------------------------+
| hex(convert(_utf8'»«' using latin1)) 	 |
+----------------------------------------+
| BBAB 					 |
+----------------------------------------+
1 row in set (0.00 sec)
0xBB - right double quote
0xAB - left double quote

so I presume you used some other character code.
[20 Feb 2009 15:32] Till Kahle
Thank you so much for your time and patience!

> 0xBB - right double quote
> 0xAB - left double quote
> so I presume you used some other character code.

I wish you were right, but that's exactly the codes I have
in my java source code (the java source is in iso-8859-1).

> So, be ware the difference between what you want and what you do. This is !Bg on the basis
> of suspicion that UTF8 » character can not be converted to latin1.

Well, these characters translate perfectly well to latin1,
and this is what mysql does in the first part of the example.
Keep in mind, this code works with varchar columns.

This is not a case of lossy conversion, there is no loss,
and there is no truncation. The value is completely rejected
by mysql server 5.0.32, but only for text columns, not for
varchar.

From your testcase I would guess that the bug is on the
server side and has been fixed somewhere between
5.0.32 and 5.1.31?

Are you using the same Connector/J version 5.1.7?
[20 Feb 2009 15:47] Tonci Grgin
Till, first clear SQL_MODE and retest. I think it'll work then.
Next, Bar already proved the characters in question are transcoded correctly, so I'm -1 on that.
It could be something in server too, but I'd start with SQL_MODE first. If it fails then I'd look for more recent server (which is a good idea anyway).
Finally, I am using 5.1.8 as I build from bzr source repository. But I see no relevant changes in changelog.

Inform me of your result.
[20 Feb 2009 16:40] Till Kahle
You're right about SQL_MODE.

I can't find any setting in my.cnf, on the command line I get:
mysql> select @@GLOBAL.sql_mode,  @@SESSION.sql_mode;
+-------------------+--------------------+
| @@GLOBAL.sql_mode | @@SESSION.sql_mode |
+-------------------+--------------------+
|                   |                    |
+-------------------+--------------------+

In jdbc I get:

global sql_mode: '',  session sql_mode: 'STRICT_TRANS_TABLES'

So calling "SET sql_mode = ''" helps, as does jdbcCompliantTruncation=false,
but it changes database semantics to something we don't really want.

If this is the only workaround, we'll use it until we have a server
version where the bug is fixed and text columns behave like varchar again.

I think I can now definitely say it is a server bug and it has been fixed:

server 5.0.27 win32 - no bug
server 5.0.32 debian - buggy
server 5.0.37 win32 - buggy
server 5.0.51a win32 - no bug

You are definitely right about using a newer version of mysql, but it's so nice
for an admin to use the debian packages...
[20 Feb 2009 16:56] Tonci Grgin
:-)

Thanks for your interest in MySQL.
[20 Feb 2009 17:04] Till Kahle
I'll set this to "Colsed" since it has been fixed.