Bug #43055 | Connector/J causes data truncation error when inserting into latin1 text column | ||
---|---|---|---|
Submitted: | 20 Feb 2009 11:50 | Modified: | 20 Feb 2009 17:04 |
Reporter: | Till Kahle | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Charsets | Severity: | S2 (Serious) |
Version: | 5.0.32 | OS: | Any |
Assigned to: | CPU Architecture: | Any | |
Tags: | clob, data truncation, Latin1, longtext, text, utf8 |
[20 Feb 2009 11:50]
Till Kahle
[20 Feb 2009 11:57]
Tonci Grgin
Hi Till and thanks for your report. What happens if you turn off "jdbc compliant truncation"? Further more, I do not think this is a bug as TEXT columns have utf8_general_ci as collation, you can check that by invoking mysql command line client with -T option and doing "SELECT * FROM testtable". I also suggest you make a table with both TEXT and VARCHAR fields in it, insert some data and then try above in mysql cl client.
[20 Feb 2009 12:04]
Tonci Grgin
Hmmm, an output of SHOW VARIABLES LIKE "%CHAR%" would be most welcome too.
[20 Feb 2009 12:18]
Till Kahle
Hi Tonci, thanks for your speedy reply! > What happens if you turn off "jdbc compliant truncation"? If I add &jdbcCompliantTruncation=false to the connection URL it works! Data is inserted without any warning. That would mean that I won't get an error when I insert 10 characters into a varchar(9) field, and mysql will silently strip 1 character, correct? > Further more, I do not think this is a bug as TEXT columns have > utf8_general_ci as collation, you can check that by invoking mysql > command line client with -T option and doing "SELECT * FROM > testtable". No, not quite, mysql -T says: > Collation: latin1_swedish_ci (8) This makes sense since I created the table explicitly with "charset=latin1". > I also suggest you make a table with both TEXT and VARCHAR fields in > it, insert some data and then try above in mysql cl client. In fact there is no data (not even trucated data) inserted into the text column when the exception occurs. If I use a table with 2 columns, varchar and text, I get the exception right away and no data is inserted at all. The command line client (the one I have) works nicely since it _only_ uses latin1 on all sides, there is no need to convert utf8 to latin1 for the text column and no error or warning occurs. My java program works just like the command line if I use latin1 for the mysql connection, but that is not a solution for two reasons: - it can't handle real utf8 tables with russian or chinese text any more - it can't handle latin1 characters 0x80 to 0x9f (that is a cp1252 extension to iso-8859-1) correctly, they get translated to question marks. Probably another bug, but not a big deal if utf8 works on all sides.
[20 Feb 2009 12:23]
Tonci Grgin
Till, ok. --- No, not quite, mysql -T says: > Collation: latin1_swedish_ci (8) This makes sense since I created the table explicitly with "charset=latin1". --- Now I really need to see output from show variables.
[20 Feb 2009 12:28]
Till Kahle
Sorry, I missed your 2nd posting. Here's the show variables output: show variables like '%CHAR%' character_set_client = utf8 character_set_connection = utf8 character_set_database = latin1 character_set_filesystem = binary character_set_results = character_set_server = latin1 character_set_system = utf8 character_sets_dir = /usr/share/mysql/charsets/
[20 Feb 2009 14:14]
Tonci Grgin
Test case for our test framework.
Attachment: TestBug43055.java (text/x-java), 3.50 KiB.
[20 Feb 2009 14:31]
Tonci Grgin
Ahh hmmm. I knew this will be a b.... as soon as I saw it. Let's see. You probably have "STRICT_TRANS_TABLES' in your SQL_MODE, that's causing truncation to become error instead of warning. Then, you have cset cli utf8, cset conn utf8, cset results NULL, but server/database/table/fields are all latin1. This leads to lossy conversion regarding encoding of certain utf8 chars to latin1. I suspect "french quote" is one of them. All in all, environment you set up is not suitable for what you want to do... Now for my test case. First I stumbled upon Bug#27475 and Bug#24886: Server version: 5.1.31 MySQL Community Server (GPL) Type 'help;' or '\h' for help. Type '\c' to clear the buffer. mysql> create table test43055 (vcfld VARCHAR (50) NOT NULL PRIMARY KEY, txtfld TEXT) Engine=MyISAM charset latin1; Query OK, 0 rows affected (0.02 sec) mysql> select * from test43055; Field 1: `vcfld` Catalog: `def` Database: `test` Table: `test43055` Org_table: `test43055` Type: VAR_STRING Collation: latin1_swedish_ci (8) Length: 50 Max_length: 0 Decimals: 0 Flags: NOT_NULL PRI_KEY NO_DEFAULT_VALUE PART_KEY Field 2: `txtfld` Catalog: `def` Database: `test` Table: `test43055` Org_table: `test43055` Type: BLOB <<<<<<<<<<<<<<<< Collation: latin1_swedish_ci (8) Length: 65535 Max_length: 0 Decimals: 0 Flags: BLOB 0 rows in set (0.00 sec) and BLOB has no charset... Then I set up my environment as follows (so that *no* UTF8 -> Latin1 lossy conversion occurs): Server version: 5.1.31 MySQL Community Server (GPL) Type 'help;' or '\h' for help. Type '\c' to clear the buffer. mysql> SHOW VARIABLES LIKE "%char%"; +--------------------------+----------------------------------+ | Variable_name | Value | +--------------------------+----------------------------------+ | character_set_client | latin1 | | character_set_connection | latin1 | | character_set_database | latin1 | | character_set_filesystem | binary | | character_set_results | latin1 | | character_set_server | latin1 | | character_set_system | utf8 | | character_sets_dir | /opt/mysql/mysql/share/charsets/ | +--------------------------+----------------------------------+ 8 rows in set (0.00 sec) and my test case inserted few rows of data into table: mysql> SELECT * FROM bug43055; Field 1: `vchar` Catalog: `def` Database: `test` Table: `bug43055` Org_table: `bug43055` Type: VAR_STRING Collation: latin1_swedish_ci (8) Length: 50 Max_length: 38 Decimals: 0 Flags: NOT_NULL PRI_KEY NO_DEFAULT_VALUE PART_KEY Field 2: `Class` Catalog: `def` Database: `test` Table: `bug43055` Org_table: `bug43055` Type: BLOB Collation: latin1_swedish_ci (8) Length: 65535 Max_length: 38 Decimals: 0 Flags: BLOB +----------------------------------------+----------------------------------------+ | vchar | Class | +----------------------------------------+----------------------------------------+ | First line text | First line text | | Second line text | Second line text | | This word is in french quotes: »E8300« | This word is in french quotes: »E8300« | +----------------------------------------+----------------------------------------+ 3 rows in set (0.00 sec) Finally, here's output of test case, completely correct: Loading JDBC driver 'com.mysql.jdbc.Driver' Done. Done. Connected to 5.1.31 java.vm.version : 1.5.0_17-b04 java.vm.vendor : Sun Microsystems Inc. java.runtime.version : 1.5.0_17-b04 os : Windows Server 2008, 6.0, x86 sun.management.compiler : HotSpot Client Compiler ------------------------------------------------- 1 .fld: vchar, MD: java.lang.String VARCHAR 12 2 .fld: Class, MD: java.lang.String VARCHAR -1 First line text First line text Second line text Second line text This word is in french quotes: »E8300« This word is in french quotes: »E8300« So, be ware the difference between what you want and what you do. This is !Bg on the basis of suspicion that UTF8 » character can not be converted to latin1. Please change tables/columns to be UTF-8 or cset connection/client to be latin1.
[20 Feb 2009 14:57]
Tonci Grgin
I stand partially corrected by Bar: mysql> select hex(convert(_utf8'»«' using latin1)); +----------------------------------------+ | hex(convert(_utf8'»«' using latin1)) | +----------------------------------------+ | BBAB | +----------------------------------------+ 1 row in set (0.00 sec) 0xBB - right double quote 0xAB - left double quote so I presume you used some other character code.
[20 Feb 2009 15:32]
Till Kahle
Thank you so much for your time and patience! > 0xBB - right double quote > 0xAB - left double quote > so I presume you used some other character code. I wish you were right, but that's exactly the codes I have in my java source code (the java source is in iso-8859-1). > So, be ware the difference between what you want and what you do. This is !Bg on the basis > of suspicion that UTF8 » character can not be converted to latin1. Well, these characters translate perfectly well to latin1, and this is what mysql does in the first part of the example. Keep in mind, this code works with varchar columns. This is not a case of lossy conversion, there is no loss, and there is no truncation. The value is completely rejected by mysql server 5.0.32, but only for text columns, not for varchar. From your testcase I would guess that the bug is on the server side and has been fixed somewhere between 5.0.32 and 5.1.31? Are you using the same Connector/J version 5.1.7?
[20 Feb 2009 15:47]
Tonci Grgin
Till, first clear SQL_MODE and retest. I think it'll work then. Next, Bar already proved the characters in question are transcoded correctly, so I'm -1 on that. It could be something in server too, but I'd start with SQL_MODE first. If it fails then I'd look for more recent server (which is a good idea anyway). Finally, I am using 5.1.8 as I build from bzr source repository. But I see no relevant changes in changelog. Inform me of your result.
[20 Feb 2009 16:40]
Till Kahle
You're right about SQL_MODE. I can't find any setting in my.cnf, on the command line I get: mysql> select @@GLOBAL.sql_mode, @@SESSION.sql_mode; +-------------------+--------------------+ | @@GLOBAL.sql_mode | @@SESSION.sql_mode | +-------------------+--------------------+ | | | +-------------------+--------------------+ In jdbc I get: global sql_mode: '', session sql_mode: 'STRICT_TRANS_TABLES' So calling "SET sql_mode = ''" helps, as does jdbcCompliantTruncation=false, but it changes database semantics to something we don't really want. If this is the only workaround, we'll use it until we have a server version where the bug is fixed and text columns behave like varchar again. I think I can now definitely say it is a server bug and it has been fixed: server 5.0.27 win32 - no bug server 5.0.32 debian - buggy server 5.0.37 win32 - buggy server 5.0.51a win32 - no bug You are definitely right about using a newer version of mysql, but it's so nice for an admin to use the debian packages...
[20 Feb 2009 16:56]
Tonci Grgin
:-) Thanks for your interest in MySQL.
[20 Feb 2009 17:04]
Till Kahle
I'll set this to "Colsed" since it has been fixed.