Bug #43819 | JDBC - Unsupported unicode character(s) | ||
---|---|---|---|
Submitted: | 24 Mar 2009 5:25 | Modified: | 21 Sep 2009 22:56 |
Reporter: | Phil Cross | Email Updates: | |
Status: | No Feedback | Impact on me: | |
Category: | MySQL Server: Charsets | Severity: | S1 (Critical) |
Version: | 5.1.7 | OS: | Linux (Ubuntu - Hardy Heron) |
Assigned to: | Mark Matthews | CPU Architecture: | Any |
Tags: | jdbc, question mark, Unicode |
[24 Mar 2009 5:25]
Phil Cross
[24 Mar 2009 5:28]
Phil Cross
BTW, I am using JDBC connector 5.1.7 and mysql version 5.0.51a-3ubuntu5.4 although I suspect that there are similar results with other versions of mysql because I have a BSD server with the same problem. I am pretty sure this is a JDBC driver issue (if not user error or lack of documentation).
[24 Mar 2009 8:47]
Tonci Grgin
Hi Phil and thanks for your report. According to http://www.fileformat.info/info/unicode/char/2192/index.htm, E28692 is UTF-8 and 2192 is Unicode for right arrow. Can you please try putting charset introducer in front of 2192 like this: SELECT (INSERT ...) _ucs2 0x2192 or use UTF8 representation E28692 and inform me of result. mysql> show variables like "%character%"; +--------------------------+----------------------------------+ | Variable_name | Value | +--------------------------+----------------------------------+ | character_set_client | utf8 | | character_set_connection | utf8 | | character_set_database | utf8 | | character_set_filesystem | binary | | character_set_results | utf8 | | character_set_server | utf8 | | character_set_system | utf8 | | character_sets_dir | /opt/mysql/mysql/share/charsets/ | +--------------------------+----------------------------------+ 8 rows in set (0.00 sec) mysql> SELECT _utf8 0xE28692; +----------------+ | _utf8 0xE28692 | +----------------+ | → | +----------------+ 1 row in set (0.03 sec) mysql> SELECT _ucs2 0x2192; +--------------+ | _ucs2 0x2192 | +--------------+ | → | +--------------+ 1 row in set (0.00 sec) Output of show variables like "%character%"; and show create table table_with_problem; might prove useful too. Please also check general server log at the time your c/J test starts.
[24 Mar 2009 15:11]
Phil Cross
I understand that the unicode character 2192 is 0xE28692 in utf-8. The exception in my original post shows that mysql (or the jdbc driver) is reporting that \xE2\x86\x92 is an invalid sequence. When I work with the command line as you describe in your message, everything works fine. It only fails when going through the JDBC driver. The problem is that the system I work with can only interact through jdbc, so I can't use a solution where I operate directly in the mysql client via command line. I have to have a programming solution in java or a new connector JAR file (which may or may not happen). I may try some previous jdbc drivers and see if the problem has always existed or if it is specific to this 5.1.7 release.
[24 Mar 2009 15:19]
Tonci Grgin
Phil, I do not understand your last post... Did you tried slapping charset introducer in front of your query (statement.execute("insert into test (id, str) values (1, _ucs2 '"+ testString + "')");)? Did you checked logs? Did you provided variables and structures as asked? I still do not see a bug here, just illegal mixing of charsets/collations resulting in proper error with expected UTF8 code for symbol.
[24 Mar 2009 15:40]
Phil Cross
Sorry, my log is empty. I changed the code in my example from: statement.execute("insert into test (id, str) values (1, '" + testString + "')"); to statement.execute("insert into test (id, str) values (1, _ucs2 '" + testString + "')"); and I still get output of: run: inserting: → selected: ? I also reverted the connection lines to: Connection connection = DriverManager.getConnection("jdbc:mysql://localhost/test", "test", "test"); //Connection connection = DriverManager.getConnection("jdbc:mysql://localhost/test?useUnicode=true&characterEncoding=UTF-8", "test", "test"); When I use the "useUnicode" parameter: //Connection connection = DriverManager.getConnection("jdbc:mysql://localhost/test", "test", "test"); Connection connection = DriverManager.getConnection("jdbc:mysql://localhost/test?useUnicode=true&characterEncoding=UTF-8", "test", "test"); I get a stranger exception (not sure how it came up with x86, x92): run: inserting: → java.sql.SQLException: Incorrect string value: '\x86\x92' for column 'str' at row 1 at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1055) at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:956) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3515) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3447) at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1951) at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2101) at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2548) at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2477) at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:741) at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:587) at mysqltest.Main.main(Main.java:17) Finally, you say that "I still do not see a bug here, just illegal mixing of charsets/collations resulting in proper error with expected UTF8 code for symbol." I disagree (and maybe I am misunderstanting the error message). It seems to me that if I have a string "\u2192" and I pass it to mysql and it says: java.sql.SQLException: Incorrect string value: '\xE2\x86\x92' for column 'str' at row 1 That sounds to me like a proper unicode to utf8 conversion occurred and mysql is rejecting it. I never told it about E28692 but it came up with it on its own (as it should have). So, why is it telling me that it's an "incorrect string value"?
[24 Mar 2009 16:09]
Tonci Grgin
A simple test shows this character is represented with more than 3 bytes in UTF-8 thus I can only suggest you to try MySQL server 6 as older versions have incomplete implementation for such long UTF8 streams... System.out.println("→".getBytes("utf-8").length); Can you test with MySQL server 6.0 and see if it works?
[24 Mar 2009 16:20]
Phil Cross
Actually, this character is exactly 3 bytes (not more than 3) does that lead to the same answer? If so, I can try mysql 6 but that won't be a quick test, so I want to verify with you. Code to verify the length (your code had an extra character in the previous post): System.out.println("\u2192".getBytes("utf-8").length); prints "3"
[24 Mar 2009 16:24]
Tonci Grgin
Phil, true, just saw it in e-mail. Verified as described.
[20 Aug 2009 19:53]
Mark Matthews
Looking at the bytes sent by the driver, the UTF-8 encoding is correct. It's the server that is complaining. I'm assigning this to the server, however it's easy to test locally by setting the sql_mode to 'strict_trans_tables' before you issue the insert in the CLI. I get the same error as the JDBC driver does when do that (with a relatively recent 5.0, server, but not 5.1). Without 'strict_trans_tables' enabled the server will still store the bytes, but not issue an error. The JDBC driver sets 'strict_trans_tables' because the behavior given in that mode is required by the JDBC API specification (exceptions on truncation).
[21 Aug 2009 22:07]
Peter Gulutzan
What's the result of select character_set_name from information_schema.columns where table_name='test'and column_name='str'; ?
[21 Sep 2009 23:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".