| Bug #46688 | error message contains incorrect unicode key data | ||
|---|---|---|---|
| Submitted: | 13 Aug 2009 5:54 | Modified: | 14 Aug 2009 5:32 |
| Reporter: | Neil Bacon | | |
| Status: | Not a Bug | | |
| Category: | Connector / J | Severity: | S2 (Serious) |
| Version: | 5.0.5, 5.1.6 | OS: | Any |
| Assigned to: | | CPU Architecture: | Any |
| Tags: | error message unicode utf8 | ||
[13 Aug 2009 6:39]
Mark Matthews
The issue is that MySQL doesn't actually support Unicode in its error messages. This works with the mysql client because the client is charset-unaware: your terminal is interpreting the "generic" bytes that mysqld sends back as UTF-8. The JDBC driver isn't so lucky. It has to pick a character set to go from byte[] to char[] so that it can make java.lang.Strings. The only way for it to do this for error messages is to look at the server variable "language" to determine which character encoding is in use. Unfortunately, none of the languages map to UTF-8. There are plans to make the server send all error messages as UTF-8; when that happens, the driver will support this. Someone could hack in support to always treat error messages as UTF-8, even though they aren't, but at the moment that is not on any roadmap.
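
The byte[]-to-String step described above can be illustrated with a short Java sketch. The class and method names are illustrative only and are not part of Connector/J. The sketch takes the UTF-8 bytes of the key '中文' (E4 B8 AD E6 96 87) and decodes them twice: once as Cp1252, the encoding the driver derives from the "language" variable, and once as UTF-8, which is effectively what the terminal does for the mysql client. The en dash and double dagger in the garbled key in this report are exactly Cp1252's characters for bytes 0x96 and 0x87.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ErrorMessageDecoding {

    // Java's canonical name for the charset the driver refers to as "Cp1252".
    static final Charset CP1252 = Charset.forName("windows-1252");

    /** Decodes raw message bytes with whatever charset has been guessed. */
    static String decode(byte[] raw, Charset guessed) {
        return new String(raw, guessed);
    }

    /**
     * Client-side repair attempt: undo a Cp1252 decode of bytes that were really UTF-8.
     * Lossless only when every original byte is defined in Cp1252.
     */
    static String repair(String garbled) {
        return new String(garbled.getBytes(CP1252), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The key from the report, written as Unicode escapes (U+4E2D U+6587).
        byte[] raw = "\u4E2D\u6587".getBytes(StandardCharsets.UTF_8);

        String asCp1252 = decode(raw, CP1252);               // what the driver produces
        String asUtf8 = decode(raw, StandardCharsets.UTF_8); // what the terminal shows

        System.out.println("decoded as Cp1252: " + asCp1252);
        System.out.println("decoded as UTF-8:  " + asUtf8);
        System.out.println("repaired:          " + repair(asCp1252));
    }
}
```

All six of these bytes are defined in Cp1252, so re-encoding the garbled string as Cp1252 and decoding the result as UTF-8 (the `repair` method) recovers the key on the client side. This workaround holds only under that assumption. If a key's UTF-8 bytes include values that Cp1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D), the round trip may lose data.
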
[14 Aug 2009 5:16]
Neil Bacon
patch to use characterEncoding url param for error messages
Attachment: com.mysql.jdbc.CharsetMapping.java.patch (text/x-patch), 663 bytes.
[14 Aug 2009 5:32]
Neil Bacon
The driver uses the following heuristic, with hard-coded mappings, to guess the encoding for error messages:

1. get the "language" property from the server;
2. map it to a character set: "english" -> "latin1";
3. map that to a Java encoding: "latin1" -> "Cp1252".

The patch I've just attached makes the driver use the URL parameter "characterEncoding", if specified, in preference to the above. A new parameter could be added for this purpose instead; either way, you've got to let the user override this heuristic somehow.
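
The heuristic and the proposed override can be sketched in Java as below. This is a hypothetical illustration, not the contents of the attached patch or of com.mysql.jdbc.CharsetMapping. The class name, the method name and the two single-entry tables are assumptions that hold only the example mappings quoted above. The sketch also assumes the language name has already been read from the server.

```java
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch of the error-message encoding heuristic described above.
 * The tables hold only the example mappings from this report; the driver's real
 * tables are larger and structured differently.
 */
public class ErrorEncodingHeuristic {

    // Server "language" name -> MySQL character set name (example entry only).
    static final Map<String, String> LANGUAGE_TO_MYSQL_CHARSET = Map.of("english", "latin1");

    // MySQL character set name -> Java encoding name (example entry only).
    static final Map<String, String> MYSQL_CHARSET_TO_JAVA = Map.of("latin1", "Cp1252");

    /**
     * Picks the Java encoding used to decode server error messages.
     *
     * @param serverLanguage    language name taken from the server (non-null)
     * @param characterEncoding value of the characterEncoding URL parameter, or null if unset
     * @return a Java encoding name, or null if no mapping is known
     */
    static String errorMessageEncoding(String serverLanguage, String characterEncoding) {
        // Proposed override: an explicit characterEncoding wins over the guess.
        if (characterEncoding != null && !characterEncoding.isEmpty()) {
            return characterEncoding;
        }
        // Original heuristic: language -> MySQL character set -> Java encoding.
        String mysqlCharset = LANGUAGE_TO_MYSQL_CHARSET.get(serverLanguage.toLowerCase(Locale.ROOT));
        if (mysqlCharset == null) {
            return null; // unknown language; the caller must fall back to its own default
        }
        return MYSQL_CHARSET_TO_JAVA.get(mysqlCharset);
    }

    public static void main(String[] args) {
        System.out.println(errorMessageEncoding("english", null));    // Cp1252
        System.out.println(errorMessageEncoding("english", "UTF-8")); // UTF-8
    }
}
```

The reproduction in this report puts characterEncoding=UTF-8 in the JDBC URL. With that setting, the override branch returns UTF-8, which matches the behaviour reported after applying the patch. Without it, the language-based guess still applies.
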
Now the error message is useful. Isn't open source sweet?
Cheers,
Neil.

Description:
On attempting to insert a row with a duplicate primary key, the error message contains incorrect data for the key. This occurs when saving a large object graph using Hibernate, so the correct data is essential for sorting out the problem.

mysql client/server: 5.1.31-1ubuntu2 (Ubuntu)

How to repeat:
Using the mysql command-line client:

```
create database charSetTest DEFAULT CHARACTER SET utf8 COLLATE utf8_bin;
use charSetTest;
create table t (v varchar(20), t text, primary key ( v ));
set character_set_client=utf8;
set character_set_connection=utf8;
set character_set_results=utf8;
set character_set_server=utf8;
show variables like 'char%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
insert into t (v, t) values ('中文', '中文');
select * from t;
+--------+--------+
| v      | t      |
+--------+--------+
| 中文   | 中文   |
+--------+--------+
1 row in set (0.00 sec)
insert into t (v, t) values ('中文', '中文');
ERROR 1062 (23000): Duplicate entry '中文' for key 'PRIMARY'
```

OK so far. Now show the problem with JDBC.

JDBC URL: jdbc:mysql://localhost/charSetTest?characterEncoding=UTF-8&characterSetResults=UTF-8

```
set character_set_server=utf8;
show variables like 'char%';
Variable_name             Value
character_set_client      utf8
character_set_connection  utf8
character_set_database    utf8
character_set_filesystem  binary
character_set_results     utf8
character_set_server      utf8
character_set_system      utf8
character_sets_dir        /usr/share/mysql/charsets/
select * from t;
v     t
中文  中文
insert into t (v, t) values ('中文', '中文');
Error: Duplicate entry '䏿–‡' for key 'PRIMARY'
SQLState:  23000
ErrorCode: 1062
```

Here the data within the quotes should have been '中文', as it is with the mysql command-line client.
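
The JDBC half of the reproduction can be sketched as a small Java program. It assumes Connector/J is on the classpath and that table t already holds the '中文' row from the command-line steps. The class name and the user/password values are placeholders. The program uses only standard java.sql calls: DriverManager.getConnection, Statement.executeUpdate, and SQLException's getMessage, getSQLState and getErrorCode.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class DuplicateKeyRepro {

    // URL taken from the report; the credentials in main() are placeholders.
    static final String URL =
        "jdbc:mysql://localhost/charSetTest?characterEncoding=UTF-8&characterSetResults=UTF-8";

    /** Renders a SQLException in roughly the form shown in the report. */
    static String describe(SQLException e) {
        return "Error: " + e.getMessage()
            + "\nSQLState:  " + e.getSQLState()
            + "\nErrorCode: " + e.getErrorCode();
    }

    public static void main(String[] args) {
        // Same statement as in the report, with the key written as Unicode escapes.
        String sql = "insert into t (v, t) values ('\u4E2D\u6587', '\u4E2D\u6587')";
        try (Connection conn = DriverManager.getConnection(URL, "user", "password");
             Statement stmt = conn.createStatement()) {
            // The row already exists, so this is expected to fail with error 1062.
            stmt.executeUpdate(sql);
        } catch (SQLException e) {
            // On the affected driver versions the key inside the quotes comes back garbled.
            System.out.println(describe(e));
        }
    }
}
```
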