Bug #46688 | error message contains incorrect unicode key data | ||
---|---|---|---|
Submitted: | 13 Aug 2009 5:54 | Modified: | 14 Aug 2009 5:32 |
Reporter: | Neil Bacon | Status: | Not a Bug |
Category: | Connector / J | Severity: | S2 (Serious) |
Version: | 5.0.5, 5.1.6 | OS: | Any |
Assigned to: | | CPU Architecture: | Any |
Tags: | error message unicode utf8 |
[13 Aug 2009 5:54]
Neil Bacon
[13 Aug 2009 6:39]
Mark Matthews
The issue is that MySQL doesn't actually support Unicode in its error messages. This works with the mysql client because it is charset-unaware: your terminal is interpreting the "generic" bytes that get spit back by mysqld as UTF-8. The JDBC driver isn't so lucky. It has to pick a character set to go from byte[] to char[] so that it can make java.lang.Strings. The only way for it to do this for error messages is to look at the server variable "language" to determine what character encoding is in use. Unfortunately, none of the encodings those languages map to is UTF-8.

There are plans to have the server send all error messages as UTF-8. When that happens, the driver will support this. There might be a chance for someone to hack in support to always treat error messages as UTF-8, even though they aren't, but at the moment it is not on any roadmap.
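To make the byte[] to char[] problem concrete, here is a minimal, self-contained sketch. It is not taken from the driver: the class name `ErrorMessageDecoding`, the helper `decode`, and the sample error text are all hypothetical. It assumes the server sent the message bytes as UTF-8 and compares decoding them as UTF-8 (what a UTF-8 terminal effectively does) with decoding them as Cp1252 (what a driver would do if it guessed latin1).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not part of Connector/J.
class ErrorMessageDecoding {
    // Turn raw message bytes into a String using the given charset,
    // which is the step a JDBC driver cannot avoid.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // Assumed sample message containing one non-ASCII character (U+00E9).
        String original = "Duplicate entry '\u00e9' for key 1";
        // Assumption: the bytes on the wire happen to be UTF-8.
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);

        // Decoding with the charset the bytes were written in round-trips.
        System.out.println(decode(wire, StandardCharsets.UTF_8));
        // Decoding with a guessed single-byte charset turns each byte of the
        // multi-byte sequence into its own (wrong) character.
        System.out.println(decode(wire, Charset.forName("Cp1252")));
    }
}
```

The second line printed shows two unrelated characters in place of the original one, which is the kind of incorrect key data this report describes.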
[14 Aug 2009 5:16]
Neil Bacon
patch to use characterEncoding url param for error messages
Attachment: com.mysql.jdbc.CharsetMapping.java.patch (text/x-patch), 663 bytes.
[14 Aug 2009 5:32]
Neil Bacon
The driver uses the following heuristic to guess the encoding for error messages:
  get the "language" property from the server
  map it to a character set: "english" -> "latin1"
  map that to a Java encoding: "latin1" -> "Cp1252" (using hard-coded mappings)

The patch I've just attached will use the URL parameter "characterEncoding", if specified, in preference to the above. A new parameter could be added for this purpose - you've got to let the user override this heuristic somehow. Now the error message is useful. Isn't open source sweet? Cheers, Neil.
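The heuristic and the override described above can be sketched roughly as follows. This is not the attached patch and not the real `CharsetMapping` code: the class `ErrorEncodingResolver`, the method `resolve`, the two tiny lookup tables, and the fallback to latin1/Cp1252 for unknown values are all assumptions made for illustration.

```java
import java.util.Map;

// Hypothetical sketch of the lookup chain; not the driver's implementation.
class ErrorEncodingResolver {
    // Illustrative one-entry tables; the real hard-coded mappings are larger.
    static final Map<String, String> LANGUAGE_TO_MYSQL_CHARSET =
            Map.of("english", "latin1");
    static final Map<String, String> MYSQL_CHARSET_TO_JAVA =
            Map.of("latin1", "Cp1252");

    // serverLanguage: value of the server's "language" variable.
    // characterEncoding: the URL parameter, or null if the user did not set it.
    static String resolve(String serverLanguage, String characterEncoding) {
        // An explicit user setting takes precedence over the guess.
        if (characterEncoding != null && !characterEncoding.isEmpty()) {
            return characterEncoding;
        }
        // Otherwise: language -> MySQL charset -> Java encoding.
        // The defaults used for unknown values are an assumption.
        String mysqlCharset =
                LANGUAGE_TO_MYSQL_CHARSET.getOrDefault(serverLanguage, "latin1");
        return MYSQL_CHARSET_TO_JAVA.getOrDefault(mysqlCharset, "Cp1252");
    }

    public static void main(String[] args) {
        System.out.println(resolve("english", null));    // heuristic only
        System.out.println(resolve("english", "UTF-8")); // user override
    }
}
```

With no override the sketch falls back to the guessed encoding; with `characterEncoding` set, that value is used to decode error messages instead.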