Bug #81196 | Connector/J Not Following Database character set | ||
---|---|---|---|
Submitted: | 26 Apr 2016 1:40 | Modified: | 9 Aug 2018 14:50 |
Reporter: | Jason Hunter | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | Connector / J | Severity: | S2 (Serious) |
Version: | 5.1.38+ | OS: | Any |
Assigned to: | CPU Architecture: | Any | |
Tags: | utf8mb4 |
[26 Apr 2016 1:40]
Jason Hunter
[29 Apr 2016 11:50]
Chiranjeevi Battula
Hello Jason Hunter, Thank you for the bug report. I could not repeat the issue at our end using with "SET NAMES" and Connector / J 5.1.38. Please see attached sample test case. Thanks, Chiranjeevi.
[29 Apr 2016 11:51]
Chiranjeevi Battula
test case
Attachment: 81196.java (text/x-java), 1.01 KiB.
[29 Apr 2016 17:02]
Jason Hunter
We are using the "org.apache.commons.dbcp.BasicDataSource" class for our DB connection pool which has a method to execute a statement after connection, but adding the "SET NAMES" call had no effect. The warning at the bottom of https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-charsets.html says not to use the SET NAMES on the connector, and adding a SET NAMES request to execute before any database calls is not a scalable solution. The line "boolean useutf8mb4 = utf8mb4Supported && (CharsetMapping.UTF8MB4_INDEXES.contains(this.io.serverCharsetIndex));" in ConnectionImpl.java (line ~1767) uses the default server charset to determine whether utf8mb4 should be used. So it won't matter what the column, table, or database is setup to support utf8mb4 if the server is not set to also support it. If a connection is to a particular DB then it should default to the charset supported by that DB and not the global server charset. If I add com.mysql.jdbc.faultInjection.serverCharsetIndex=224 to myt jdbc url then everything works fine because of this code for testing in ConnectionImpl.java (line ~1702): if (this.props != null && this.props.getProperty("com.mysql.jdbc.faultInjection.serverCharsetIndex") != null) { this.io.serverCharsetIndex = Integer.parseInt(this.props.getProperty("com.mysql.jdbc.faultInjection.serverCharsetIndex")); } Is there a way to get the charset on a database that could be added to the doHandshake method of MysqlIO.java that could supplement the serverCharsetIndex (like dbCharsetIndex) and then used in the ConnectionImpl to determine whether to support utf8mb4.
[29 Apr 2016 18:27]
Mark Matthews
It appears more that there's some bug detecting whether UTF8MB4 can be used by the connection. It's not workable for the driver to automatically detect per-database character sets because there are many cases where there are multiple schemas referenced in the same SQL statement. The correct workaround is to do what you've stated, which is force the encoding with "characterEncoding=". If that's not working because UTF8MB4 can't be detected in your case, then that is the actual bug.
[29 Apr 2016 19:02]
Jason Hunter
So the current logic assumes that even though the jdbc url is requesting utf-8 since the server isn't set to a character set that supports the utf8mb4 and since an SQL statement can reference multiple schemas then it will use utf8 instead of utf8mb4. Debugging through the driver code it correctly detects that we want utf8mb4 (given the db version and the characterEncoding=utf8) but then pushes us to non-mb4 cause the server doesn't globally support mb4 (in our case latin1). Hence it would seem the only workaround for the situation where the DB and server character encodings are not "compatible" would be to either add a characterEncoding type of utf8mb4 which would force the encoding no matter the server encoding or add some other parameter that would force using utf8mb4.
[4 May 2016 12:11]
Alexander Soklakov
Hi Jason, We can't change the current logic of forcing the encoding with "characterEncoding=UTF-8" to utf8mb4 because we could affect many existing applications. But your suggestion about additional parameter is correct, we need a way to set utf8bm4 explicitly and "characterEncoding" should accept only Java encoding names. Actually, we have a good candidate, the "connectionCollation" option you already tried, so the patch could look like: === diff --git a/src/com/mysql/jdbc/ConnectionImpl.java b/src/com/mysql/jdbc/ConnectionImpl.java index 9da30ea..854ae59 100644 --- a/src/com/mysql/jdbc/ConnectionImpl.java +++ b/src/com/mysql/jdbc/ConnectionImpl.java @@ -1762,7 +1762,8 @@ // charset names are case-sensitive boolean utf8mb4Supported = versionMeetsMinimum(5, 5, 2); - boolean useutf8mb4 = utf8mb4Supported && (CharsetMapping.UTF8MB4_INDEXES.contains(this.io.serverCharsetIndex)); + boolean useutf8mb4 = utf8mb4Supported && (CharsetMapping.UTF8MB4_INDEXES.contains(this.io.serverCharsetIndex) + || (getConnectionCollation() != null && StringUtils.startsWithIgnoreCase(getConnectionCollation(), "utf8mb4"))); if (!getUseOldUTF8Behavior()) { if (dontCheckServerMatch || !characterSetNamesMatches("utf8") || (utf8mb4Supported && !characterSetNamesMatches("utf8mb4"))) { === I.e. with this change we should specify both characterEncoding=utf8 and connectionCollation=utf8mb4_unicode_ci to have SET NAMES issued.
[4 May 2016 13:35]
Chiranjeevi Battula
http://bugs.mysql.com/bug.php?id=76889 marked as duplicate of this one.
[16 May 2016 4:47]
monty solomon
We experienced a similar issue when the server character set was set to utf8. We updated the settings in my.cnf to use utf8mb4. [client] loose-default-character-set = utf8mb4 [mysqld] character-set-server = utf8mb4
[20 Dec 2016 14:13]
Oliver Behncke
We witness the same problem. The database charset utf8mb4 is ignored in a jdbc connection. Is there any news on this topic?
[14 Jun 2018 13:26]
Eivind Larsen
Are there any news on this? I seem unable to give any hint to the client that it should use utf8mb4 over utf8 even when the database (scheme) and all the tables are set to utf8. The SET NAMES workaround proposed here does not work for the jdbc driver as explained in the docs: > Warning > Do not issue the query set names with Connector/J, as the driver will not detect that the > character set has changed, and will continue to use the character set detected during the > initial connection setup.
[14 Jun 2018 13:27]
Eivind Larsen
> even when the database (scheme) and all the tables are set to utf8. I of course meant 'even when the database (scheme) and all the tables are set to utf8mb', not utf8.
[9 Aug 2018 14:50]
Daniel So
Posted by developer: Added the following entry to the Connector/J 5.1.47 and 8.0.13 changelogs: "The value UTF-8 for the connection property characterEncoding now maps to the utf8mb4 character set on the server and, for MySQL Server 5.5.2 and later, characterEncoding=UTF-8 can now be used to set the connection character set to utf8mb4 even if character_set_server has been set to something else on the server. (Before this change, the server must have character_set_server=utf8mb4 for Connector/J to use that character set.) Also, if the connection property connectionCollation is also set and is incompatible with the value of characterEncoding, characterEncoding will be overridden with the encoding corresponding to connectionCollation. See Using Character Sets and Unicode for details, including how to use the utf8mb3 character set now for connection."
[27 Sep 2018 23:41]
Tianshi Wang
When will 8.0.13 be released?