Bug #100606 Unecesary call to "SET NAMES 'utf8' COLLATE 'utf8_general_ci'"
Submitted: 21 Aug 2020 16:01 Modified: 10 Jun 21:13
Reporter: Marc Fletcher (OCA) Email Updates:
Status: Closed Impact on me:
Category:Connector / J Severity:S2 (Serious)
Version:8.0.21 OS:Any
Assigned to: CPU Architecture:Any
Tags: Contribution

[21 Aug 2020 16:01] Marc Fletcher
When a connection is retrieved from a connection pool, e.g. DBCP2, it is validated to make sure it's state matches that of the underlying MySQL server.

During this validation NativeSession.configureClientCharacterSet is executed.

It is observed that even if the server defaults are configured correctly a call to SET NAMES [utf8CharsetNAme] COLLATE [connectionCollation] ensues which can have a serious impact on overall performance due to the unavoidable network communication. We've observed over a 100% performance degradation due to these calls against MysqL5.7 (using a patched driver that doesn't perform this operation).

The incorrect logic takes place on Lines 471-473 of https://github.com/mysql/mysql-connector-j/blob/33f65445a1bcc544eb0120491926484da168f199/s....

More specifically there are two evaluations here that are mutually exclusive, one of which will ALWAYS evaluate to TRUE:

1. !this.protocol.getServerSession().characterSetNamesMatches("utf8")
OR (||)
2. (!this.protocol.getServerSession().characterSetNamesMatches("utf8mb4"))

How to repeat:
1. Setup a MySQL server and ensure the following parameters are set:

character_set_client = utf8
character_set_connection = utf8
character_set_results = utf8
character_set_server = utf8
collation_server = utf8_general_ci
collation_connection = utf8_general_ci

2. Get a connection using the mysql-connector-j driver (version 8.0.21).

Observe that the driver still executes the following command which is unecessary:

SET NAMES 'utf8' COLLATE 'utf8_general_ci'

Suggested fix:
Replace lines 471-473 with the following:

boolean isCharacterSetNotUTF8 =
    || this.protocol.getServerSession().characterSetNamesMatches("utf8mb4"));

String serverCollation = this.protocol.getServerSession().getServerVariable(
boolean isCollationDifferent = connectionCollationSuffix.length() > 0
        && !connectionCollation.equalsIgnoreCase(serverCollation);

if (dontCheckServerMatch || isCharacterSetNotUTF8 || isCollationDifferent) {
[21 Aug 2020 16:11] Marc Fletcher
PR raised to expedite the merging of this into the next release - https://github.com/mysql/mysql-connector-j/pull/54
[28 Aug 2020 9:42] MySQL Verification Team
Hello Marc,

Thank you for the report and contribution.

[29 Sep 2020 18:40] OCA Admin
Contribution submitted via Github - Bug #100606 - Unecesary call to "SET NAMES 'utf8' COLLATE 'utf8_general_ci'" 
(*) Contribution by Marc Fletcher (Github MarcFletcher, mysql-connector-j/pull/54): Please see the bug report available at - https://bugs.mysql.com/bug.php?id=100606

The OCA has been completed and I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: git_patch_471721305.txt (text/plain), 2.21 KiB.

[4 Mar 20:13] Frederic Descamps
Hi Marc, 

My name is Frédéric Descamps (@lefred), Community Manager and I would like to let you know that our development team is processing your contribution.

Thank you for using and helping MySQL.

[10 Jun 21:13] Daniel So
Posted by developer:
Added the following entry to the Connector/J 8.0.26 changelog:

"Connector/J issued unnecessary SET NAMES statements to set collation just to the server's own setting, which caused performance degradation. To fix this and other problems, a new mechanism for negotiating the character set and collation for communication between the server and Connector/J has been introduced; see Using Character Sets and Unicode for details. Thanks to Marc Fletcher for contributing to the fix."
[28 Jul 19:19] Brett Kail
This change appears to cause a similar bug to https://bugs.mysql.com/bug.php?id=95139 , which reports that cacheServerConfiguration=true (as enabled by useConfigs=maxPerformance) breaks encoding handling.

Specifically, this logic:
...causes all sessions to share the same server configuration HashMap, and then this logic:
...means that the "SET character_set_results = NULL" command is executed for the first connection to a URL in a JVM, but the .put() prevents that command from being executed for second and subsequent connections to the same URL.

The same shared server configuration map exists in 8.0.25, but this logic:
...is setting a different (local.character_set_results) variable than it checks (character_set_results), so in practice it always runs.

Perhaps the NativeSession.loadServerVariables logic should cache an unmodifiable map and create a mutable copy before calling setServerVariables?  In the meantime, we have disabled cacheServerConfiguration=false (the improvements from this bug means we no longer have the per-connection "SET NAMES" but now have a per-connection "SELECT").