Bug #116120 Inappropriate charset selected for connection when jdk.charsets not included
Submitted: 16 Sep 12:02
Modified: 19 Sep 12:07
Reporter: Martin Sandiford
Status: Verified
Category: Connector / J
Severity: S2 (Serious)
Version: 9.0.0
OS: Any
Assigned to:
CPU Architecture: Any

[16 Sep 12:02] Martin Sandiford
Description:
TL;DR: When a jlink runtime image does not include the jdk.charsets module, Connector/J selects the "eucjpms" charset for the connection even though a UTF-8 connection is requested.  I expected utf8mb4 to still be selected in this case.

Two choices in the JDBC driver's mapping of Java charsets to MySQL charsets conspire to make the driver select a poor, arguably invalid, charset for a connection when not all charsets are included in a Java deployment.

The first issue is in CharsetMapping.java around line 739.  Here, if no suitable Java charset can be found for a MysqlCharset and it is a multi-byte character set, a mapping to UTF-8 is added.  I don't believe this is sound in all cases; in any case, I would expect the potentially unsound mapping to be added at the lowest possible priority.

Link to code: https://github.com/mysql/mysql-connector-j/blob/e0e8e3461e5257ba4aa19e6b3614a2685b298947/s...
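
The fallback described above can be modeled roughly as follows. This is a simplified sketch based only on the description in this report, not the driver's code: the class name, the method name resolveJavaEncodings and the maxBytesPerChar parameter are hypothetical, and handling of single-byte charsets is left out because this report does not cover it.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

public class FallbackMappingSketch {
    /**
     * Returns the Java encodings usable for one MySQL charset in this runtime.
     * Hypothetical model: if none of the candidate Java charsets is available
     * and the MySQL charset is multi-byte, "UTF-8" is added as a stand-in.
     */
    static List<String> resolveJavaEncodings(List<String> candidates, int maxBytesPerChar) {
        List<String> usable = new ArrayList<>();
        for (String name : candidates) {
            try {
                if (Charset.isSupported(name)) {
                    usable.add(name);
                }
            } catch (IllegalArgumentException e) {
                // Illegal charset name: treat it as unavailable.
            }
        }
        if (usable.isEmpty() && maxBytesPerChar > 1) {
            // The potentially unsound equivalence this report is about.
            usable.add("UTF-8");
        }
        return usable;
    }

    public static void main(String[] args) {
        // A multi-byte charset whose Java charset is missing from the runtime.
        System.out.println(resolveJavaEncodings(List.of("x-no-such-charset"), 3)); // prints [UTF-8]
        // A single-byte charset that the runtime does provide.
        System.out.println(resolveJavaEncodings(List.of("US-ASCII"), 1)); // prints [US-ASCII]
    }
}
```

In this model, any multi-byte MySQL charset whose Java charset is missing from the runtime becomes a candidate whenever UTF-8 is requested.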

The second issue is in the resolution of the MySQL character set when UTF-8 is requested.  This happens in the getStaticMysqlCharsetForJavaEncoding(...) method in CharsetMapping.java at around line 549.  Note that the "eucjpms" charset has been bundled into the UTF-8 mappings along with the valid ones, but unlike the valid utf8mb3 and utf8mb4 charsets it has a minimum server version specified (5.0.3).  This gives it a higher precedence than either utf8mb3 or utf8mb4, and hence it is selected for the connection.

Link to definition of eucjpms charset: https://github.com/mysql/mysql-connector-j/blob/e0e8e3461e5257ba4aa19e6b3614a2685b298947/s...

Link to relevant code in getStaticMysqlCharsetForJavaEncoding: https://github.com/mysql/mysql-connector-j/blob/e0e8e3461e5257ba4aa19e6b3614a2685b298947/s...
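
To illustrate the precedence problem, here is a small model of a selection rule in which a higher minimum server version outranks an explicit priority. It is a sketch under stated assumptions, not the driver's implementation: the Candidate record, the priority values and the method names are hypothetical, and "no minimum version" is represented as 0.0.0.

```java
import java.util.List;

public class PrecedenceSketch {
    /** Hypothetical stand-in for a driver-side charset entry. */
    record Candidate(String name, int priority, int[] minVersion) {}

    /** Compares two {major, minor, patch} versions. */
    static int compareVersions(int[] a, int[] b) {
        for (int i = 0; i < 3; i++) {
            if (a[i] != b[i]) {
                return Integer.compare(a[i], b[i]);
            }
        }
        return 0;
    }

    /** Candidates that would be mapped to UTF-8 when jdk.charsets is missing (assumed values). */
    static List<Candidate> utf8Candidates() {
        int[] noMinimum = {0, 0, 0};
        return List.of(
            new Candidate("utf8mb4", 1, noMinimum),
            new Candidate("utf8mb3", 0, noMinimum),
            new Candidate("eucjpms", 0, new int[] {5, 0, 3}));
    }

    /** Higher minimum version wins first; priority only breaks ties. */
    static String pick(List<Candidate> candidates, int[] serverVersion) {
        Candidate choice = null;
        for (Candidate c : candidates) {
            if (compareVersions(c.minVersion(), serverVersion) > 0) {
                continue; // server is too old for this charset
            }
            if (choice == null) {
                choice = c;
                continue;
            }
            int byVersion = compareVersions(c.minVersion(), choice.minVersion());
            if (byVersion > 0 || (byVersion == 0 && c.priority() > choice.priority())) {
                choice = c;
            }
        }
        return choice == null ? null : choice.name();
    }

    public static void main(String[] args) {
        System.out.println(pick(utf8Candidates(), new int[] {9, 0, 1})); // prints eucjpms
    }
}
```

Under this rule, utf8mb4 can only win if the server is older than 5.0.3, which is the opposite of what a UTF-8 request should produce.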

How to repeat:
This can be reproduced fairly easily with JDK 17.0.12 or 21.0.4 as below.  I have not tried other JDK versions.

Create a new directory and download Connector / J 9.0.0 into that directory, perhaps from here: https://dev.mysql.com/downloads/connector/j/

Launch mysql in a docker container:

docker run --name test-mysql -p 3306:3306 -e MYSQL_ROOT_PASSWORD=password -d mysql:9.0.1

Save the code below between the --snip-- lines into JdbcTest.java:

--snip--
import java.sql.DriverManager;

public class JdbcTest {
    public static void main(String[] args) {
        System.out.format("Java: %s%n", Runtime.version());
        final var url = "jdbc:mysql://127.0.0.1/mysql"
            + "?useUnicode=true"
            + "&characterEncoding=UTF-8"
            + "&useSSL=false"
            + "&allowPublicKeyRetrieval=true";
        final var user = "root";
        final var password = "password";
        final var query = "SELECT @@character_set_connection, @@collation_connection";
        System.out.format("Connection: %s%n", url);
        try (final var conn = DriverManager.getConnection(url, user, password)) {
            try (final var stmt = conn.createStatement()) {
                System.out.format("Query: %s%n", query);
                try (final var result = stmt.executeQuery(query)) {
                    final var haveRow = result.next();
                    if (haveRow) {
                        final var metadata = result.getMetaData();
                        final var columnCount = metadata.getColumnCount();
                        for (int i = 1; i <= columnCount; i++) {
                            final var name = metadata.getColumnName(i);
                            final var value = result.getString(i);
                            System.out.format("%s: %s%n", name, value);
                        }
                    } else {
                        System.out.println("Can't retrieve connection information.");
                    }
                }
            }
        } catch (Exception e) {
            e.printStackTrace(System.err);
        }
    }
}
--snip--

Compile the file:

javac JdbcTest.java

Run jlink:

rm -rf output && jlink --add-modules "java.base,java.sql,java.naming" --output output

Launch the test app:

./output/bin/java -cp .:./mysql-connector-j-9.0.0.jar JdbcTest

Example output:

--snip--
Java: 17.0.12+0
Connection: jdbc:mysql://127.0.0.1/mysql?useUnicode=true&characterEncoding=UTF-8&useSSL=false&allowPublicKeyRetrieval=true
Query: SELECT @@character_set_connection, @@collation_connection
@@character_set_connection: eucjpms
@@collation_connection: eucjpms_japanese_ci
--snip--

Note that including "jdk.charsets" in the modules passed to jlink will result in utf8mb4 being correctly selected as the connection charset.
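
To confirm what a given runtime image actually provides, a small diagnostic along these lines may help. It uses only standard JDK APIs (Charset.isSupported and ModuleLayer.boot().findModule); the class name and the charset names probed in main are illustrative choices, and which charsets ship in which module can differ by platform and JDK version.

```java
import java.nio.charset.Charset;

public class CharsetAvailabilityCheck {
    /** True if the running JVM can provide the named charset. */
    static boolean isAvailable(String charsetName) {
        try {
            return Charset.isSupported(charsetName);
        } catch (IllegalArgumentException e) {
            return false; // illegal charset name
        }
    }

    /** True if the named module is part of the boot layer of this runtime. */
    static boolean hasModule(String moduleName) {
        return ModuleLayer.boot().findModule(moduleName).isPresent();
    }

    public static void main(String[] args) {
        System.out.format("jdk.charsets present: %b%n", hasModule("jdk.charsets"));
        // Illustrative names only; adjust to the charsets of interest.
        for (String name : new String[] {"UTF-8", "EUC-JP", "Shift_JIS"}) {
            System.out.format("%s supported: %b%n", name, isAvailable(name));
        }
    }
}
```

Run it with ./output/bin/java and with the system java to compare the two runtimes.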

Suggested fix:
In my opinion, the server version that supports a given charset should be a filtering criterion only.  It should not determine the precedence that charset is given when determining a suitable charset for a connection.

Given that fix, the driver should either:

Not default multi-byte charsets to be equivalent to UTF-8.

or:

If the UTF-8 equivalency is a risky but acceptable fallback, then the risky equivalency should be made lowest possible priority.
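
A minimal sketch of the second option, assuming the driver can tell a fallback mapping apart from a genuine one: the server version is used only to filter, and fallback mappings rank below every genuine mapping. The Candidate record, its fallbackMapping flag and the priority values are hypothetical and are not taken from the driver.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class FilterThenRankSketch {
    /** Hypothetical charset entry; fallbackMapping marks a UTF-8 stand-in mapping. */
    record Candidate(String name, int priority, int[] minVersion, boolean fallbackMapping) {}

    /** True if the server version is at least the charset's minimum version. */
    static boolean supportedBy(int[] server, int[] min) {
        for (int i = 0; i < 3; i++) {
            if (server[i] != min[i]) {
                return server[i] > min[i];
            }
        }
        return true;
    }

    /** Assumed candidate list for a UTF-8 request when jdk.charsets is missing. */
    static List<Candidate> utf8Candidates() {
        int[] noMinimum = {0, 0, 0};
        return List.of(
            new Candidate("utf8mb4", 1, noMinimum, false),
            new Candidate("utf8mb3", 0, noMinimum, false),
            new Candidate("eucjpms", 0, new int[] {5, 0, 3}, true));
    }

    static Optional<String> pick(List<Candidate> candidates, int[] serverVersion) {
        return candidates.stream()
            // The version is a filter only.
            .filter(c -> supportedBy(serverVersion, c.minVersion()))
            // Genuine mappings beat fallbacks; priority decides among equals.
            .max(Comparator
                .comparing((Candidate c) -> !c.fallbackMapping())
                .thenComparingInt(Candidate::priority))
            .map(Candidate::name);
    }

    public static void main(String[] args) {
        System.out.println(pick(utf8Candidates(), new int[] {9, 0, 1}).orElse("none")); // prints utf8mb4
    }
}
```

With this ordering, eucjpms would only be chosen for a UTF-8 request if no genuine UTF-8 charset were available at all.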
[18 Sep 9:09] MySQL Verification Team
Hello Martin Sandiford,

Thank you for the report and test case.
I quickly ran your test case but am seeing the expected results. Am I missing anything? Do I need to set charsets or anything else at the server level? Please let me know.

--
locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
--
-- MySQL Server 8.0.39

BugNumber=116134
rm -rf $BugNumber/
bin/mysqld --no-defaults --initialize-insecure --basedir=$PWD --datadir=$PWD/$BugNumber --log-error-verbosity=3
bin/mysqld_safe --no-defaults --mysqld-version='' --basedir=$PWD --datadir=$PWD/$BugNumber --core-file --socket=/tmp/mysql.sock  --port=3306 --log-error=$PWD/$BugNumber/log.err --mysqlx-port=33330 --mysqlx-socket=/tmp/mysql_x_ushastry.sock --log-error-verbosity=3  --secure-file-priv="" --local-infile=1  2>&1 &

bin/mysql -uroot -S/tmp/mysql.sock
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 11
Server version: 8.0.39 MySQL Community Server - GPL

Copyright (c) 2000, 2024, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show variables like '%version%';
+--------------------------+------------------------------+
| Variable_name            | Value                        |
+--------------------------+------------------------------+
| admin_tls_version        | TLSv1.2,TLSv1.3              |
| immediate_server_version | 999999                       |
| innodb_version           | 8.0.39                       |
| original_server_version  | 999999                       |
| protocol_version         | 10                           |
| replica_type_conversions |                              |
| slave_type_conversions   |                              |
| tls_version              | TLSv1.2,TLSv1.3              |
| version                  | 8.0.39                       |
| version_comment          | MySQL Community Server - GPL |
| version_compile_machine  | x86_64                       |
| version_compile_os       | Linux                        |
| version_compile_zlib     | 1.2.13                       |
+--------------------------+------------------------------+
13 rows in set (0.01 sec)

mysql>

--

javac -cp .:mysql-connector-j-9.0.0/mysql-connector-j-9.0.0.jar Bug116120.java
rm -rf output && jlink --add-modules "java.base,java.sql,java.naming" --output output

java -cp .:mysql-connector-j-9.0.0/mysql-connector-j-9.0.0.jar Bug116120
Java: 17.0.12+8-LTS-286
Connection: jdbc:mysql://******/mysql?useUnicode=true&characterEncoding=UTF-8&useSSL=false&allowPublicKeyRetrieval=true
Query: SELECT @@character_set_connection, @@collation_connection
@@character_set_connection: utf8mb4
@@collation_connection: utf8mb4_0900_ai_ci

regards,
Umesh
[19 Sep 10:34] Martin Sandiford
Hi Umesh,

It looks like you have run the test with the system "java", rather than the java that jlink places in the output/bin folder.  You will need to use the ./output/bin/java executable to launch the test.

I don't think this is the only issue, though.

I just tried on Linux (Ubuntu 20.04), and the problem does not seem to reproduce with JDK 17 there.  I also tried with JDK 21 and can reproduce the problem with that, so if you are using Linux, you will need to use JDK 21.

Previously I had been using macOS, and the problem is present with both JDK 17 and 21 on that platform.

I've also just tried with server 8.0.39 that you are using and can reproduce the issue with that.
[19 Sep 11:33] MySQL Verification Team
Thank you, Martin. 
Let me re-run and get back to you if anything further is needed.

Sincerely,
Umesh
[19 Sep 12:07] MySQL Verification Team
Thank you Martin.
I'm able to reproduce now.

regards,
Umesh
[19 Sep 12:12] MySQL Verification Team
--

-- Env
cat /etc/*release
Oracle Linux Server release 7.9
NAME="Oracle Linux Server"
VERSION="7.9"
ID="ol"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="7.9"
PRETTY_NAME="Oracle Linux Server 7.9"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:oracle:linux:7:9:server"
HOME_URL="https://linux.oracle.com/"
BUG_REPORT_URL="https://bugzilla.oracle.com/"

ORACLE_BUGZILLA_PRODUCT="Oracle Linux 7"
ORACLE_BUGZILLA_PRODUCT_VERSION=7.9
ORACLE_SUPPORT_PRODUCT="Oracle Linux"
ORACLE_SUPPORT_PRODUCT_VERSION=7.9
Red Hat Enterprise Linux Server release 7.9 (Maipo)
Oracle Linux Server release 7.9

-- jdk 17

javac -cp .:mysql-connector-j-9.0.0/mysql-connector-j-9.0.0.jar Bug116120.java
rm -rf output && jlink --add-modules "java.base,java.sql,java.naming" --output output

output/bin/java -cp .:mysql-connector-j-9.0.0/mysql-connector-j-9.0.0.jar Bug116120
Java: 17.0.12+8-LTS-286
Connection: jdbc:mysql://******/mysql?useUnicode=true&characterEncoding=UTF-8&useSSL=false&allowPublicKeyRetrieval=true
Query: SELECT @@character_set_connection, @@collation_connection
@@character_set_connection: utf8mb4
@@collation_connection: utf8mb4_0900_ai_ci

-- jdk 21

 javac -cp .:mysql-connector-j-9.0.0/mysql-connector-j-9.0.0.jar Bug116120.java
 rm -rf output && jlink --add-modules "java.base,java.sql,java.naming" --output output
 output/bin/java -cp .:mysql-connector-j-9.0.0/mysql-connector-j-9.0.0.jar Bug116120
Java: 21.0.4+8-LTS-274
Connection: jdbc:mysql://******/mysql?useUnicode=true&characterEncoding=UTF-8&useSSL=false&allowPublicKeyRetrieval=true
Query: SELECT @@character_set_connection, @@collation_connection
@@character_set_connection: eucjpms
@@collation_connection: eucjpms_japanese_ci

-- jdk 23

 javac -cp .:mysql-connector-j-9.0.0/mysql-connector-j-9.0.0.jar Bug116120.java
 rm -rf output && jlink --add-modules "java.base,java.sql,java.naming" --output output
 output/bin/java -cp .:mysql-connector-j-9.0.0/mysql-connector-j-9.0.0.jar Bug116120
Java: 23+37-2369
Connection: jdbc:mysql://******/mysql?useUnicode=true&characterEncoding=UTF-8&useSSL=false&allowPublicKeyRetrieval=true
Query: SELECT @@character_set_connection, @@collation_connection
@@character_set_connection: eucjpms
@@collation_connection: eucjpms_japanese_ci