Bug #120682 Connector/J decodes MySQL JSON columns with connection encoding when metadata collation is binary
Submitted: 14 Jun 9:23
Reporter: Junjie Zhang (OCA) Email Updates:
Status: Open Impact on me:
None 
Category:Connector / J Severity:S3 (Non-critical)
Version:Connector/J 5.1.49 MySQL Server 5.7.44 OS:Any
Assigned to: CPU Architecture:Any
Tags: charset, connector-j, gbk, json

[14 Jun 9:23] Junjie Zhang
Description:
Connector/J 5.1.48 and 5.1.49 can decode MySQL JSON column values with the connection encoding instead of the JSON field encoding when result metadata reports binary collation.

This corrupts JSON text when the connection uses a non-UTF-8 character set, e.g. characterEncoding=gbk.

Environment:
- MySQL Server 5.7.44
- Connector/J 5.1.49
- Also present in 5.1.48
- Regression baseline: 5.1.40.9
- Table default charset: gbk
- Column type: JSON

Why this appears to be a Connector/J bug:
Field.java already recognizes JSON as UTF-8:
    if (this.mysqlType == MysqlDefs.FIELD_TYPE_JSON) {
        this.encoding = "UTF-8";
    }

ResultSetImpl.getStringInternal() later overrides this when collationIndex is binary:
    String encoding = metadata.getCollationIndex() == CharsetMapping.MYSQL_COLLATION_INDEX_binary
        ? this.connection.getEncoding()
        : metadata.getEncoding();

With characterEncoding=gbk, JSON UTF-8 bytes are decoded as GBK. This contradicts Connector/J's own JSON field encoding and regresses from 5.1.40.9.

MySQL Bug #95698 also notes JSON metadata can be returned as binary charset, but Connector/J has enough FIELD_TYPE_JSON information to preserve the JSON field encoding.

How to repeat:
CREATE DATABASE repro DEFAULT CHARACTER SET gbk COLLATE gbk_chinese_ci;
USE repro;
CREATE TABLE fare_detail_job (
  id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  fare_prices JSON NOT NULL,
  channel VARCHAR(64) CHARACTER SET gbk COLLATE gbk_chinese_ci NOT NULL
) DEFAULT CHARSET=gbk COLLATE=gbk_chinese_ci;

Insert with characterEncoding=utf8:
INSERT INTO fare_detail_job(fare_prices, channel)
VALUES ('{"code":"中文票价","route":"杭州-北京"}', 'IBE中文');

Read with Connector/J 5.1.49 and characterEncoding=gbk:
SELECT fare_prices, channel FROM fare_detail_job;

Actual:
channel: IBE中文
fare_prices: {"code": "涓枃绁ㄤ环", "route": "鏉窞-鍖椾含"}

Expected:
channel: IBE中文
fare_prices: {"code": "中文票价", "route": "杭州-北京"}

Verified matrix:
5.1.40.9 + characterEncoding=gbk -> GBK varchar OK, JSON OK
5.1.49 + characterEncoding=gbk -> GBK varchar OK, JSON mojibake
5.1.49 + characterEncoding=gbk;characterSetResults=gbk -> GBK-only JSON OK, but non-GBK characters become '?'

Suggested fix:
Preserve Field.getEncoding() for FIELD_TYPE_JSON before applying the binary-collation fallback.

private String getStringEncoding(Field field) throws SQLException {
    if (field.getMysqlType() == MysqlDefs.FIELD_TYPE_JSON) {
        return field.getEncoding();
    }
    return field.getCollationIndex() == CharsetMapping.MYSQL_COLLATION_INDEX_binary
        ? this.connection.getEncoding()
        : field.getEncoding();
}

Use this helper in both string extraction paths in ResultSetImpl:
- extractStringFromNativeColumn(...)
- getStringInternal(...)

This keeps the historical binary-collation fallback for non-JSON expression/function results while preventing it from overriding the explicit JSON UTF-8 field encoding.

A patch against Connector/J 5.1.49 is available locally in my repro workspace:
upstream/mysql-connector-java-5.1.49-json-field-encoding.patch
[14 Jun 9:29] Junjie Zhang
Patch for Connector/J 5.1.49: preserve FIELD_TYPE_JSON field encoding when result metadata collation is binary, instead of falling back to the connection encoding.

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: mysql-connector-java-5.1.49-json-field-encoding.patch (application/octet-stream, text), 1.43 KiB.