Description:
Connector/J 5.1.48 and 5.1.49 can decode MySQL JSON column values with the connection encoding instead of the JSON field encoding when result metadata reports binary collation.
This corrupts JSON text when the connection uses a non-UTF-8 character set, e.g. characterEncoding=gbk.
Environment:
- MySQL Server 5.7.44
- Connector/J 5.1.49
- Also present in 5.1.48
- Regression baseline: 5.1.40.9
- Table default charset: gbk
- Column type: JSON
Why this appears to be a Connector/J bug:
Field.java already recognizes JSON as UTF-8:
if (this.mysqlType == MysqlDefs.FIELD_TYPE_JSON) {
this.encoding = "UTF-8";
}
ResultSetImpl.getStringInternal() later overrides this when collationIndex is binary:
String encoding = metadata.getCollationIndex() == CharsetMapping.MYSQL_COLLATION_INDEX_binary
? this.connection.getEncoding()
: metadata.getEncoding();
With characterEncoding=gbk, JSON UTF-8 bytes are decoded as GBK. This contradicts Connector/J's own JSON field encoding and regresses from 5.1.40.9.
MySQL Bug #95698 also notes JSON metadata can be returned as binary charset, but Connector/J has enough FIELD_TYPE_JSON information to preserve the JSON field encoding.
How to repeat:
CREATE DATABASE repro DEFAULT CHARACTER SET gbk COLLATE gbk_chinese_ci;
USE repro;
CREATE TABLE fare_detail_job (
id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
fare_prices JSON NOT NULL,
channel VARCHAR(64) CHARACTER SET gbk COLLATE gbk_chinese_ci NOT NULL
) DEFAULT CHARSET=gbk COLLATE=gbk_chinese_ci;
Insert with characterEncoding=utf8:
INSERT INTO fare_detail_job(fare_prices, channel)
VALUES ('{"code":"中文票价","route":"杭州-北京"}', 'IBE中文');
Read with Connector/J 5.1.49 and characterEncoding=gbk:
SELECT fare_prices, channel FROM fare_detail_job;
Actual:
channel: IBE中文
fare_prices: {"code": "涓枃绁ㄤ环", "route": "鏉窞-鍖椾含"}
Expected:
channel: IBE中文
fare_prices: {"code": "中文票价", "route": "杭州-北京"}
Verified matrix:
5.1.40.9 + characterEncoding=gbk -> GBK varchar OK, JSON OK
5.1.49 + characterEncoding=gbk -> GBK varchar OK, JSON mojibake
5.1.49 + characterEncoding=gbk;characterSetResults=gbk -> GBK-only JSON OK, but non-GBK characters become '?'
Suggested fix:
Preserve Field.getEncoding() for FIELD_TYPE_JSON before applying the binary-collation fallback.
private String getStringEncoding(Field field) throws SQLException {
if (field.getMysqlType() == MysqlDefs.FIELD_TYPE_JSON) {
return field.getEncoding();
}
return field.getCollationIndex() == CharsetMapping.MYSQL_COLLATION_INDEX_binary
? this.connection.getEncoding()
: field.getEncoding();
}
Use this helper in both string extraction paths in ResultSetImpl:
- extractStringFromNativeColumn(...)
- getStringInternal(...)
This keeps the historical binary-collation fallback for non-JSON expression/function results while preventing it from overriding the explicit JSON UTF-8 field encoding.
A patch against Connector/J 5.1.49 is available locally in my repro workspace:
upstream/mysql-connector-java-5.1.49-json-field-encoding.patch
Description: Connector/J 5.1.48 and 5.1.49 can decode MySQL JSON column values with the connection encoding instead of the JSON field encoding when result metadata reports binary collation. This corrupts JSON text when the connection uses a non-UTF-8 character set, e.g. characterEncoding=gbk. Environment: - MySQL Server 5.7.44 - Connector/J 5.1.49 - Also present in 5.1.48 - Regression baseline: 5.1.40.9 - Table default charset: gbk - Column type: JSON Why this appears to be a Connector/J bug: Field.java already recognizes JSON as UTF-8: if (this.mysqlType == MysqlDefs.FIELD_TYPE_JSON) { this.encoding = "UTF-8"; } ResultSetImpl.getStringInternal() later overrides this when collationIndex is binary: String encoding = metadata.getCollationIndex() == CharsetMapping.MYSQL_COLLATION_INDEX_binary ? this.connection.getEncoding() : metadata.getEncoding(); With characterEncoding=gbk, JSON UTF-8 bytes are decoded as GBK. This contradicts Connector/J's own JSON field encoding and regresses from 5.1.40.9. MySQL Bug #95698 also notes JSON metadata can be returned as binary charset, but Connector/J has enough FIELD_TYPE_JSON information to preserve the JSON field encoding. How to repeat: CREATE DATABASE repro DEFAULT CHARACTER SET gbk COLLATE gbk_chinese_ci; USE repro; CREATE TABLE fare_detail_job ( id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY, fare_prices JSON NOT NULL, channel VARCHAR(64) CHARACTER SET gbk COLLATE gbk_chinese_ci NOT NULL ) DEFAULT CHARSET=gbk COLLATE=gbk_chinese_ci; Insert with characterEncoding=utf8: INSERT INTO fare_detail_job(fare_prices, channel) VALUES ('{"code":"中文票价","route":"杭州-北京"}', 'IBE中文'); Read with Connector/J 5.1.49 and characterEncoding=gbk: SELECT fare_prices, channel FROM fare_detail_job; Actual: channel: IBE中文 fare_prices: {"code": "涓枃绁ㄤ环", "route": "鏉窞-鍖椾含"} Expected: channel: IBE中文 fare_prices: {"code": "中文票价", "route": "杭州-北京"} Verified matrix: 5.1.40.9 + characterEncoding=gbk -> GBK varchar OK, JSON OK 5.1.49 + characterEncoding=gbk -> GBK varchar OK, JSON mojibake 5.1.49 + characterEncoding=gbk;characterSetResults=gbk -> GBK-only JSON OK, but non-GBK characters become '?' Suggested fix: Preserve Field.getEncoding() for FIELD_TYPE_JSON before applying the binary-collation fallback. private String getStringEncoding(Field field) throws SQLException { if (field.getMysqlType() == MysqlDefs.FIELD_TYPE_JSON) { return field.getEncoding(); } return field.getCollationIndex() == CharsetMapping.MYSQL_COLLATION_INDEX_binary ? this.connection.getEncoding() : field.getEncoding(); } Use this helper in both string extraction paths in ResultSetImpl: - extractStringFromNativeColumn(...) - getStringInternal(...) This keeps the historical binary-collation fallback for non-JSON expression/function results while preventing it from overriding the explicit JSON UTF-8 field encoding. A patch against Connector/J 5.1.49 is available locally in my repro workspace: upstream/mysql-connector-java-5.1.49-json-field-encoding.patch