Bug #80631 ResultSet.getString return garbled result with json type data
Submitted: 7 Mar 2016 8:01 Modified: 14 Sep 2016 1:51
Reporter: Tianming Yi Email Updates:
Status: Closed Impact on me:
None 
Category:Connector / J Severity:S2 (Serious)
Version:5.1.38, 5.1.39, 6.0.3 OS:Any
Assigned to: Filipe Silva CPU Architecture:Any
Tags: Connector/J, garbled, json, utf8

[7 Mar 2016 8:01] Tianming Yi
Description:
I stored Chinese character in json type column. Data stored in mysql is correct encoded with utf-8. But in ResultSet.getString(), the result is garbled. In contract, Chinese character stored in text type column is not garbled.
I tried ResultSet.getBytes() then new String(bytes, "utf-8"), the string is correct encoded.

How to repeat:
Store some Chinese character, like 中国 in json type column.
get connection with jdbc driver, execute query, ResultSet rs = pstmt.executeQuery(), then print rs.getString(columnName).
The result is like:
{"id": "������"}	
And it should be:
{"id": "中文"}
[8 Mar 2016 9:44] Chiranjeevi Battula
Hello Tianming Yi,

Thank you for the bug report.
Verified this behavior on MySQL Connector / J 5.1.38.

Thanks,
Chiranjeevi.
[8 Mar 2016 9:45] Chiranjeevi Battula
output:
run:
{"key2": "value2", "郜大超": "value1"}
{"key4": "value4", "中国": "value3"}
BUILD SUCCESSFUL (total time: 0 seconds)
[15 Mar 2016 18:25] Filipe Silva
Hi Tianming Yi,

Can you provide some more info about your environment:
- Connection string properties?
- What's the server version you are using?
- Server encoding? Database encoding? Client encoding? (If in doubt, can you just paste the result of "show variables like 'character_set%'"?)

Have you tried setting the connection property "characterEncoding=UTF-8"?
[27 Mar 2016 7:02] Vladimir V
Hi! I got the same problem, settings does not help.

The correct header for this bug is "JSON does not allow Unicode characters"
As JSON is binary type - it is not known how to store unicode inside it.

Here is my test table:
CREATE TABLE utilJSON (
	field1 JSON
) CHARACTER SET utf8, ENGINE=INNODB;

The Connector / J uses "iso-8859-1" so no unicode characters:

// inside statement.executeQuery("select field1 from utilJSON")
// com/mysql/jdbc/MysqlIO.java:
short charSetNumber = (short) packet.readInt();
we got charSetNumber = 63

// Lets look at com/mysql/jdbc/CharsetMapping.java
// 63 means binary:
collation[63] = new Collation(63, "binary", 1, MYSQL_CHARSET_NAME_binary);
//and mysql binary means ISO8859_1 java
new MysqlCharset(MYSQL_CHARSET_NAME_binary, 1, 1, new String[] { "ISO8859_1" }),

----------
//temp workaround, you can read value like this:
new String(fieldVal.getBytes("iso-8859-1"))
[27 Mar 2016 9:25] Vladimir V
Found some documentation
https://dev.mysql.com/doc/refman/5.7/en/json.html
"MySQL handles strings used in JSON context using the utf8mb4 character set and utf8mb4_bin collation. "
So unicode is ok for JSON field.

j/connector have to use utf8mb4_bin instead if iso-8859-1 for binary data in json field.
[25 Apr 2016 7:39] Chiranjeevi Battula
http://bugs.mysql.com/bug.php?id=81188 marked as duplicate of this one.
[29 Apr 2016 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[13 May 2016 10:33] Hitoshi Asai
I can reproduce it in another setting for Japanese, my environments are following,

- Connection string properties?
    - jdbc:mysql://127.0.0.1:3306/test?characterEncoding=UTF-8&useSSL=false
- What's the server version you are using?
    - MySQL 5.7.12 (Installed from MySQL APT repo) / Ubuntu 16.04
- Server encoding? Database encoding? Client encoding? (If in doubt, can you just paste the result of "show variables like 'character_set%'"?)
    - Server locale is "ja_JP.UTF-8" and the result of "show variables.." is

+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+

repro steps:

CREATE DATABASE test CHARACTER SET utf8;

USE test
CREATE TABLE json_test (
  id int PRIMARY KEY NOT NULL,
  varchar_column varchar(100) NOT NULL,
  json_column json NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

INSERT INTO json_test (id, varchar_column, json_column) VALUES (1, '中文', '{"id": "中文"}');
INSERT INTO json_test (id, varchar_column, json_column) VALUES (2, '中文', '{"id": "Chinese"}');
INSERT INTO json_test (id, varchar_column, json_column) VALUES (3, 'Chinese', '{"id": "中文"}');
INSERT INTO json_test (id, varchar_column, json_column) VALUES (4, 'Chinese', '{"id": "Chinese"}');

** MySQL Client shows '中文' correctly **
mysql> select * from json_test;
+----+----------------+-------------------+
| id | varchar_column | json_column       |
+----+----------------+-------------------+
|  1 | 中文           | {"id": "中文"}    |
|  2 | 中文           | {"id": "Chinese"} |
|  3 | Chinese        | {"id": "中文"}    |
|  4 | Chinese        | {"id": "Chinese"} |
+----+----------------+-------------------+

** Connector/J (mysql-connector-java-5.1.39.jar, Oracle JDK 8u92) is not **
source:
  import java.sql.*;
  
  public class Main {
      public static void main(String[] args) throws ClassNotFoundException, SQLException {
          Class.forName("com.mysql.jdbc.Driver");
          Connection conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/test?characterEncoding=UTF-8&useSSL=false", "user", "password");
          Statement stmt = conn.createStatement();
          ResultSet rset = stmt.executeQuery("SELECT * FROM json_test");
          while (rset.next()) {
              System.out.println(rset.getInt(1) + "\t" + rset.getString(2) + "\t" + rset.getString(3));
          }
          rset.close();
          stmt.close();
          conn.close();
      }
  }

result:
  1       中文    {"id": "中æ‡"}
  2       中文    {"id": "Chinese"}
  3       Chinese {"id": "中æ‡"}
  4       Chinese {"id": "Chinese"}
(varchar column displayed correctly, json column always corrupted)
[17 May 2016 10:01] Umesh Shastry
-- with the latest test case from earlier note

[umshastr@hod03]/export/umesh/server/binaries/mysql-advanced-5.7.12: cat Main.java
import java.sql.*;

  public class Main {
      public static void main(String[] args) throws ClassNotFoundException, SQLException {
          Class.forName("com.mysql.jdbc.Driver");
          Connection conn = DriverManager.getConnection("jdbc:mysql://localhost:15000/test?characterEncoding=UTF-8&useSSL=false", "root", "");
          Statement stmt = conn.createStatement();
          ResultSet rset = stmt.executeQuery("SELECT * FROM json_test");
          while (rset.next()) {
              System.out.println(rset.getInt(1) + "\t" + rset.getString(2) + "\t" + rset.getString(3));
          }
          rset.close();
          stmt.close();
          conn.close();
      }
  }

[umshastr@hod03]/export/umesh/server/binaries/mysql-advanced-5.7.12: bin/mysql -uroot -S /tmp/mysql_ushastry.sock
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.12-enterprise-commercial-advanced MySQL Enterprise Server - Advanced Edition (Commercial)

Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> create database test;
Query OK, 1 row affected (0.00 sec)

mysql> use test
Database changed
mysql> CREATE TABLE json_test (
    ->   id int PRIMARY KEY NOT NULL,
    ->   varchar_column varchar(100) NOT NULL,
    ->   json_column json NOT NULL
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.01 sec)

mysql>
mysql> INSERT INTO json_test (id, varchar_column, json_column) VALUES (1, '中文', '{"id": "中文"}');
INSERT INTO json_test (id, varchar_column, json_column) VALUES (2, '中文', '{"id": "Chinese"}');
INSERT INTO json_test (id, varchar_column, json_column) VALUES (3, 'Chinese', '{"id": "中文"}');
INSERT INTO json_test (id, varchar_column, json_column) VALUES (4, 'Chinese', '{"id": "Chinese"}');
Query OK, 1 row affected (0.02 sec)

mysql> INSERT INTO json_test (id, varchar_column, json_column) VALUES (2, '中文', '{"id": "Chinese"}');
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO json_test (id, varchar_column, json_column) VALUES (3, 'Chinese', '{"id": "中文"}');
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO json_test (id, varchar_column, json_column) VALUES (4, 'Chinese', '{"id": "Chinese"}');
Query OK, 1 row affected (0.00 sec)

mysql>
mysql> select * from json_test;
+----+----------------+-------------------+
| id | varchar_column | json_column       |
+----+----------------+-------------------+
|  1 | 中文           | {"id": "中文"}    |
|  2 | 中文           | {"id": "Chinese"} |
|  3 | Chinese        | {"id": "中文"}    |
|  4 | Chinese        | {"id": "Chinese"} |
+----+----------------+-------------------+
4 rows in set (0.00 sec)

mysql> \q
Bye
[umshastr@hod03]/export/umesh/server/binaries/mysql-advanced-5.7.12: vi Main.java
[umshastr@hod03]/export/umesh/server/binaries/mysql-advanced-5.7.12: javac -cp '.:/home/umshastr/bugs/mysql-connector-java-5.1.38/mysql-connector-java-5.1.38-bin.jar' Main.java
[umshastr@hod03]/export/umesh/server/binaries/mysql-advanced-5.7.12: java -cp '.:/home/umshastr/bugs/mysql-connector-java-5.1.38/mysql-connector-java-5.1.38-bin.jar' Main
1       中文    {"id": "中æ��"}
2       中文    {"id": "Chinese"}
3       Chinese {"id": "中�"}
4       Chinese {"id": "Chinese"}
[umshastr@hod03]/export/umesh/server/binaries/mysql-advanced-5.7.12:
[umshastr@hod03]/export/umesh/server/binaries/mysql-advanced-5.7.12: javac -cp '.:/home/umshastr/bugs/mysql-connector-java-5.1.39/mysql-connector-java-5.1.39-bin.jar' Main.java
[umshastr@hod03]/export/umesh/server/binaries/mysql-advanced-5.7.12: java -cp '.:/home/umshastr/bugs/mysql-connector-java-5.1.39/mysql-connector-java-5.1.39-bin.jar' Main
1       中文    {"id": "中æ��"}
2       中文    {"id": "Chinese"}
3       Chinese {"id": "中�"}
4       Chinese {"id": "Chinese"}
[umshastr@hod03]/export/umesh/server/binaries/mysql-advanced-5.7.12:
[24 May 2016 2:23] Guangpu Feng
Hi, Vladimir V

Thanks for your workaround, it helps.

  //temp workaround, you can read value like this:
  new String(fieldVal.getBytes("iso-8859-1"))

I think this is a serious bug that should be fixed, the above code should be fixed back when bugfix released in later version.
[23 Jun 2016 8:28] zhou yanming
I can reproduce this issue with version 5.1.39, please fix it asap.
[23 Jun 2016 12:21] OCA Admin
Contribution submitted via Github - //Fix for bug #80631 Any ResultSet.getString return garbled result wi… 
(*) Contribution by Dong SongLing (Github neil4dong, mysql-connector-j/pull/9#issuecomment-227795293): I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: git_patch_70847180.txt (text/plain), 8.41 KiB.

[27 Jun 2016 0:56] tsubasa tanaka
This affects Connector/J 6.0.3-m2 too(please add 6.0.3 to effected version)
[23 Aug 2016 7:10] Leif Bladt
Any news on this?
[24 Aug 2016 8:25] Leif Bladt
I can confirm, that the patch works at least for german umlaut characters.
[14 Sep 2016 1:51] Daniel So
Posted by developer:
 
Added the following entry to the Connector/J 5.1.40 changelog: 

"ResultSet.getString() sometimes returned garbled data for columns of the JSON data type. This was because JSON data was binary encoded by MySQL using the utf8mb4 character set, but decoded by Connector/J using the ISO-8859-1 character set. This patch fixes the decoding for JSON data. Thanks to Dong Song Ling for contributing the patch. "
[30 Sep 2016 16:42] Daniel So
Posted by developer:
 
Some minors edits to the changelog entry:

"ResultSet.getString() sometimes returned garbled data for columns of the
JSON data type. This was because JSON data was binary encoded by MySQL using
the utf8mb4 character set, but decoded by Connector/J using the ISO-8859-1
character set. This patch fixes the decoding for JSON data. Thanks to Dong
Song Ling for contributing to the fix."
[14 Oct 2016 16:02] Daniel So
The fix has been included in Connector/J 6.0.5. The entry for the 5.1.40
changelog has been included into the 6.0.5 changelog.