Bug #98136 TEXT err with utf8mb4_unicode_520_nopad_ci starting mysql-connector-java 5.1.41
Submitted: 7 Jan 0:01 Modified: 24 Feb 8:57
Reporter: Livio Cavallo
Status: Verified
Category:Connector / J Severity:S2 (Serious)
Version:5.1.41, 8.0.18 OS:Microsoft Windows (Win7/8/10)
Assigned to: CPU Architecture:Other (x64)
Tags: jdbc, mysql-connector-java, nopad, text, Unicode, utf8mb4_unicode_520_nopad_ci

[7 Jan 0:01] Livio Cavallo
If you store TEXT data containing any accented vowel, for instance à, in a table with collation utf8mb4_unicode_520_nopad_ci, the text is stored correctly (phpMyAdmin displays it correctly), but when you read the same data back through the connector, the vowel is returned as Ã.

Tested with mysql-connector-java 5.1.41, 5.1.42, 5.1.43, 5.1.48, 6.0.6, 8.0.7-dmr, 8.0.17, 8.0.18.

The problem is present in every JDK and JRE I tested: Oracle JDK 13, JDK 1.8 (Zulu Community), Java(TM) SE Runtime Environment (build 1.8.0_231-b11) Java HotSpot(TM) 64-Bit Server VM (build 25.231-b11, mixed mode).

I am connecting to 10.2.30-MariaDB - MariaDB Server. I think the same problem will arise with other server versions that support this collation.

How to repeat:
- Create a table in a MySQL DB (10.2.30-MariaDB - MariaDB Server) with a TEXT column using charset utf8mb4 and collation utf8mb4_unicode_520_nopad_ci.

- Insert a record containing 'à' in that column.

- Connect to the DB from Java via JDBC, using mysql-connector-java version 5.1.41.

- Read that record.

- The column value shows up in Java as the wrong 'Ã' instead of the correct 'à'.
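
The 'Ã' symptom is what you get when the UTF-8 bytes of 'à' (0xC3 0xA0) are decoded with a single-byte charset such as ISO-8859-1. The following is only a minimal sketch of that mechanism, under the assumption (not confirmed anywhere in this report) that the connector falls back to a single-byte charset for this collation. It needs no database, and the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Hypothetical helper: encode a string as UTF-8, then decode those bytes
    // as ISO-8859-1, simulating a client that misreads utf8mb4 column data.
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String stored = "\u00E0";          // 'à' as stored in the TEXT column
        String read = misdecode(stored);   // what a misconfigured decoder yields

        // Print code points instead of glyphs so the console encoding
        // cannot distort the result.
        for (int i = 0; i < read.length(); i++) {
            System.out.printf("U+%04X%n", (int) read.charAt(i));
        }
    }
}
```

This prints U+00C3 ('Ã') followed by U+00A0 (a no-break space, which is invisible), matching the 'Ã' seen when reading the column.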

Suggested fix:
The problem can be avoided in two ways:
- using a PAD collation (utf8mb4_unicode_520_ci) instead of the NOPAD one
- using a mysql-connector-java version earlier than 5.1.41. I tested successfully with versions 5.1.40 and 5.1.38.
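
A rough sketch of how the first workaround might be applied from Java. The table name passed in, the class name, and both method names are hypothetical. The statement uses the standard ALTER TABLE ... CONVERT TO CHARACTER SET ... COLLATE form, which converts every character column in the table. Note that a PAD collation ignores trailing spaces in comparisons, so this can change query results and unique-index behaviour.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public class PadCollationWorkaround {
    // Builds the ALTER TABLE statement that switches a table to the PAD
    // collation named in the report. Only simple identifiers are accepted,
    // because the table name is concatenated into the SQL text.
    static String buildAlterSql(String table) {
        if (!table.matches("[A-Za-z0-9_]+")) {
            throw new IllegalArgumentException("unexpected table name: " + table);
        }
        return "ALTER TABLE `" + table + "` CONVERT TO CHARACTER SET utf8mb4"
                + " COLLATE utf8mb4_unicode_520_ci";
    }

    // Executes the statement on an already opened JDBC connection.
    static void apply(Connection conn, String table) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.executeUpdate(buildAlterSql(table));
        }
    }
}
```

Try this on a copy of the data first, since the conversion rewrites the table.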

These are not fixes; these are workarounds.

I do not know how to really fix this problem.
[7 Jan 8:41] Umesh Shastry
Hello Livio Cavallo,

Thank you for the report.

[24 Feb 8:57] Livio Cavallo
I detected the same problem on Windows 7, 8 and 10.