Bug #73663 utf8mb4 does not work for connector/j >=5.1.13
Submitted: 21 Aug 2014 2:24 Modified: 12 Sep 2014 15:36
Reporter: 0 0
Status: Closed
Category:Connector / J Severity:S2 (Serious)
Version:5.1.32 OS:Linux
Assigned to: Alexander Soklakov CPU Architecture:Any

[21 Aug 2014 2:24] 0 0
Description:
I connect to RDS MySQL 5.6 (with character_set_server set to utf8mb4) using the latest Connector/J (5.1.32).
I have also appended "&useUnicode=true&characterEncoding=UTF-8" to the JDBC URL.
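
For reference, a minimal sketch of how the two parameters end up on the URL. The helper name, host, schema and the extra connectTimeout parameter are hypothetical and only there for illustration; the report does not show the full URL.

```java
final class JdbcUrlExample {
    // The two parameters quoted in the report.
    static final String CHARSET_PARAMS = "useUnicode=true&characterEncoding=UTF-8";

    /** Appends the charset parameters, using '?' or '&' depending on the base URL. */
    static String withCharsetParams(String baseUrl) {
        String separator = baseUrl.contains("?") ? "&" : "?";
        return baseUrl + separator + CHARSET_PARAMS;
    }

    public static void main(String[] args) {
        // Hypothetical host and schema; not taken from the report.
        String base = "jdbc:mysql://example-host:3306/exampledb?connectTimeout=5000";
        System.out.println(withCharsetParams(base));
    }
}
```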

I did a quick debugging session on it.

debugger screenshot:
http://oi60.tinypic.com/1pj8xw.jpg

In this case, this.io.serverCharsetIndex returns 46 (not 33, as the outdated screenshot shows).

It does not match the check serverCharsetIndex == 45, and hence utf8mb4 is not enabled.

Could you take a look and see whether anything is wrong here?
Thanks.

How to repeat:
- Connect to RDS MySQL 5.6 (with character_set_server set to utf8mb4) using the latest Connector/J (5.1.32), with "&useUnicode=true&characterEncoding=UTF-8" appended to the JDBC URL.

- Run a test query containing a utf8mb4 (4-byte) character.
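
To pick a suitable test character, any code point outside the Basic Multilingual Plane will do, since it needs 4 bytes in UTF-8. A small sketch, with U+1F600 chosen arbitrarily as the sample (the class and method names are made up for this example):

```java
import java.nio.charset.StandardCharsets;

final class FourByteCharDemo {
    // U+1F600 lies outside the Basic Multilingual Plane,
    // so Java stores it as a surrogate pair (two chars).
    static final String SAMPLE = new String(Character.toChars(0x1F600));

    /** Number of bytes the string occupies when encoded as UTF-8. */
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length(SAMPLE));   // prints 4: needs utf8mb4 on the server side
        System.out.println(utf8Length("\u00e9")); // prints 2: fits in 3-byte utf8
    }
}
```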

Suggested fix:
The check should probably be serverCharsetIndex >= 45, possibly with additional conditions, if my guess is correct.
[21 Aug 2014 2:26] 0 0
Correction: it works only with 5.1.13, and does not work with 5.1.32.
[21 Aug 2014 2:26] 0 0
screenshot of debugger

Attachment: Screen Shot 2014-08-20 at 2.34.49 pm copy.png (image/png, text), 440.98 KiB.

[21 Aug 2014 7:39] Alexander Soklakov
Hi,

Thank you for the report.

The situation you describe occurs when the server collation is set to a non-default value. We need to include all possible utf8mb4 collation IDs in this check, i.e. 45, 46 and 224..247.
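
A rough sketch of what such a check could look like, using only the IDs listed above. This is not the actual Connector/J code; the class and method names are hypothetical. Note that a plain >= 45 comparison would also accept the indexes between 47 and 223, which the list above excludes.

```java
final class Utf8mb4CollationCheck {
    /**
     * True if the collation index is one of the utf8mb4 IDs
     * named in this thread: 45, 46, or 224..247 (inclusive).
     */
    static boolean isUtf8mb4Collation(int collationIndex) {
        return collationIndex == 45
            || collationIndex == 46
            || (collationIndex >= 224 && collationIndex <= 247);
    }

    public static void main(String[] args) {
        System.out.println(isUtf8mb4Collation(46)); // true: the index seen in the report
        System.out.println(isUtf8mb4Collation(33)); // false: the index in the outdated screenshot
    }
}
```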

As a workaround for the current version you could use collation-server=utf8mb4_general_ci, if it's appropriate, or send SET NAMES utf8mb4 before exchanging any data.
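
The SET NAMES variant could be sketched as follows, assuming an already opened java.sql.Connection. The class and method names are hypothetical; only standard JDBC calls are used.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

final class Utf8mb4Workaround {
    static final String SET_NAMES_SQL = "SET NAMES utf8mb4";

    /**
     * Sends SET NAMES utf8mb4 on the given connection.
     * Intended to be called right after connecting, before any data is exchanged.
     */
    static void forceUtf8mb4(Connection conn) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.execute(SET_NAMES_SQL);
        }
    }
}
```

With a connection pool this would have to run on every new physical connection, not just once per application.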
[12 Sep 2014 15:36] Daniel So
Added the following entry to the Connector/J 5.1.33 changelog:

"The 4-byte UTF8 (utf8mb4) character encoding could not be used with Connector/J when the server collation was set to anything other than the default value (utf8mb4_general_ci). This fix makes Connector/J detect and set the proper character mapping for any utf8mb4 collation."