Bug #78902 character encoding detection broken?
Submitted: 21 Oct 2015 16:45
Modified: 9 Dec 2015 22:05
Reporter: Lucas Jackson
Status: Verified
Category: Connector / J
Severity: S3 (Non-critical)
Version: 5.1.36, 5.1.37
OS: CentOS (6.5)
Assigned to:
CPU Architecture: Any
Tags: characterencoding

[21 Oct 2015 16:45] Lucas Jackson
Description:
I've been putting utf8mb4 characters in a mysql table for a while now.
Up to 5.1.35 of the connector, this has worked fine.
Starting with 5.1.36 and in 5.1.37, attempting to do so now produces the exception:

Incorrect string value: '\xF0\x9F\x98\x83' for column 'text' at row 1 
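
For context, the four bytes quoted in the error are the UTF-8 encoding of a single code point outside the Basic Multilingual Plane (U+1F603), which MySQL's 3-byte utf8 charset cannot store and utf8mb4 can. A minimal Java sketch to check this; the class and method names are illustrative only and not part of Connector/J:

```java
import java.nio.charset.StandardCharsets;

public class FourByteCheck {
    // True if the string contains a supplementary code point, i.e. one that
    // UTF-8 encodes as 4 bytes and that therefore needs utf8mb4 in MySQL.
    public static boolean needsUtf8mb4(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (Character.isSupplementaryCodePoint(cp)) {
                return true;
            }
            i += Character.charCount(cp);
        }
        return false;
    }

    // Decodes the exact bytes shown in the error message.
    public static String fromErrorBytes() {
        byte[] raw = {(byte) 0xF0, (byte) 0x9F, (byte) 0x98, (byte) 0x83};
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = fromErrorBytes();
        // prints "U+1F603 needsUtf8mb4=true"
        System.out.printf("U+%X needsUtf8mb4=%b%n", s.codePointAt(0), needsUtf8mb4(s));
    }
}
```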

No other changes have been made other than replacing the jar.

Specifying the encoding manually with characterEncoding=utf8 in my JDBC URL fixes the problem, so I assume the character encoding auto-detection broke somewhere between 5.1.35 and 5.1.36.
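
The workaround amounts to adding one parameter to the connection URL. A small sketch, assuming a hypothetical helper `withUtf8Encoding` (not part of Connector/J) and plain `&` separators as used in a Java string; in an XML config file the separator would have to be escaped as `&amp;`:

```java
public class JdbcUrlWorkaround {
    // Hypothetical helper: appends characterEncoding=utf8 unless the URL
    // already sets characterEncoding explicitly.
    public static String withUtf8Encoding(String url) {
        if (url.contains("characterEncoding=")) {
            return url;
        }
        String sep;
        if (url.endsWith("?") || url.endsWith("&")) {
            sep = "";          // separator already present
        } else if (url.contains("?")) {
            sep = "&";         // other parameters exist
        } else {
            sep = "?";         // first parameter
        }
        return url + sep + "characterEncoding=utf8";
    }

    public static void main(String[] args) {
        // host1, host2 and db are the placeholder names from the report.
        String url = "jdbc:mysql:loadbalance://host1,host2/db?connectTimeout=1000";
        System.out.println(withUtf8Encoding(url));
    }
}
```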

Versions: 
OS: CentOS 6.5 
Tomcat: 8.0.26 
Java: 1.7.0_79-b15 
MySQL Cluster: 5.6.25-ndb-7.4.7-log 

my.cnf 
character-set-server = utf8mb4 
collation-server = utf8mb4_unicode_ci 
default-character-set = utf8mb4 

mysql> SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%'; 
+--------------------------+--------------------+
| Variable_name            | Value              |
+--------------------------+--------------------+
| character_set_client     | utf8mb4            |
| character_set_connection | utf8mb4            |
| character_set_database   | utf8mb4            |
| character_set_filesystem | binary             |
| character_set_results    | utf8mb4            |
| character_set_server     | utf8mb4            |
| character_set_system     | utf8               |
| collation_connection     | utf8mb4_general_ci |
| collation_database       | utf8mb4_general_ci |
| collation_server         | utf8mb4_unicode_ci |
+--------------------------+--------------------+

mysql> select table_collation from tables where table_name = 'tweet'; 
+--------------------+
| table_collation    |
+--------------------+
| utf8mb4_general_ci |
+--------------------+

mysql> select character_set_name,collation_name from columns where table_name = 'tweet' and column_name = 'text'; 
+--------------------+--------------------+
| character_set_name | collation_name     |
+--------------------+--------------------+
| utf8mb4            | utf8mb4_general_ci |
+--------------------+--------------------+

JDBC: 
url="jdbc:mysql:loadbalance://host1,host2/db?connectTimeout=1000&loadBalancePingTimeout=100&loadBalanceBlacklistTimeout=10000&retriesAllDown=2&failOverReadOnly=false&loadBalanceStrategy=bestResponseTime&loadBalanceValidateConnectionOnSwapServer=true&allowMasterDownConnections=true&noAccessToProcedureBodies=true&cacheServerConfiguration=true&dontTrackOpenResources=true&elideSetAutoCommits=true&enableQueryTimeouts=false&maintainTimeStats=false&useLocalSessionState=true"

How to repeat:
Attempt to insert or update a utf8mb4 MySQL column with a utf8mb4 (4-byte) character.
[21 Oct 2015 20:18] MySQL Verification Team
Hi,

I don't see these parameters in your connection URI:

...useUnicode=true&characterEncoding=...characterSetResults=...connectionCollation=...

I also don't see which version of Connector/J you are using.

Note that old Connector/J versions did not support utf8mb4 for servers 5.5.2 and newer.
Newer Connector/J auto-detects servers configured with character_set_server=utf8mb4, or treats the Java encoding utf-8 passed using characterEncoding=... as utf8mb4 in the SET NAMES= calls it makes when establishing the connection. See Bug #54175.
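
Read literally, the behaviour described above has two branches, which can be sketched as follows. This is an illustration of the description only, not Connector/J's actual code; the class name, method name, and the fallback for non-utf8mb4 servers are assumptions:

```java
public class SetNamesSketch {
    // characterEncoding: value of the URL property, or null when it is not set.
    // serverCharset: the server's character_set_server value.
    // Returns the charset name a SET NAMES call would use under this reading.
    public static String charsetForSetNames(String characterEncoding, String serverCharset) {
        if (characterEncoding == null) {
            // Auto-detection branch: a server configured with utf8mb4 yields utf8mb4.
            if ("utf8mb4".equalsIgnoreCase(serverCharset)) {
                return "utf8mb4";
            }
            // Assumption: otherwise the connection simply follows the server default.
            return serverCharset;
        }
        // Explicit branch: the Java encoding UTF-8 is treated as utf8mb4.
        if (characterEncoding.replace("-", "").equalsIgnoreCase("utf8")) {
            return "utf8mb4";
        }
        // Other Java encodings need a real name mapping, which this sketch omits.
        throw new UnsupportedOperationException("not covered by this sketch: " + characterEncoding);
    }

    public static void main(String[] args) {
        System.out.println(charsetForSetNames(null, "utf8mb4"));
        System.out.println(charsetForSetNames("UTF-8", "utf8mb4"));
    }
}
```

Under this reading, the URL in the report (no characterEncoding, server set to utf8mb4) should take the first branch, which is the path that appears to have stopped working in 5.1.36.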

So check out http://dev.mysql.com/doc/relnotes/connector-j/en/news-5-1-13.html

Solution: use Connector/J 5.1.13 or newer.

all best
Bogdan Kecman
[21 Oct 2015 23:04] Lucas Jackson
I said that I was using Connector/J 5.1.33 and also 5.1.35, and everything was fine with the JDBC URL provided.

With 5.1.36 or 5.1.37, the same JDBC URL does not work as expected, and I have to work around it by adding characterEncoding=utf8.
[21 Oct 2015 23:05] Lucas Jackson
Which is to say, it's not auto-detecting anymore.
[22 Oct 2015 6:24] MySQL Verification Team
Hi,
Sorry, my bad: I somehow read 5.1.36 as the MySQL version and not the Connector/J version. It looks like a bug; let me try to find out where exactly. For now, useUnicode=true&characterEncoding=...characterSetResults=...connectionCollation= is a workaround that should work properly (I actually always set those, regardless of the "auto detection" introduced in 5.1.13). Anyhow, let's see where we introduced the problem and how to fix it :)

kind regards
Bogdan Kecman
[22 Oct 2015 16:02] Lucas Jackson
Thanks much, I appreciate it.

I looked through the changelogs; the only thing I saw related to encoding was the password encoding stuff, so maybe..

-Tony
[3 Nov 2015 18:25] MySQL Verification Team
Hi,

Verified as described. There is a simple workaround so I'm setting the Severity of the bug down to S3.

Thanks for submitting this one

kind regards
Bogdan Kecman
[9 Dec 2015 22:05] Lucas Jackson
Just changing the category for this bug
[2 May 15:08] Mark Callaghan
Is this related to https://bugs.mysql.com/bug.php?id=95139 ?