Bug #57694 3byte UTF8 can not be used with 5.5.3+ server
Submitted: 24 Oct 2010 18:01 Modified: 2 Dec 2010 17:15
Reporter: Elena Stepanova
Status: Closed
Category: Connector / J Severity: S3 (Non-critical)
Version: trunk OS: Any
Assigned to: Tony Bedford CPU Architecture: Any

[24 Oct 2010 18:01] Elena Stepanova
Description:
MySQL server 5.5.3 introduced a new character set, utf8mb4, in addition to the existing utf8, so both character sets are available at the same time. However, Connector/J maps both of them to the same Java encoding name, UTF-8:

+ "UTF-8 = utf8,"
+ "UTF-8 = *> 5.5.2 utf8mb4,"

Thus, for server versions later than 5.5.2 it uses utf8mb4 and retrieves the wrong attributes. For example, getMaxBytesPerCharacter returns 4, which is correct for utf8mb4 but should still be 3 for utf8.

How to repeat:
Run testsuite.regression.MetaDataRegressionTest.testBug6399 on MySQL server 5.5.3 or above.

It fails with AssertionFailedError
expected:<3> but was:<2>
at line 781 (on a UTF8 column)

The failure is caused by the wrong value returned by f.getMaxBytesPerCharacter(), which is called in getColumnDisplaySize.
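
To illustrate how a wrong bytes-per-character value could turn an expected 3 into a 2, here is a minimal sketch. It is not the Connector/J source: it assumes the display size is computed as the column's byte length divided by the maximum bytes per character, and it assumes the test column is a CHAR(3) utf8 column (9 bytes). The class and method names are hypothetical.

```java
// Hypothetical sketch, not Connector/J code.
// Assumption: display size = column byte length / max bytes per character.
class DisplaySizeSketch {

    static int displaySize(int columnByteLength, int maxBytesPerChar) {
        // Integer division, so 9 / 4 truncates to 2.
        return columnByteLength / maxBytesPerChar;
    }

    public static void main(String[] args) {
        // Assumed column: CHAR(3) in utf8, i.e. 3 characters * 3 bytes each.
        int byteLength = 3 * 3;

        // Correct utf8 width of 3 bytes per character.
        System.out.println(displaySize(byteLength, 3)); // prints 3

        // utf8mb4 width of 4 bytes wrongly applied to the utf8 column.
        System.out.println(displaySize(byteLength, 4)); // prints 2
    }
}
```

Under these assumptions the second call reproduces the "expected:<3> but was:<2>" mismatch.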

Suggested fix:
Map utf8mb4 into a unique name.
[2 Nov 2010 8:09] Tonci Grgin
Pushed up to revision 992.
[2 Dec 2010 17:15] Tony Bedford
Main docs have been updated with details of using 3-byte and 4-byte utf8 with Connector/J.

An entry has also been added to the 5.1.14 changelog:

Connector/J mapped both 3-byte and 4-byte UTF8 encodings to the same Java UTF8 encoding.

To use 3-byte UTF8 with Connector/J, set characterEncoding=utf8 and useUnicode=true in the connection string.

To use 4-byte UTF8 with Connector/J, configure the MySQL server with character_set_server=utf8mb4. Connector/J then uses that setting, provided characterEncoding has not been set in the connection string. This is equivalent to autodetection of the character set.
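
The two configurations in the changelog entry can be sketched as connection strings. This is only an illustration: the class name, method names, host, and database name are hypothetical, and the code builds the URLs without opening a connection. The 4-byte case additionally assumes the server has already been started with character_set_server=utf8mb4.

```java
// Hypothetical helper that only builds JDBC URLs; it does not connect.
class Utf8UrlSketch {

    // 3-byte UTF8: set characterEncoding=utf8 and useUnicode=true explicitly.
    static String threeByteUrl(String host, String db) {
        return "jdbc:mysql://" + host + "/" + db
                + "?useUnicode=true&characterEncoding=utf8";
    }

    // 4-byte UTF8: leave characterEncoding unset so the driver picks up
    // the server setting (assumed to be character_set_server=utf8mb4).
    static String fourByteUrl(String host, String db) {
        return "jdbc:mysql://" + host + "/" + db;
    }

    public static void main(String[] args) {
        // "localhost:3306" and "test" are placeholder values.
        System.out.println(threeByteUrl("localhost:3306", "test"));
        System.out.println(fourByteUrl("localhost:3306", "test"));
    }
}
```

Either string could then be passed to java.sql.DriverManager.getConnection along with credentials.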