MySQL Bugs: #71435: Read/write between two tables with identical char fields but diff encodings

Bug #71435	Read/write between two tables with identical char fields but diff encodings
Submitted:	20 Jan 2014 21:27	Modified:	6 Aug 2014 15:23
Reporter:	Mike Cress
Status:	Closed
Category:	MySQL Cluster: Cluster/J	Severity:	S3 (Non-critical)
Version:	7.3.3	OS:	Any
Assigned to:	Lakshmi Narayanan Sreethar	CPU Architecture:	Any
Tags:	char, ClusterJ, Data length, UTF-8

Description:
I have two tables, each with a field defined identically as "char(4) not null". One table's collation is "utf8_general_ci" (utf8 character set) and the other's is "latin1_swedish_ci" (latin1 character set). Using ClusterJ, I read the value from the latin1 table, write it to the utf8 table, and get this error:

For field emailDomain column emailDomain valueDelegate object String, error executing objectSetValue. Caused by com.mysql.clusterj.ClusterJUserException:Data length 12 too long. Column 'emailDomain' can only accept data of length 4.

How to repeat:
1.) Set up two tables, each with a column like "WebDomain char(4) not null": one table using latin1 and the other utf8.
2.) Create a representative ClusterJ interface for each table with:
       @Column(name="WebDomain")
       String getEmailDomain();
       void setEmailDomain(String domain);

3.) Create a test app to read from the latin1 table and write to the utf8 table.

Suggested fix:
A workaround might be to trim() the string before inserting it into the latin1 table. This worked for me.
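The trim() workaround can be sketched in plain Java without a cluster. This is a minimal illustration, not ClusterJ code: `unpad` and `fitsLatin1Char` are hypothetical helpers, and Java's ISO-8859-1 charset stands in for MySQL's latin1 (they agree on ASCII data such as the domain values here). The sketch strips only trailing spaces, which matches how mysqld itself returns CHAR values.

```java
import java.nio.charset.StandardCharsets;

public class TrimWorkaround {
    // Hypothetical helper: drops trailing pad spaces from a value read out of a CHAR
    // column. stripTrailing() (Java 11+) removes only trailing whitespace, whereas the
    // trim() used in the report also removes leading whitespace and control characters.
    static String unpad(String valueFromCharColumn) {
        return valueFromCharColumn.stripTrailing();
    }

    // Hypothetical check mirroring the "Data length N too long" test for a latin1
    // CHAR(n) column, which holds at most n bytes.
    static boolean fitsLatin1Char(String value, int chars) {
        return value.getBytes(StandardCharsets.ISO_8859_1).length <= chars;
    }

    public static void main(String[] args) {
        String readBack = "abcd" + " ".repeat(8); // value plus pad spaces, as read before the fix
        System.out.println(fitsLatin1Char(readBack, 4));        // false
        System.out.println(fitsLatin1Char(unpad(readBack), 4)); // true
    }
}
```

In the reporter's code this would amount to something like `target.setEmailDomain(unpad(source.getEmailDomain()))`, where `source` and `target` are hypothetical instances of the two ClusterJ interfaces.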

Posted by developer:
 
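This bug can be reproduced by reading from the utf8 column and writing to the latin1 column. A utf8 char(4) column reserves 12 bytes (3 bytes per character), and the stored value is padded with spaces to that full width, so decoding it returns a 12-character string (the original characters plus trailing spaces). That string is too long for the 4-byte latin1 column.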
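The arithmetic behind "Data length 12 too long" can be simulated with the JDK alone. This sketch assumes the utf8 CHAR(4) value is stored as its UTF-8 bytes right-padded with ASCII spaces to 12 bytes, and that the pre-fix read path decoded all of those bytes; `storePadded` and `decodeUntrimmed` are hypothetical names, not ClusterJ internals.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PaddedCharDecode {
    // MySQL's utf8 (utf8mb3) reserves up to 3 bytes per character; latin1 uses 1.
    static final int UTF8_MAX_BYTES_PER_CHAR = 3;

    // Assumption: the utf8 CHAR(n) value is stored as its UTF-8 bytes right-padded
    // with ASCII spaces to the column's full byte width (n * 3).
    static byte[] storePadded(String value, int chars) {
        byte[] encoded = value.getBytes(StandardCharsets.UTF_8);
        byte[] stored = Arrays.copyOf(encoded, chars * UTF8_MAX_BYTES_PER_CHAR);
        Arrays.fill(stored, encoded.length, stored.length, (byte) ' ');
        return stored;
    }

    // Pre-fix read path as described above: every stored byte is decoded,
    // so the pad spaces become part of the returned String.
    static String decodeUntrimmed(byte[] stored) {
        return new String(stored, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String readBack = decodeUntrimmed(storePadded("abcd", 4));
        // Java's ISO-8859-1 stands in for MySQL's latin1 here; they agree on ASCII.
        int latin1Bytes = readBack.getBytes(StandardCharsets.ISO_8859_1).length;
        System.out.println(readBack.length() + " chars, " + latin1Bytes
                + " latin1 bytes, latin1 CHAR(4) accepts 4");
    }
}
```

Running `main` prints `12 chars, 12 latin1 bytes, latin1 CHAR(4) accepts 4`, the same 12-versus-4 mismatch reported in the error message.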

Step 3 should read "read from the utf8 table and write to the latin1 table".

Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release.

Documented fix in the NDB 7.1.32, 7.2.17, and 7.3.6 changelogs, as follows:

        Writing a value failed when read from a fixed-width char column
        using utf8 to another column of the same type and length but
        using latin1. The data was returned with extra spaces after
        being padded during its insertion. The value is now trimmed
        before returning it.

        This fix also corrects "Data length too long" errors during the
        insertion of valid utf8 characters of 2 or more bytes. This was
        due to padding of the data prior to encoding it, rather than
        after.
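The second changelog paragraph concerns the order of padding and encoding. The sketch below contrasts the two orders for a utf8 CHAR(4) column (12 bytes). It assumes the pre-fix code padded the Java string with spaces up to the column's byte width, counted in characters, before encoding. That reconstruction is an assumption, and both method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PadOrder {
    // Hypothetical reconstruction of the pre-fix order: pad the *string* to the
    // column's byte width, then encode. Multi-byte characters push the result past it.
    static byte[] padThenEncode(String value, int byteWidth) {
        StringBuilder padded = new StringBuilder(value);
        while (padded.length() < byteWidth) {
            padded.append(' ');
        }
        return padded.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Order described by the fix: encode first, then pad the *bytes* with spaces.
    static byte[] encodeThenPad(String value, int byteWidth) {
        byte[] encoded = value.getBytes(StandardCharsets.UTF_8);
        if (encoded.length > byteWidth) {
            throw new IllegalArgumentException("Data length " + encoded.length + " too long");
        }
        byte[] padded = Arrays.copyOf(encoded, byteWidth);
        Arrays.fill(padded, encoded.length, byteWidth, (byte) ' ');
        return padded;
    }

    public static void main(String[] args) {
        int width = 4 * 3;                  // utf8 CHAR(4): 12 bytes
        String value = "\u00e9\u00e9\u00e9"; // three 'é' characters, 2 bytes each in UTF-8
        System.out.println(padThenEncode(value, width).length); // 15 > 12: rejected
        System.out.println(encodeThenPad(value, width).length); // 12: fits
    }
}
```

Padding first adds nine spaces to a 3-character string that already occupies 6 bytes, giving 15 bytes. Encoding first leaves room to pad the 6 bytes out to exactly 12.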

If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at

    http://dev.mysql.com/doc/en/installing-source.html