Bug #102482 Data loss with ALL_NON_LOSSY and replication between latin1->utf8mb4
Submitted: 4 Feb 17:24 Modified: 7 Feb 11:35
Reporter: Sveta Smirnova (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Row Based Replication ( RBR ) Severity:S3 (Non-critical)
Version:5.7 OS:Any
Assigned to: CPU Architecture:Any

[4 Feb 17:24] Sveta Smirnova
Description:
When replicating a table with latin1 character set on the source server to a table with utf8mb4 character set and conversion ALL_NON_LOSSY it is possible to miss the letter "ü"

How to repeat:
On the source:
CREATE TABLE `test1` (
  `id` smallint NOT NULL,
  `testtext` varchar(100) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

On the replica:
CREATE TABLE `test1` (
  `id` smallint NOT NULL,
  `testtext` varchar(100) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

On the source:
insert into test1 (id, testtext) values(1, "zü");
select testtext, hex(testtext) from test1 where id=1;
testtext	hex(testtext)
zü	7AFC

On the replica:
select testtext, hex(testtext) from test1 where id=1;
testtext	hex(testtext)
z	7A

Or run attached test case fr MTR.

Suggested fix:
Do not looze the letter!
[4 Feb 17:25] Sveta Smirnova
Test case for MTR

Attachment: rpl_CS0015346.test (application/octet-stream, text), 815 bytes.

[4 Feb 17:25] Sveta Smirnova
Option file for the source server

Attachment: rpl_CS0015346-master.opt (application/octet-stream, text), 30 bytes.

[4 Feb 17:25] Sveta Smirnova
Option file for the replica

Attachment: rpl_CS0015346-slave.opt (application/octet-stream, text), 39 bytes.

[5 Feb 10:24] MySQL Verification Team
Hello Sveta,

Thank you for the report and feedback.
Verified as described.

regards,
Umesh
[7 Feb 11:36] Sveta Smirnova
This happens not only for the particular character "ü", but on characters with hex encoding >=0x80. Replica drops such a character and everything what goes after it. While with ALL_NON_LOSSY it should either fail with an error (because operation becomes lossy) or perform conversion (because latin1 is a subset of UTF8).
[15 Feb 8:44] Erlend Dahl
Replacing 'zü' with _latin1 x'7AFC' in the testcase to avoid utf-8/latin-1 conversion issues shows that the problem can be reproduced on 5.7.
[15 Feb 16:57] Bernt Marius Johnsen
I did some research and replicated latin1 -> utf8mb4 (as in the original case), latin1 -> koi8r, latin1 -> utf16, utf8mb4 -> gb18030 and latin1 -> gb18030. My conclusion is:

1) The values are replicated verbatim without any conversion and
2) Replication of a value stops of a byte from source is not allowed in the target character set.

I used following replica .opt-file (no source .opt file)

==================================================
--slave_type_conversions=ALL_NON_LOSSY
================================================== 

and I used the following .test-file:

===============================================
--source include/master-slave.inc

create table t1 (v varchar(100) charset latin1);
create table t2 (v varchar(100) charset latin1);
create table t3 (v varchar(100) charset latin1);
create table t4 (v varchar(100) charset utf8mb4);
create table t5 (v varchar(100) charset latin1);

--sync_slave_with_master
drop table t1;
drop table t2;
drop table t3;
drop table t4;
drop table t5;

create table t1 (v varchar(100) charset utf8mb4);
create table t2 (v varchar(100) charset utf16);
create table t3 (v varchar(100) charset koi8r);
create table t4 (v varchar(100) charset gb18030);
create table t5 (v varchar(100) charset gb18030);

--connection master
insert into t1 values(_latin1 x'7AFC');
insert into t2 values(_latin1 x'7AFC');
insert into t3 values(_latin1 x'7AFC');
insert into t4 values(_latin1 x'7AFC');
insert into t5 values(_latin1 x'7AFC');

-- echo ============ Select from master
select v, hex(v) from t1;
select v, hex(v) from t2;
select v, hex(v) from t3;
select v, hex(v) from t4;
select v, hex(v) from t5;

--sync_slave_with_master
-- echo ============ Select from Slave
select v, hex(v) from t1;
select v, hex(v) from t2;
select v, hex(v) from t3;
select v, hex(v) from t4;
select v, hex(v) from t5;

--connection master
drop table t1;
drop table t2;
drop table t3;
drop table t4;
drop table t5;
--sync_slave_with_master
--source include/rpl_end.inc
==================================================