Bug #102482 Data loss with ALL_NON_LOSSY and replication between latin1->utf8mb4
Submitted: 4 Feb 2021 17:24 Modified: 7 Feb 2021 11:35
Reporter: Sveta Smirnova (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Row Based Replication ( RBR ) Severity:S3 (Non-critical)
Version:5.7 OS:Any
Assigned to: CPU Architecture:Any

[4 Feb 2021 17:24] Sveta Smirnova
Description:
When replicating a table with latin1 character set on the source server to a table with utf8mb4 character set and conversion ALL_NON_LOSSY it is possible to miss the letter "ü"

How to repeat:
On the source:
CREATE TABLE `test1` (
  `id` smallint NOT NULL,
  `testtext` varchar(100) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

On the replica:
CREATE TABLE `test1` (
  `id` smallint NOT NULL,
  `testtext` varchar(100) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

On the source:
insert into test1 (id, testtext) values(1, "zü");
select testtext, hex(testtext) from test1 where id=1;
testtext	hex(testtext)
zü	7AFC

On the replica:
select testtext, hex(testtext) from test1 where id=1;
testtext	hex(testtext)
z	7A

Or run attached test case fr MTR.

Suggested fix:
Do not looze the letter!
[4 Feb 2021 17:25] Sveta Smirnova
Test case for MTR

Attachment: rpl_CS0015346.test (application/octet-stream, text), 815 bytes.

[4 Feb 2021 17:25] Sveta Smirnova
Option file for the source server

Attachment: rpl_CS0015346-master.opt (application/octet-stream, text), 30 bytes.

[4 Feb 2021 17:25] Sveta Smirnova
Option file for the replica

Attachment: rpl_CS0015346-slave.opt (application/octet-stream, text), 39 bytes.

[5 Feb 2021 10:24] MySQL Verification Team
Hello Sveta,

Thank you for the report and feedback.
Verified as described.

regards,
Umesh
[7 Feb 2021 11:36] Sveta Smirnova
This happens not only for the particular character "ü", but on characters with hex encoding >=0x80. Replica drops such a character and everything what goes after it. While with ALL_NON_LOSSY it should either fail with an error (because operation becomes lossy) or perform conversion (because latin1 is a subset of UTF8).
[15 Feb 2021 8:44] Erlend Dahl
Replacing 'zü' with _latin1 x'7AFC' in the testcase to avoid utf-8/latin-1 conversion issues shows that the problem can be reproduced on 5.7.
[15 Feb 2021 16:57] Bernt Marius Johnsen
I did some research and replicated latin1 -> utf8mb4 (as in the original case), latin1 -> koi8r, latin1 -> utf16, utf8mb4 -> gb18030 and latin1 -> gb18030. My conclusion is:

1) The values are replicated verbatim without any conversion and
2) Replication of a value stops of a byte from source is not allowed in the target character set.

I used following replica .opt-file (no source .opt file)

==================================================
--slave_type_conversions=ALL_NON_LOSSY
================================================== 

and I used the following .test-file:

===============================================
--source include/master-slave.inc

create table t1 (v varchar(100) charset latin1);
create table t2 (v varchar(100) charset latin1);
create table t3 (v varchar(100) charset latin1);
create table t4 (v varchar(100) charset utf8mb4);
create table t5 (v varchar(100) charset latin1);

--sync_slave_with_master
drop table t1;
drop table t2;
drop table t3;
drop table t4;
drop table t5;

create table t1 (v varchar(100) charset utf8mb4);
create table t2 (v varchar(100) charset utf16);
create table t3 (v varchar(100) charset koi8r);
create table t4 (v varchar(100) charset gb18030);
create table t5 (v varchar(100) charset gb18030);

--connection master
insert into t1 values(_latin1 x'7AFC');
insert into t2 values(_latin1 x'7AFC');
insert into t3 values(_latin1 x'7AFC');
insert into t4 values(_latin1 x'7AFC');
insert into t5 values(_latin1 x'7AFC');

-- echo ============ Select from master
select v, hex(v) from t1;
select v, hex(v) from t2;
select v, hex(v) from t3;
select v, hex(v) from t4;
select v, hex(v) from t5;

--sync_slave_with_master
-- echo ============ Select from Slave
select v, hex(v) from t1;
select v, hex(v) from t2;
select v, hex(v) from t3;
select v, hex(v) from t4;
select v, hex(v) from t5;

--connection master
drop table t1;
drop table t2;
drop table t3;
drop table t4;
drop table t5;
--sync_slave_with_master
--source include/rpl_end.inc
==================================================
[9 Aug 2024 19:31] Fernando Camargos
I used Sveta's test case and found that on MySQL 8.0.34 (and 0.38), if the server/database is configured with character set latin1 and you connect using the client character set utf8mb4 and do an INSERT containing one of these special characters:

master [localhost:2789] {msandbox} (test) > \s
--------------
/home/fernando/Downloads/8.0.34/bin/mysql  Ver 8.0.34 for Linux on x86_64 (MySQL Community Server - GPL)

Connection id:        12
Current database:    test
Current user:        msandbox@localhost
SSL:            Not in use
Current pager:        stdout
Using outfile:        ''
Using delimiter:    ;
Server version:        8.0.34 MySQL Community Server - GPL
Protocol version:    10
Connection:        Localhost via UNIX socket
Server characterset:    latin1
Db     characterset:    latin1
Client characterset:    utf8mb4
Conn.  characterset:    utf8mb4
UNIX socket:        /tmp/mysql_sandbox2789.sock
Binary data as:        Hexadecimal
Uptime:            3 min 57 sec

Threads: 4  Questions: 55  Slow queries: 0  Opens: 171  Flush tables: 3  Open tables: 90  Queries per second avg: 0.232
--------------

master [localhost:2789] {msandbox} (test) > insert into test1 (id, testtext) values(2, "zü");
Query OK, 1 row affected (0,00 sec)

master [localhost:2789] {msandbox} (test) > select testtext, hex(testtext) from test1 where id=2;
+----------+---------------+
| testtext | hex(testtext) |
+----------+---------------+
| zü       | 7AFC          |
+----------+---------------+
1 row in set (0,00 sec)

then the text field will be truncated on the especial character on a replica whose table was converted to utf8mb4:

slave2 [localhost:2791] {msandbox} (test) > \s
--------------
/home/fernando/Downloads/8.0.34/bin/mysql  Ver 8.0.34 for Linux on x86_64 (MySQL Community Server - GPL)

Connection id:        14
Current database:    test
Current user:        msandbox@localhost
SSL:            Not in use
Current pager:        stdout
Using outfile:        ''
Using delimiter:    ;
Server version:        8.0.34 MySQL Community Server - GPL
Protocol version:    10
Connection:        Localhost via UNIX socket
Server characterset:    utf8mb4
Db     characterset:    utf8mb4
Client characterset:    utf8mb4
Conn.  characterset:    utf8mb4
UNIX socket:        /tmp/mysql_sandbox2791.sock
Binary data as:        Hexadecimal
Uptime:            11 min 15 sec

Threads: 8  Questions: 25  Slow queries: 0  Opens: 169  Flush tables: 3  Open tables: 88  Queries per second avg: 0.037
--------------

slave2 [localhost:2791] {msandbox} (test) > select testtext, hex(testtext) from test1 where id=2;
+----------+---------------+
| testtext | hex(testtext) |
+----------+---------------+
| z        | 7A            |
+----------+---------------+
1 row in set (0,00 sec)
 

However, if you connect to the master using the client character set latin1 and do the same INSERT:

$ ./use test --default-character-set=latin1

Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 13
Server version: 8.0.34 MySQL Community Server - GPL

Copyright (c) 2000, 2023, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

master [localhost:2789] {msandbox} (test) > \s
--------------
/home/fernando/Downloads/8.0.34/bin/mysql  Ver 8.0.34 for Linux on x86_64 (MySQL Community Server - GPL)

Connection id:        13
Current database:    test
Current user:        msandbox@localhost
SSL:            Not in use
Current pager:        stdout
Using outfile:        ''
Using delimiter:    ;
Server version:        8.0.34 MySQL Community Server - GPL
Protocol version:    10
Connection:        Localhost via UNIX socket
Server characterset:    latin1
Db     characterset:    latin1
Client characterset:    latin1
Conn.  characterset:    latin1
UNIX socket:        /tmp/mysql_sandbox2789.sock
Binary data as:        Hexadecimal
Uptime:            13 min 37 sec

Threads: 4  Questions: 70  Slow queries: 0  Opens: 179  Flush tables: 3  Open tables: 98  Queries per second avg: 0.085
--------------

master [localhost:2789] {msandbox} (test) > insert into test1 (id, testtext) values(3, "zü");
Query OK, 1 row affected (0,00 sec)

master [localhost:2789] {msandbox} (test) > select testtext, hex(testtext) from test1 where id=3;
+----------+---------------+
| testtext | hex(testtext) |
+----------+---------------+
| zü      | 7AC3BC        |
+----------+---------------+
1 row in set (0,00 sec)

the text field replicates in full in the replica:

slave2 [localhost:2791] {msandbox} (test) > select testtext, hex(testtext) from test1 where id=3;
+----------+---------------+
| testtext | hex(testtext) |
+----------+---------------+
| zü       | 7AC3BC        |
+----------+---------------+
1 row in set (0,00 sec)