| Bug #102482 | Data loss with ALL_NON_LOSSY and replication between latin1->utf8mb4 | ||
|---|---|---|---|
| Submitted: | 4 Feb 2021 17:24 | Modified: | 7 Feb 2021 11:35 |
| Reporter: | Sveta Smirnova (OCA) | Email Updates: | |
| Status: | Verified | Impact on me: | |
| Category: | MySQL Server: Row Based Replication ( RBR ) | Severity: | S3 (Non-critical) |
| Version: | 5.7 | OS: | Any |
| Assigned to: | CPU Architecture: | Any | |
[4 Feb 2021 17:24]
Sveta Smirnova
[4 Feb 2021 17:25]
Sveta Smirnova
Test case for MTR
Attachment: rpl_CS0015346.test (application/octet-stream, text), 815 bytes.
[4 Feb 2021 17:25]
Sveta Smirnova
Option file for the source server
Attachment: rpl_CS0015346-master.opt (application/octet-stream, text), 30 bytes.
[4 Feb 2021 17:25]
Sveta Smirnova
Option file for the replica
Attachment: rpl_CS0015346-slave.opt (application/octet-stream, text), 39 bytes.
[5 Feb 2021 10:24]
MySQL Verification Team
Hello Sveta, Thank you for the report and feedback. Verified as described. regards, Umesh
[7 Feb 2021 11:36]
Sveta Smirnova
This happens not only for the particular character "ü", but on characters with hex encoding >=0x80. Replica drops such a character and everything what goes after it. While with ALL_NON_LOSSY it should either fail with an error (because operation becomes lossy) or perform conversion (because latin1 is a subset of UTF8).
[15 Feb 2021 8:44]
Erlend Dahl
Replacing 'zü' with _latin1 x'7AFC' in the testcase to avoid utf-8/latin-1 conversion issues shows that the problem can be reproduced on 5.7.
[15 Feb 2021 16:57]
Bernt Marius Johnsen
I did some research and replicated latin1 -> utf8mb4 (as in the original case), latin1 -> koi8r, latin1 -> utf16, utf8mb4 -> gb18030 and latin1 -> gb18030. My conclusion is: 1) The values are replicated verbatim without any conversion and 2) Replication of a value stops of a byte from source is not allowed in the target character set. I used following replica .opt-file (no source .opt file) ================================================== --slave_type_conversions=ALL_NON_LOSSY ================================================== and I used the following .test-file: =============================================== --source include/master-slave.inc create table t1 (v varchar(100) charset latin1); create table t2 (v varchar(100) charset latin1); create table t3 (v varchar(100) charset latin1); create table t4 (v varchar(100) charset utf8mb4); create table t5 (v varchar(100) charset latin1); --sync_slave_with_master drop table t1; drop table t2; drop table t3; drop table t4; drop table t5; create table t1 (v varchar(100) charset utf8mb4); create table t2 (v varchar(100) charset utf16); create table t3 (v varchar(100) charset koi8r); create table t4 (v varchar(100) charset gb18030); create table t5 (v varchar(100) charset gb18030); --connection master insert into t1 values(_latin1 x'7AFC'); insert into t2 values(_latin1 x'7AFC'); insert into t3 values(_latin1 x'7AFC'); insert into t4 values(_latin1 x'7AFC'); insert into t5 values(_latin1 x'7AFC'); -- echo ============ Select from master select v, hex(v) from t1; select v, hex(v) from t2; select v, hex(v) from t3; select v, hex(v) from t4; select v, hex(v) from t5; --sync_slave_with_master -- echo ============ Select from Slave select v, hex(v) from t1; select v, hex(v) from t2; select v, hex(v) from t3; select v, hex(v) from t4; select v, hex(v) from t5; --connection master drop table t1; drop table t2; drop table t3; drop table t4; drop table t5; --sync_slave_with_master --source include/rpl_end.inc ==================================================
[9 Aug 2024 19:31]
Fernando Camargos
I used Sveta's test case and found that on MySQL 8.0.34 (and 0.38), if the server/database is configured with character set latin1 and you connect using the client character set utf8mb4 and do an INSERT containing one of these special characters:
master [localhost:2789] {msandbox} (test) > \s
--------------
/home/fernando/Downloads/8.0.34/bin/mysql Ver 8.0.34 for Linux on x86_64 (MySQL Community Server - GPL)
Connection id: 12
Current database: test
Current user: msandbox@localhost
SSL: Not in use
Current pager: stdout
Using outfile: ''
Using delimiter: ;
Server version: 8.0.34 MySQL Community Server - GPL
Protocol version: 10
Connection: Localhost via UNIX socket
Server characterset: latin1
Db characterset: latin1
Client characterset: utf8mb4
Conn. characterset: utf8mb4
UNIX socket: /tmp/mysql_sandbox2789.sock
Binary data as: Hexadecimal
Uptime: 3 min 57 sec
Threads: 4 Questions: 55 Slow queries: 0 Opens: 171 Flush tables: 3 Open tables: 90 Queries per second avg: 0.232
--------------
master [localhost:2789] {msandbox} (test) > insert into test1 (id, testtext) values(2, "zü");
Query OK, 1 row affected (0,00 sec)
master [localhost:2789] {msandbox} (test) > select testtext, hex(testtext) from test1 where id=2;
+----------+---------------+
| testtext | hex(testtext) |
+----------+---------------+
| zü | 7AFC |
+----------+---------------+
1 row in set (0,00 sec)
then the text field will be truncated on the especial character on a replica whose table was converted to utf8mb4:
slave2 [localhost:2791] {msandbox} (test) > \s
--------------
/home/fernando/Downloads/8.0.34/bin/mysql Ver 8.0.34 for Linux on x86_64 (MySQL Community Server - GPL)
Connection id: 14
Current database: test
Current user: msandbox@localhost
SSL: Not in use
Current pager: stdout
Using outfile: ''
Using delimiter: ;
Server version: 8.0.34 MySQL Community Server - GPL
Protocol version: 10
Connection: Localhost via UNIX socket
Server characterset: utf8mb4
Db characterset: utf8mb4
Client characterset: utf8mb4
Conn. characterset: utf8mb4
UNIX socket: /tmp/mysql_sandbox2791.sock
Binary data as: Hexadecimal
Uptime: 11 min 15 sec
Threads: 8 Questions: 25 Slow queries: 0 Opens: 169 Flush tables: 3 Open tables: 88 Queries per second avg: 0.037
--------------
slave2 [localhost:2791] {msandbox} (test) > select testtext, hex(testtext) from test1 where id=2;
+----------+---------------+
| testtext | hex(testtext) |
+----------+---------------+
| z | 7A |
+----------+---------------+
1 row in set (0,00 sec)
However, if you connect to the master using the client character set latin1 and do the same INSERT:
$ ./use test --default-character-set=latin1
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 13
Server version: 8.0.34 MySQL Community Server - GPL
Copyright (c) 2000, 2023, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
master [localhost:2789] {msandbox} (test) > \s
--------------
/home/fernando/Downloads/8.0.34/bin/mysql Ver 8.0.34 for Linux on x86_64 (MySQL Community Server - GPL)
Connection id: 13
Current database: test
Current user: msandbox@localhost
SSL: Not in use
Current pager: stdout
Using outfile: ''
Using delimiter: ;
Server version: 8.0.34 MySQL Community Server - GPL
Protocol version: 10
Connection: Localhost via UNIX socket
Server characterset: latin1
Db characterset: latin1
Client characterset: latin1
Conn. characterset: latin1
UNIX socket: /tmp/mysql_sandbox2789.sock
Binary data as: Hexadecimal
Uptime: 13 min 37 sec
Threads: 4 Questions: 70 Slow queries: 0 Opens: 179 Flush tables: 3 Open tables: 98 Queries per second avg: 0.085
--------------
master [localhost:2789] {msandbox} (test) > insert into test1 (id, testtext) values(3, "zü");
Query OK, 1 row affected (0,00 sec)
master [localhost:2789] {msandbox} (test) > select testtext, hex(testtext) from test1 where id=3;
+----------+---------------+
| testtext | hex(testtext) |
+----------+---------------+
| zü | 7AC3BC |
+----------+---------------+
1 row in set (0,00 sec)
the text field replicates in full in the replica:
slave2 [localhost:2791] {msandbox} (test) > select testtext, hex(testtext) from test1 where id=3;
+----------+---------------+
| testtext | hex(testtext) |
+----------+---------------+
| zü | 7AC3BC |
+----------+---------------+
1 row in set (0,00 sec)
