Bug #102482 | Data loss with ALL_NON_LOSSY and replication between latin1->utf8mb4 | ||
---|---|---|---|
Submitted: | 4 Feb 2021 17:24 | Modified: | 7 Feb 2021 11:35 |
Reporter: | Sveta Smirnova (OCA) | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: Row Based Replication ( RBR ) | Severity: | S3 (Non-critical) |
Version: | 5.7 | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[4 Feb 2021 17:24]
Sveta Smirnova
[4 Feb 2021 17:25]
Sveta Smirnova
Test case for MTR
Attachment: rpl_CS0015346.test (application/octet-stream, text), 815 bytes.
[4 Feb 2021 17:25]
Sveta Smirnova
Option file for the source server
Attachment: rpl_CS0015346-master.opt (application/octet-stream, text), 30 bytes.
[4 Feb 2021 17:25]
Sveta Smirnova
Option file for the replica
Attachment: rpl_CS0015346-slave.opt (application/octet-stream, text), 39 bytes.
[5 Feb 2021 10:24]
MySQL Verification Team
Hello Sveta, Thank you for the report and feedback. Verified as described. regards, Umesh
[7 Feb 2021 11:36]
Sveta Smirnova
This happens not only for the particular character "ü", but on characters with hex encoding >=0x80. Replica drops such a character and everything what goes after it. While with ALL_NON_LOSSY it should either fail with an error (because operation becomes lossy) or perform conversion (because latin1 is a subset of UTF8).
[15 Feb 2021 8:44]
Erlend Dahl
Replacing 'zü' with _latin1 x'7AFC' in the testcase to avoid utf-8/latin-1 conversion issues shows that the problem can be reproduced on 5.7.
[15 Feb 2021 16:57]
Bernt Marius Johnsen
I did some research and replicated latin1 -> utf8mb4 (as in the original case), latin1 -> koi8r, latin1 -> utf16, utf8mb4 -> gb18030 and latin1 -> gb18030. My conclusion is: 1) The values are replicated verbatim without any conversion and 2) Replication of a value stops of a byte from source is not allowed in the target character set. I used following replica .opt-file (no source .opt file) ================================================== --slave_type_conversions=ALL_NON_LOSSY ================================================== and I used the following .test-file: =============================================== --source include/master-slave.inc create table t1 (v varchar(100) charset latin1); create table t2 (v varchar(100) charset latin1); create table t3 (v varchar(100) charset latin1); create table t4 (v varchar(100) charset utf8mb4); create table t5 (v varchar(100) charset latin1); --sync_slave_with_master drop table t1; drop table t2; drop table t3; drop table t4; drop table t5; create table t1 (v varchar(100) charset utf8mb4); create table t2 (v varchar(100) charset utf16); create table t3 (v varchar(100) charset koi8r); create table t4 (v varchar(100) charset gb18030); create table t5 (v varchar(100) charset gb18030); --connection master insert into t1 values(_latin1 x'7AFC'); insert into t2 values(_latin1 x'7AFC'); insert into t3 values(_latin1 x'7AFC'); insert into t4 values(_latin1 x'7AFC'); insert into t5 values(_latin1 x'7AFC'); -- echo ============ Select from master select v, hex(v) from t1; select v, hex(v) from t2; select v, hex(v) from t3; select v, hex(v) from t4; select v, hex(v) from t5; --sync_slave_with_master -- echo ============ Select from Slave select v, hex(v) from t1; select v, hex(v) from t2; select v, hex(v) from t3; select v, hex(v) from t4; select v, hex(v) from t5; --connection master drop table t1; drop table t2; drop table t3; drop table t4; drop table t5; --sync_slave_with_master --source include/rpl_end.inc ==================================================
[9 Aug 2024 19:31]
Fernando Camargos
I used Sveta's test case and found that on MySQL 8.0.34 (and 0.38), if the server/database is configured with character set latin1 and you connect using the client character set utf8mb4 and do an INSERT containing one of these special characters: master [localhost:2789] {msandbox} (test) > \s -------------- /home/fernando/Downloads/8.0.34/bin/mysql Ver 8.0.34 for Linux on x86_64 (MySQL Community Server - GPL) Connection id: 12 Current database: test Current user: msandbox@localhost SSL: Not in use Current pager: stdout Using outfile: '' Using delimiter: ; Server version: 8.0.34 MySQL Community Server - GPL Protocol version: 10 Connection: Localhost via UNIX socket Server characterset: latin1 Db characterset: latin1 Client characterset: utf8mb4 Conn. characterset: utf8mb4 UNIX socket: /tmp/mysql_sandbox2789.sock Binary data as: Hexadecimal Uptime: 3 min 57 sec Threads: 4 Questions: 55 Slow queries: 0 Opens: 171 Flush tables: 3 Open tables: 90 Queries per second avg: 0.232 -------------- master [localhost:2789] {msandbox} (test) > insert into test1 (id, testtext) values(2, "zü"); Query OK, 1 row affected (0,00 sec) master [localhost:2789] {msandbox} (test) > select testtext, hex(testtext) from test1 where id=2; +----------+---------------+ | testtext | hex(testtext) | +----------+---------------+ | zü | 7AFC | +----------+---------------+ 1 row in set (0,00 sec) then the text field will be truncated on the especial character on a replica whose table was converted to utf8mb4: slave2 [localhost:2791] {msandbox} (test) > \s -------------- /home/fernando/Downloads/8.0.34/bin/mysql Ver 8.0.34 for Linux on x86_64 (MySQL Community Server - GPL) Connection id: 14 Current database: test Current user: msandbox@localhost SSL: Not in use Current pager: stdout Using outfile: '' Using delimiter: ; Server version: 8.0.34 MySQL Community Server - GPL Protocol version: 10 Connection: Localhost via UNIX socket Server characterset: utf8mb4 Db characterset: utf8mb4 Client characterset: utf8mb4 Conn. characterset: utf8mb4 UNIX socket: /tmp/mysql_sandbox2791.sock Binary data as: Hexadecimal Uptime: 11 min 15 sec Threads: 8 Questions: 25 Slow queries: 0 Opens: 169 Flush tables: 3 Open tables: 88 Queries per second avg: 0.037 -------------- slave2 [localhost:2791] {msandbox} (test) > select testtext, hex(testtext) from test1 where id=2; +----------+---------------+ | testtext | hex(testtext) | +----------+---------------+ | z | 7A | +----------+---------------+ 1 row in set (0,00 sec) However, if you connect to the master using the client character set latin1 and do the same INSERT: $ ./use test --default-character-set=latin1 Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 13 Server version: 8.0.34 MySQL Community Server - GPL Copyright (c) 2000, 2023, Oracle and/or its affiliates. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. master [localhost:2789] {msandbox} (test) > \s -------------- /home/fernando/Downloads/8.0.34/bin/mysql Ver 8.0.34 for Linux on x86_64 (MySQL Community Server - GPL) Connection id: 13 Current database: test Current user: msandbox@localhost SSL: Not in use Current pager: stdout Using outfile: '' Using delimiter: ; Server version: 8.0.34 MySQL Community Server - GPL Protocol version: 10 Connection: Localhost via UNIX socket Server characterset: latin1 Db characterset: latin1 Client characterset: latin1 Conn. characterset: latin1 UNIX socket: /tmp/mysql_sandbox2789.sock Binary data as: Hexadecimal Uptime: 13 min 37 sec Threads: 4 Questions: 70 Slow queries: 0 Opens: 179 Flush tables: 3 Open tables: 98 Queries per second avg: 0.085 -------------- master [localhost:2789] {msandbox} (test) > insert into test1 (id, testtext) values(3, "zü"); Query OK, 1 row affected (0,00 sec) master [localhost:2789] {msandbox} (test) > select testtext, hex(testtext) from test1 where id=3; +----------+---------------+ | testtext | hex(testtext) | +----------+---------------+ | zü | 7AC3BC | +----------+---------------+ 1 row in set (0,00 sec) the text field replicates in full in the replica: slave2 [localhost:2791] {msandbox} (test) > select testtext, hex(testtext) from test1 where id=3; +----------+---------------+ | testtext | hex(testtext) | +----------+---------------+ | zü | 7AC3BC | +----------+---------------+ 1 row in set (0,00 sec)