Bug #99969 replication state is normal, but some transactions are lost
Submitted: 24 Jun 2020 8:33 Modified: 26 Jun 2020 16:08
Reporter: li samp Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:5.7 OS:Red Hat
Assigned to: MySQL Verification Team CPU Architecture:Any

[24 Jun 2020 8:33] li samp
Description:
m-s environment(master 133 & slave 134),make a backup from 134(backup_s),then insert data in 133(date1),shutdown 133 and restore using backup_s,and set 133 be the slave of 134.

133 show replicaiton normal,but lost some date(data1)

How to repeat:
133 create test data:
> create database sampson;
> use sampson;
> create table t1(id int primary key,note varchar(15) not null);
> insert into t1 values(1,'before dump');
> select * from t1;
+----+-------------+
| id | note        |
+----+-------------+
|  1 | before dump |
+----+-------------+
1 row in set (0.00 sec)

> show master status;
+------------------+----------+--------------+------------------+------------------------------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                        |
+------------------+----------+--------------+------------------+------------------------------------------+
| mysql-bin.000001 |      820 |              |                  | e9b20a86-b608-11ea-8869-005056af491e:1-3 |
+------------------+----------+--------------+------------------+------------------------------------------+
1 row in set (0.00 sec)
> select @@version;
+------------+
| @@version  |
+------------+
| 5.7.30-log |
+------------+

133 dump data for slave 134:
$ mysqldump -uroot -p  -A > backup/dump_0624.sql
$ scp backup/dump_0624.sql 10.157.27.134:/home/osa/backup/

133 create replication user:
> grant replication slave on *.* to repl@'%' identified by 'repl';
> flush privileges;
> show master status;
+------------------+----------+--------------+------------------+------------------------------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                        |
+------------------+----------+--------------+------------------+------------------------------------------+
| mysql-bin.000001 |     1253 |              |                  | e9b20a86-b608-11ea-8869-005056af491e:1-5 |
+------------------+----------+--------------+------------------+------------------------------------------+
1 row in set (0.00 sec)

134 resore date and become 133's slave:
$ mysql -uroot -p < backup/dump_0624.sql

> change master to master_host='10.157.27.133',master_port=3306,master_user='repl',master_password='repl',master_auto_position=1;
> start slave;
> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.157.27.133
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000001
          Read_Master_Log_Pos: 1253
               Relay_Log_File: localhost-relay-bin.000002
                Relay_Log_Pos: 831
        Relay_Master_Log_File: mysql-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
... ...
           Retrieved_Gtid_Set: e9b20a86-b608-11ea-8869-005056af491e:4-5
            Executed_Gtid_Set: e9b20a86-b608-11ea-8869-005056af491e:1-5
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)

133 insert date id=2:
> insert into t1 values(2,'before backup');
> show master status;
+------------------+----------+--------------+------------------+------------------------------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                        |
+------------------+----------+--------------+------------------+------------------------------------------+
| mysql-bin.000001 |     1569 |              |                  | e9b20a86-b608-11ea-8869-005056af491e:1-6 |
+------------------+----------+--------------+------------------+------------------------------------------+
1 row in set (0.00 sec)

134 replicaiton status:
> select * from sampson.t1;
+----+---------------+
| id | note          |
+----+---------------+
|  1 | before dump   |
|  2 | before backup |
+----+---------------+
2 rows in set (0.00 sec)

root@localhost 14:19:  [(none)]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.157.27.133
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000001
          Read_Master_Log_Pos: 1569
               Relay_Log_File: localhost-relay-bin.000002
                Relay_Log_Pos: 1147
        Relay_Master_Log_File: mysql-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
... ...
           Retrieved_Gtid_Set: e9b20a86-b608-11ea-8869-005056af491e:4-6
            Executed_Gtid_Set: e9b20a86-b608-11ea-8869-005056af491e:1-6
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)

134 backup by xtrabackup:
innobackupex --user=root --password=xxxxxx--port=3306 --socket=/tmp/mysql.sock --parallel=4 --compress-threads=4 --compress --stream=xbstream /home/osa/backup/ 2>/home/osa/backup/backup_mysql_20200624.log| ssh 10.157.27.133 "/usr/bin/xbstream -x -C /home/osa/backup/backup_0624"

*Using mysqldump to back up data is still the same result.

133 insert data id =3:
> insert into t1 values(3,'before restore');
Query OK, 1 row affected (0.00 sec)

root@localhost 22:24:  [sampson]> show master status;
+------------------+----------+--------------+------------------+------------------------------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                        |
+------------------+----------+--------------+------------------+------------------------------------------+
| mysql-bin.000001 |     1887 |              |                  | e9b20a86-b608-11ea-8869-005056af491e:1-7 |
+------------------+----------+--------------+------------------+------------------------------------------+
1 row in set (0.00 sec)

134 replication status:
> select * from sampson.t1;
+----+----------------+
| id | note           |
+----+----------------+
|  1 | before dump    |
|  2 | before backup  |
|  3 | before restore |
+----+----------------+
3 rows in set (0.00 sec)

root@localhost 14:24:  [(none)]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.157.27.133
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000001
          Read_Master_Log_Pos: 1887
               Relay_Log_File: localhost-relay-bin.000002
                Relay_Log_Pos: 1465
        Relay_Master_Log_File: mysql-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
... ...
           Retrieved_Gtid_Set: e9b20a86-b608-11ea-8869-005056af491e:4-7
            Executed_Gtid_Set: e9b20a86-b608-11ea-8869-005056af491e:1-7
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)

133 shutdown:
# /etc/init.d/mysqld stop
Shutting down MySQL............ SUCCESS! 
# rm -rf /../data/
# rm -rf /../logs/binlog/
# cat /dev/null > /../logs/error.log 

134 clean replication info:
> stop slave;
> reset slave all;

134 insert data id =4:
> use sampson
> insert into t1 values(4,'slave insert');
> select * from t1;
+----+----------------+
| id | note           |
+----+----------------+
|  1 | before dump    |
|  2 | before backup  |
|  3 | before restore |
|  4 | slave insert   |
+----+----------------+
4 rows in set (0.00 sec)

> show master status;
+------------------+----------+--------------+------------------+----------------------------------------------------------------------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                                                                |
+------------------+----------+--------------+------------------+----------------------------------------------------------------------------------+
| mysql-bin.000001 |     1507 |              |                  | 78f36e9c-b5c6-11ea-9c14-005056af1d39:1,
e9b20a86-b608-11ea-8869-005056af491e:1-7 |
+------------------+----------+--------------+------------------+----------------------------------------------------------------------------------+
1 row in set (0.00 sec)

133 resore by backup from 134
# innobackupex --parallel=4 --decompress backup_0624/
# innobackupex --apply-log backup_0624/
# innobackupex --copy-back backup_0624/

start 133:
# chown -R mysql:mysql /data/mysql/mysql_3306/
# /etc/init.d/mysqld start

set 133 be the slave of 134:
# mysql -uroot -p
> stop slave;
> reset slave all;
> reset master;
> SET @@GLOBAL.GTID_PURGED='e9b20a86-b608-11ea-8869-005056af491e:1-6';
> change master to master_host='10.157.27.134',master_port=3306,master_user='repl',master_password='repl',master_auto_position=1;
> start slave;
> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.157.27.134
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000001
          Read_Master_Log_Pos: 1507
               Relay_Log_File: localhost-relay-bin.000003
                Relay_Log_Pos: 481
        Relay_Master_Log_File: mysql-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
... ...
           Retrieved_Gtid_Set: 78f36e9c-b5c6-11ea-9c14-005056af1d39:1
            Executed_Gtid_Set: 78f36e9c-b5c6-11ea-9c14-005056af1d39:1,
e9b20a86-b608-11ea-8869-005056af491e:1-6
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)

> show master status;
+------------------+----------+--------------+------------------+----------------------------------------------------------------------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                                                                |
+------------------+----------+--------------+------------------+----------------------------------------------------------------------------------+
| mysql-bin.000001 |      452 |              |                  | 78f36e9c-b5c6-11ea-9c14-005056af1d39:1,
e9b20a86-b608-11ea-8869-005056af491e:1-6 |
+------------------+----------+--------------+------------------+----------------------------------------------------------------------------------+
1 row in set (0.00 sec)

> select @@server_uuid;
+--------------------------------------+
| @@server_uuid                        |
+--------------------------------------+
| bb256e3e-b62d-11ea-8d4b-005056af491e |
+--------------------------------------+
1 row in set (0.00 sec)

> select * from t1;
+----+---------------+
| id | note          |
+----+---------------+
|  1 | before dump   |
|  2 | before backup |
|  4 | slave insert  |
+----+---------------+
3 rows in set (0.00 sec)

replication status normal,but lost data id=3.

error log on 134:
when 133 start slave for 134:[Note] Start binlog_dump to master_thread_id(20) slave_server(4), pos(, 4)

then 133 do stop slave;& start slave;:[Note] While initializing dump thread for slave with UUID <f6a77adb-b62e-11ea-8226-005056af491e>, found a zombie dump thread with the same UUID. Master is killing the zombie dump thread(20).
[Note] Start binlog_dump to master_thread_id(21) slave_server(4), pos(, 4)

Suggested fix:
if this is not a bug,please help me to solve this problem,Thanks!
[26 Jun 2020 16:08] MySQL Verification Team
Hi,

This is not a bug.

Best solution to not have this problem again:
Use group replication.

Partial solution for this type of problem:
Use semisync replication, in that case the data you add "after" the slave was off was not ack-ed to the client so it's ok to be discarded.

Get out of current issue:
When you promote 134 to be the master you have to manually execute on it the rest of the binlog from dead 133 that was not executed.

kind regards
Bogdan