MySQL Bugs: #109031: 'start slave' reports error after stopping slave during IO thd reconnecting

Bug #109031	'start slave' reports error after stopping slave during IO thd reconnecting
Submitted:	9 Nov 2022 2:47	Modified:	29 Nov 2022 1:02
Reporter:	Weijie Kong	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Replication	Severity:	S3 (Non-critical)
Version:	5.7.38	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
'start slave' reports error (ERROR 1201 (HY000): Could not initialize master info structure; more error messages can be found in the MySQL error log) in this case:
(1)slave connects to master and replicates from master
(2)master restarts, cause slave IO thd reconnects
(3)stop slave when master skips slave-executed events
(4)start slave, an error happens

This error is reported in function mts_recovery_groups, which is called at the beginning of start slave. This function is aimed to recover rli->recovery_groups for further recovery. It compares rli->m_group_master_log_pos with worker->m_group_master_log_pos to determine the active worker threads since last checkpoint. Then it reads events from rli->group_relay_log_pos till the event corresponding to worker->group_relay_log_pos and determines if worker's transaction should be re-executed later.

In the case described above, 1) rli->m_group_master_log_pos is wrong, which causes needless recovery on workers. 2) there is no transaction events after rli->group_relay_log_pos. And finally mts_recovery_groups reports error '[ERROR] Error looking for file after ./relay.000004'.

The reason for the wrong rli->m_group_master_log_pos is that master dump thread will seek from the beginning of the binlog where the last slave-executed gtid resides and sends heartbeat event periodically to slave, slave IO/SQL threads update their master_log_filename:master_log_pos after receiving the heartbeat event according to the pos saved in it. So during this process, we will see that slave IO/SQL thread's master_log_filename:master_log_pos jumps from the middle of the master's binlog file to the beginning and increases linearly. If we stop slave during this process, slave IO/SQL thread's master_log_filename:master_log_pos will be wrong because they don't correspond to the last executed trans on slave.

The reason for 2) is that the relaylog is created by reconnecting rotate event, and slave is stopped when dealing with heartbeat events and writing rotate events to relaylog.

How to repeat:
Slave-config:
Gtid mode is on , MTS is on, MASTER_HEARTBEAT_PERIOD=1(the little, the better. Because more heartbeat events can be produced during master skipping events), Disk is HDD.

Master-config:
max-binlog-size=1073741824, binlog fromat is ROW, Disk is HDD(The slower, the better. Because it costs more time when skipping events, then we have more time to stop slave on slave)

Produce steps：
(1)Use sysbench oltp_write_only.lua to produce about 56W trans on master(binlog is as near to as 1073741824)
(2)Slave connects to master
(3)Kill/Shutdown master before slave IO thd catching up with master(the recommended number of slave executed trans > 50W, so slave's Exec_Master_Log_Pos is almost at the end of master's binlog).

In our case, the IO thd's position is: Master_Log_File: mysql-bin.000001, Read_Master_Log_Pos: 1038177511.

Caution: don't let IO thd catch up with master!!! If IO thd caught up with master, then master will position to the newly created binlog file during restart when dealing IO reconnecting. In this case, this bug can't be repeated.

(4)Wait for slave SQL thd catching up with IO thd.

In our case, SQL thd's pos is: Relay_Log_File: relay.000003, Relay_Log_Pos: 1038176951, Relay_Master_Log_File: mysql-bin.000001, Exec_Master_Log_Pos: 1038176746.

(5)Start master, use show slave status to observe IO thd's master pos on slave.
(6)Stop slave before master finishing skipping slave-executed events（Slave IO thd's master pos doesn't catch up with that of step3, in our case, that is 1038177511）
(7)Check table mysql.slave_relay_log_info and mysql.slave_worker_info to ensure Master_log_name:Master_log_pos in mysql.slave_relay_log_info is less than that in mysql.slave_worker_info. If not , repeat step 1-6.

In our case, the position in mysql.slave_relay_log_info is: Relay_log_name: ./relay.000004, Relay_log_pos: 481, Master_log_name: mysql-bin.000001,Master_log_pos: 505666371.
The position in mysql.slave_worker_info is: Id: 1, Relay_log_name: ./relay.000003, Relay_log_pos: 1038175066, Master_log_name: mysql-bin.000001, Master_log_pos: 1038174861; Id: 2, Relay_log_name: ./relay.000003, Relay_log_pos: 1038176951, Master_log_name: mysql-bin.000001, Master_log_pos: 1038176746; ...; Id: 8, Relay_log_name: ./relay.000003, Relay_log_pos: 1038154343, Master_log_name: mysql-bin.000001, Master_log_pos: 1038154138.

(8)Start slave, error happens.

Suggested fix:
Before set_group_master_log_pos, compare current m_group_master_log_pos with the setting value, do not decrease m_group_master_log_pos when m_group_master_log_name is the same.

Hi,

What part of the slave is important for you to be lagging, part that fetches data from the master or part that executes it as in order to delay slave receiving data easiest test would be to cut network connection for a while but not sure if that's what you are testing here. Running a slower server for slave will get slave to delay execution (increased slave behind master value). 

Which one of these is more important for your test (as just doing the test as you explained did not reproduce the issue).

Thanks

The part that fetches data from the master is important. Exactly, we make IO thd to be lagging to avoid fetching data from the newly created binlog after master restarted. Slave server is not necessarily to be very slow for repeating this issue, because we wait for slave SQL thd to catch up with IO thd in step 4, faster slave server is preferred. 
The keys to repeat this issue are step 3 and step 6. In step3, IO thd is stopped(master is killed or shutdowned) as near to as the end of master's binlog file(but not at the end). The nearer to the end of master's binlog file, the more time will be cost to skip events when master is restarted and deals with slave IO reconnecting, the more time we have to do step 6. So we use HDD for master to produce more skipping events time. Just network cutting is not enough, because it just causes IO thd sbm, instead of more time for master to skip events. 
In one word, we should guarantee slave's Read_Master_Log_Pos is almost at the end of master's binlog when master is shutdowned/killed, and stop slave when master is skipping slave-executed events after restarted.

Hi,

still having issue reproducing. I'll try few more things. Thanks for update.

Hi Weijie Kong,

It maybe works for reproducing your case:

(1). on slave: build the replication

mysql> change master to master_host=..., MASTER_AUTO_POSITION=1;
mysql> set global slave_parallel_workers = 2;
mysql> SET GLOBAL slave_parallel_type = LOGICAL_CLOCK;
mysql> set global relay_log_purge = off;

mysql> start slave;

It is important to disable automatic purging of relay logs.(Gtid mode is on , MTS is on)

(2). on master: prepare the test data

mysql> create database test;
mysql> use test;
mysql> create table t (id int);

Ensure that the replication is healthy.

(3). on slave: lock the table about the checkpoint of SQL thread (coordinator and scheduler)

mysql> lock table mysql.slave_relay_log_info read;

It is also important to make SQL thread to be lagging to worker threads

(4). on master: produce some test data

mysql>  flush binary logs;
mysql> insert into tmp.t values (2);

(5). on slave: show the checkpoint about SQL thread and worker threads

mysql> select Relay_log_name,Relay_log_pos, Master_log_name, Master_log_pos from mysql.slave_relay_log_info;
+------------------------------------+---------------+------------------+----------------+
| Relay_log_name                     | Relay_log_pos | Master_log_name  | Master_log_pos |
+------------------------------------+---------------+------------------+----------------+
| .\DESKTOP-45GUNCI-relay-bin.000002 |           688 | mysql-bin.000001 |            475 |
+------------------------------------+---------------+------------------+----------------+
1 row in set (0.00 sec)

mysql> select Relay_log_name,Relay_log_pos, Master_log_name, Master_log_pos from mysql.slave_worker_info;
+------------------------------------+---------------+------------------+----------------+
| Relay_log_name                     | Relay_log_pos | Master_log_name  | Master_log_pos |
+------------------------------------+---------------+------------------+----------------+
| .\DESKTOP-45GUNCI-relay-bin.000004 |           658 | mysql-bin.000002 |            445 |
|                                    |             0 |                  |              0 |
+------------------------------------+---------------+------------------+----------------+

(6). on slave: force kill the slave instance

(7). on slave: start the slave instance

It is not wise to start the replication threads when the server starts.(You can start the slave with --skip-slave-start)

(8). on slave: restart the replication
mysql> set global slave_parallel_workers = 2;
mysql> SET GLOBAL slave_parallel_type = LOGICAL_CLOCK;
mysql> set global relay_log_purge = off;

mysql> start slave;
ERROR 1201 (HY000): Could not initialize master info structure; more error messages can be found in the MySQL error log

The bug has been fixed in mysql8.0 thought the commit: https://github.com/mysql/mysql-server/commit/77c7d1e43de3ef25e50d18a1b0a6ae52d5fe65d6