MySQL Bugs: #96447: Group Replication Clone process might conflict with recovery phase 0

Submitted: 7 Aug 2019 10:59
Modified: 6 Nov 2019 11:22
Reporter: Pedro Gomes
Status: Closed
Category: MySQL Server: Group Replication
Severity: S3 (Non-critical)
Version: 8.0.17
OS: Any
Assigned to:
CPU Architecture: Any

Description:
This bug has not been tested yet, but the concern is the following sequence:

The member leaves the group with a large number of transactions still to be applied from its relay log.
The member is restarted and starts applying this backlog.
The member rejoins the group and opts for a clone provisioning step.
The member issues a clone command and drops all its data.
The clone process issues a STOP SLAVE command, which does not stop the Group Replication applier thread.

Does the member's applier SQL thread then error out and make the member leave the group?
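The concern above is a race between the applier and the clone. The toy model below sketches it under stated assumptions: transactions are modeled as hypothetical (key, expected_old, new) tuples, and the names Member, apply_next, clone and bug_scenario are illustrative only, not MySQL code. It shows why an applier that keeps running after the clone wipes local data would hit rows that are no longer in the state the next transaction expects. The report itself was untested, so this only illustrates the suspected failure, not observed server behaviour.

```python
class Member:
    """Hypothetical toy model of a rejoining member; not MySQL source code."""

    def __init__(self, backlog):
        self.data = {}                  # stands in for the member's tables
        self.relay_log = list(backlog)  # (key, expected_old, new) transactions
        self.applier_running = True     # state of the GR applier thread
        self.errors = []

    def apply_next(self):
        """Apply one relay-log transaction if the applier is still running."""
        if not self.applier_running or not self.relay_log:
            return
        key, expected_old, new = self.relay_log.pop(0)
        if self.data.get(key) != expected_old:
            # The row is not in the state this transaction expects: error out.
            self.errors.append(f"applier error on {key!r}")
            self.applier_running = False
            return
        self.data[key] = new

    def clone(self):
        """Drop all local data; as in the report, the GR applier is NOT stopped."""
        self.data.clear()


def bug_scenario():
    member = Member([("row", None, 1), ("row", 1, 2), ("row", 2, 3)])
    member.apply_next()  # restarted member starts draining its backlog
    member.clone()       # clone drops all data while the backlog is unfinished
    while member.applier_running and member.relay_log:
        member.apply_next()  # applier keeps going against the wiped data
    return member.errors
```

Calling bug_scenario() returns a single applier error for "row", because the second transaction expects the value written by the first one, which the clone has just removed.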

How to repeat:
Make a member use clone while it still has a large backlog of transactions in its relay log to apply.
Check for errors

Suggested fix:
Stop the applier thread before issuing a clone request.
Restart it if we fall back to incremental binary log recovery.

Posted by developer:
 
Changelog entry added for MySQL 8.0.19:

When a group member rejoins a replication group, it begins the distributed recovery process by checking the relay log for its group_replication_applier channel for any transactions that it already received from the group, and applying these. The joining member then initiates state transfer from an existing online member, which might begin with a remote cloning operation. Previously, the group_replication_applier channel was not explicitly stopped when a remote cloning operation was started, so the applier could still be applying existing transactions at that time, which could lead to errors. The group_replication_applier channel is now stopped before a remote cloning operation is requested, and restarted when the distributed recovery process moves on to state transfer from a donor's binary log.
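The ordering described in the changelog entry can be sketched as a list of recovery steps. This is a minimal sketch under stated assumptions: the function distributed_recovery, its use_clone and clone_ok flags, and the step strings are hypothetical and are not MySQL log messages or APIs. In this model, the fallback to binary log state transfer happens when the clone does not complete, and whatever follows a successful clone is left out.

```python
def distributed_recovery(use_clone: bool, clone_ok: bool) -> list[str]:
    """Return the order of recovery steps in a hypothetical model of the
    8.0.19 behaviour. Step names are illustrative only."""
    steps = ["start applier: apply backlog from group_replication_applier relay log"]
    applier_running = True

    if use_clone:
        # Fix: stop the applier so it cannot touch data the clone replaces.
        steps.append("stop applier")
        applier_running = False
        steps.append("request remote clone")
        if clone_ok:
            # What follows a successful clone is outside this sketch.
            return steps
        steps.append("clone not completed: fall back to binary log transfer")

    if not applier_running:
        # Fix: restart the applier before incremental recovery from a donor.
        steps.append("restart applier")
        applier_running = True
    steps.append("state transfer from donor binary log")
    return steps
```

For example, distributed_recovery(True, False) places "stop applier" before "request remote clone" and "restart applier" before the binary log state transfer, while distributed_recovery(False, False) never stops the applier at all.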