Bug #105584 When relay_log.index contains the same relay log file name twice, MGR can't go ONLINE
Submitted: 16 Nov 2021 3:19 Modified: 23 Dec 2021 12:36
Reporter: ldd ldd
Status: No Feedback
Category: MySQL Server: Group Replication  Severity: S3 (Non-critical)
Version: 8.0.x, 5.7.26  OS: CentOS
Assigned to: MySQL Verification Team  CPU Architecture: Any
Tags: gr, group replication, master slave, mgr, MySQL

[16 Nov 2021 3:19] ldd ldd
Description:
Precondition:
 relay_log_purge=ON

Problem:
The MGR member stays in RECOVERING indefinitely when the relay log index contains the same relay log file name twice.

Assume the applier's relay_log.index contains the following relay log names:
relay_log.000001
relay_log.000001
relay_log.000002
relay_log.000003

and the group_replication_applier channel starts replication in log 'relay_log.000001'.
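One way to confirm that an index file is in this state is to look for names that occur more than once. A minimal shell sketch, assuming the index is a plain-text file with one relay log name per line; the function name `find_duplicate_relay_logs` is hypothetical and not part of MySQL:

```shell
# Hypothetical helper: list relay log names that appear more than once
# in a relay log index file.
# Assumption: the index holds exactly one relay log file name per line.
find_duplicate_relay_logs() {
    # sort groups identical lines together; uniq -d prints one copy
    # of each line that occurs more than once.
    sort -- "$1" | uniq -d
}
```

For the four-line index shown above, this prints `relay_log.000001`.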

How to repeat:
On one MGR secondary node, with no transactions running in the group:

1. STOP GROUP_REPLICATION

2. mv (move away) the last group_replication_applier relay log

3. START GROUP_REPLICATION (at this step the member goes ONLINE very quickly)

4. STOP GROUP_REPLICATION

5. START GROUP_REPLICATION (at this step the member stays in RECOVERING and never goes ONLINE)

Suggested fix:
set global relay_log_purge=off
[17 Nov 2021 11:14] MySQL Verification Team
Hi,

I'm not sure I understand what bug you are reporting. What do you expect to happen if you remove the relay log file?

thanks
[18 Nov 2021 3:25] ldd ldd
Sorry to confuse you.

I mean: when I rm -rf the last group_replication_applier relay log on one secondary node and then issue START GROUP_REPLICATION again, the secondary node stays in RECOVERING and never goes ONLINE, as follows:

mysql> select member_state from performance_schema.replication_group_members where member_host='10.5.12.88';
+--------------+
| member_state |
+--------------+
| RECOVERING   |
+--------------+
1 row in set (0.00 sec)

How to reproduce:
Assume three MySQL Group Replication nodes:
10.5.12.12(primary)
10.5.12.88(secondary)
10.5.12.11(secondary)

mysql -h10.5.12.88 -uroot

mysql>set global relay_log_purge=on;

mysql>stop group_replication;

mysql>exit;

rm -rf 10.5.12.88-relay-bin-group_replication_applier.000049 (the last relay log)

mysql -h10.5.12.88 -uroot

mysql>start group_replication; (at this step the member goes ONLINE very quickly)

mysql>stop group_replication;

mysql>start group_replication; (at this step the member never goes ONLINE)
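The same steps can be replayed as a script. A minimal sketch, assuming it runs on the secondary's own host (the relay log is deleted from the local filesystem) and that the mysql client logs in as root without a password, as in the commands above; the function name `reproduce_stuck_recovering` and its arguments are hypothetical:

```shell
# Hypothetical helper that replays the reproduction steps on one secondary.
# Assumption: run on that secondary's host, so the rm below hits its datadir.
reproduce_stuck_recovering() {
    host=$1            # e.g. 10.5.12.88
    last_relay_log=$2  # path to the newest group_replication_applier relay log

    mysql -h"$host" -uroot -e "SET GLOBAL relay_log_purge=ON; STOP GROUP_REPLICATION;"
    # Delete the newest applier relay log while Group Replication is stopped.
    rm -f -- "$last_relay_log"
    # First restart: reported to reach ONLINE quickly.
    mysql -h"$host" -uroot -e "START GROUP_REPLICATION;"
    # Second restart: reported to stay in RECOVERING.
    mysql -h"$host" -uroot -e "STOP GROUP_REPLICATION; START GROUP_REPLICATION;"
    # Print the member state without a column header (-N).
    mysql -h"$host" -uroot -N -e "SELECT member_state FROM performance_schema.replication_group_members WHERE member_host='$host';"
}
```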

What I hope:
I hope node 10.5.12.88 can go ONLINE after I execute the steps above.

thank you
[23 Nov 2021 12:36] MySQL Verification Team
Hi,

> when there are two same relay log file name in relay log index

How did you end up with the same relay log file listed twice in the index?

> rm -rf one secondary node group_replication_applier relay log (the last relay log),then again emit start group_replication command, but the secondary node state is always in RECOVERING, can't ONLINE; 

You deleted the data it needs to get online; this is expected behavior.

Regarding your reproduction: I could not reproduce this, but I think you have an issue with relay_log_recovery, which must be enabled on the slave to guarantee resilience.
https://dev.mysql.com/doc/refman/8.0/en/replication-options-replica.html#sysvar_relay_log_...
[24 Dec 2021 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".