MySQL Bugs: #105584: when there are two same relay_log file name in relay

Bug #105584	when there are two same relay_log file name in relay_log.index, mgr can't online
Submitted:	16 Nov 2021 3:19	Modified:	23 Dec 2021 12:36
Reporter:	ldd ldd
Status:	No Feedback
Category:	MySQL Server: Group Replication	Severity:	S3 (Non-critical)
Version:	8.0.x,5.7.26	OS:	CentOS
Assigned to:	MySQL Verification Team	CPU Architecture:	Any
Tags:	gr, group replication, master slave, mgr, MySQL

Description:
Precondition:
 relay_log_purge=on

Problem:
The MGR member stays in the RECOVERING state forever when the relay log index contains the same relay log file name twice.

Assume the applier relay_log.index contains the following relay log names:
relay_log.000001
relay_log.000001
relay_log.000002
relay_log.000003

and group_replication_applier starts replication from log 'relay_log.000001'.
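
To check whether an index file is in the state described above, a minimal sketch like the following could be used. It assumes the index holds one relay log name per line, as in the listing above; the function name find_duplicate_relay_entries is hypothetical and the index file path is supplied by the caller.

```shell
#!/bin/sh
# Hypothetical helper: print every entry that appears more than once
# in a relay log index file (assumes one relay log name per line).
find_duplicate_relay_entries() {
    index_file=$1
    # Drop blank lines, then report each name that occurs at least twice.
    grep -v '^[[:space:]]*$' "$index_file" | sort | uniq -d
}
```

Called on an index with the four entries listed above, this would report relay_log.000001 as the duplicated name. An empty result means no name is repeated.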

How to repeat:
On one secondary node of the MGR group, with no transactions running in the group:

1. stop group_replication

2. mv the last group_replication_applier relay log

3. start group_replication (at this step the member comes ONLINE very quickly)

4. stop group_replication

5. start group_replication (at this step the member stays in RECOVERING and never comes ONLINE)

Suggested fix:
set global relay_log_purge=off

Hi,

I'm not sure I understand what bug you are reporting. What do you expect to happen if you remove the relay log file?

thanks

Sorry for the confusion.

I mean that when I rm -rf the group_replication_applier relay log of one secondary node (the last relay log) and then issue the start group_replication command again, the secondary node stays in the RECOVERING state and cannot come ONLINE, as follows:

mysql> select member_state from performance_schema.replication_group_members where member_host='10.5.12.88';
+--------------+
| member_state |
+--------------+
| RECOVERING   |
+--------------+
1 row in set (0.00 sec)

How to reproduce:
Assume three MySQL Group Replication nodes:
10.5.12.12 (primary)
10.5.12.88 (secondary)
10.5.12.11 (secondary)

mysql -h10.5.12.88 -uroot

mysql>set global relay_log_purge=on;

mysql>stop group_replication;

mysql>exit;

rm -rf 10.5.12.88-relay-bin-group_replication_applier.000049 (the last relay log)

mysql -h10.5.12.88 -uroot

mysql>start group_replication; (at this step the node comes ONLINE very quickly)

mysql>stop group_replication;

mysql>start group_replication; (at this step the node cannot come ONLINE)
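
Removing a relay log by hand with rm does not touch the index file, so the index may still name a file that is gone. A minimal sketch for spotting such entries before starting group replication again is shown here. The function name find_missing_relay_logs is hypothetical, and resolving relative entries against the directory that holds the index file is an assumption, not documented server behavior.

```shell
#!/bin/sh
# Hypothetical helper: print index entries whose relay log file
# no longer exists on disk.
find_missing_relay_logs() {
    index_file=$1
    # Assumption: relative entries live next to the index file.
    base_dir=$(dirname "$index_file")
    while IFS= read -r entry || [ -n "$entry" ]; do
        # Skip blank lines.
        [ -z "$entry" ] && continue
        case $entry in
            /*) path=$entry ;;
            *)  path=$base_dir/$entry ;;
        esac
        # Report the entry exactly as written in the index.
        [ -e "$path" ] || printf '%s\n' "$entry"
    done < "$index_file"
}
```

The caller passes the path of the group_replication_applier index file. Any name printed is listed in the index but missing from disk.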

What I expect:
I expect the 10.5.12.88 node to come ONLINE after I execute the steps above.

thank you

Hi,

> when there are two same relay log file name in relay log index

How did it happen that the same relay log file is listed twice in the index?

> rm -rf one secondary node group_replication_applier relay log (the last relay log),then again emit start group_replication command, but the secondary node state is always in RECOVERING, can't ONLINE; 

You deleted the data it needs to get online; this is expected behavior.

With regard to your reproduction: I could not reproduce this, but I think you have an issue with relay_log_recovery, which must be enabled on the slave to guarantee resilience.
https://dev.mysql.com/doc/refman/8.0/en/replication-options-replica.html#sysvar_relay_log_...

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".