Bug #91057 GR recovery can retry infinitely if count is modified to lower value on the fly
Submitted: 29 May 2018 10:31 Modified: 6 Sep 2018 15:41
Reporter: Narendra Chauhan Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Group Replication Severity:S3 (Non-critical)
Version:5.7.23 OS:Any
Assigned to: CPU Architecture:Any

[29 May 2018 10:31] Narendra Chauhan
Description:
Scenario: Check the GR recovery retry mechanism using group_replication_recovery_retry_count variable. Try to change the variable dynamically when recovery is happening.

Observation:-
If the variable's value is modified while retrying to an already attempted value, the retry never stops.
Thus, we can see something like "Attempt X/Y (where X>Y)":-
=======
2018-05-29T10:28:15.539361Z 16 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 4/10'
2018-05-29T10:28:20.852109Z 16 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 5/10'
2018-05-29T10:28:26.144744Z 16 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 6/4'
2018-05-29T10:28:31.485650Z 16 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 7/4'
2018-05-29T10:28:36.769742Z 16 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 8/4'
2018-05-29T10:28:42.085391Z 16 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 9/4'
2018-05-29T10:28:47.353870Z 16 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 10/4'
2018-05-29T10:28:52.620113Z 16 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 11/4'
2018-05-29T10:28:57.872705Z 16 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 12/4'
=======

MTR testcase output:-
=====================
worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 13000..13019
include/group_replication.inc
Warnings:
Note	####	Sending passwords in plain text without SSL/TLS is extremely insecure.
Note	####	Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
[connection server1]

[connection server1]
include/start_and_bootstrap_group_replication.inc
CREATE TABLE t1 (a INT PRIMARY KEY);

[connection server2]
SET SQL_LOG_BIN=0;
CREATE TABLE t1 (a INT PRIMARY KEY);
SET SQL_LOG_BIN=1;
SET GLOBAL group_replication_recovery_reconnect_interval= 5;
SET GLOBAL group_replication_recovery_retry_count= 10;
SET GLOBAL group_replication_group_name=
"aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa";
START GROUP_REPLICATION;

Matching lines are:
2018-05-29T10:28:04.987833Z 16 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 2/10'
2018-05-29T10:28:10.268124Z 16 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 3/10'
2018-05-29T10:28:15.539361Z 16 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 4/10'
2018-05-29T10:28:20.852109Z 16 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 5/10'
Occurrences of 'group recovery connection with another donor. Attempt' in the input file: 4

# We are trying to limit the rety count, but, by mistake it is
# set to a value which is aleady attempted within 20 secs.
SET GLOBAL group_replication_recovery_retry_count= 4;

# Now we will wait for another 60 seconds to ensure that recovery
# count try > 10 and is not planning to stop.

# Also, check the Attempt values it is something like X/4 (where X>4).
Matching lines are:
2018-05-29T10:28:04.987833Z 16 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 2/10'
2018-05-29T10:28:10.268124Z 16 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 3/10'
2018-05-29T10:28:15.539361Z 16 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 4/10'
2018-05-29T10:28:20.852109Z 16 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 5/10'
2018-05-29T10:28:26.144744Z 16 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 6/4'
2018-05-29T10:28:31.485650Z 16 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 7/4'
2018-05-29T10:28:36.769742Z 16 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 8/4'
2018-05-29T10:28:42.085391Z 16 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 9/4'
2018-05-29T10:28:47.353870Z 16 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 10/4'
2018-05-29T10:28:52.620113Z 16 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 11/4'
2018-05-29T10:28:57.872705Z 16 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 12/4'
2018-05-29T10:29:03.253693Z 16 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 13/4'
2018-05-29T10:29:08.520361Z 16 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 14/4'
2018-05-29T10:29:13.824429Z 16 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 15/4'
2018-05-29T10:29:19.346954Z 16 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 16/4'
2018-05-29T10:29:24.616792Z 16 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 17/4'
Occurrences of 'group recovery connection with another donor. Attempt' in the input file: 16
group_replication.gr_recovery_count_messup [ fail ]

How to repeat:
Run command:- $ ./mtr group_replication.gr_recovery_count_messup.test

Where,
$ cat gr_recovery_count_messup.test
##############################################################################
--let $group_replication_group_name=aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa
--source ../inc/have_group_replication_plugin.inc
--let $rpl_skip_group_replication_start= 1
--source ../inc/group_replication.inc

--echo
--let $rpl_connection_name= server1
--source include/rpl_connection.inc
--source ../inc/start_and_bootstrap_group_replication.inc
CREATE TABLE t1 (a INT PRIMARY KEY);

--echo
--let $rpl_connection_name= server2
--source include/rpl_connection.inc
SET SQL_LOG_BIN=0;
CREATE TABLE t1 (a INT PRIMARY KEY);
SET SQL_LOG_BIN=1;

#SET GLOBAL group_replication_exit_state_action= READ_ONLY;
SET GLOBAL group_replication_recovery_reconnect_interval= 5;
SET GLOBAL group_replication_recovery_retry_count= 10;

eval SET GLOBAL group_replication_group_name=
"$group_replication_group_name";

START GROUP_REPLICATION;
--sleep 20

--echo
--let $grep_file= $MYSQLTEST_VARDIR/log/mysqld.2.err
--let $grep_pattern = group recovery connection with another donor. Attempt
--let $grep_output = print_each
--source include/grep_pattern.inc

--echo
--echo # We are trying to limit the rety count, but, by mistake it is
--echo # set to a value which is aleady attempted within 20 secs.
SET GLOBAL group_replication_recovery_retry_count= 4;

--echo
--echo # Now we will wait for another 60 seconds to ensure that recovery
--echo # count try > 10 and is not planning to stop.
--sleep 60

--echo
--echo # Also, check the Attempt values it is something like X/4 (where X>4).
--let $grep_file= $MYSQLTEST_VARDIR/log/mysqld.2.err
--let $grep_pattern = group recovery connection with another donor. Attempt
--let $grep_output = print_each
--source include/grep_pattern.inc

--die "OK TO STOP HERE"
[6 Sep 2018 15:41] David Moss
Posted by developer:
 
Thank you for your feedback, this has been fixed in upcoming versions and the following was added to the 5.7.24 / 8.0.13 changelog:
If group_replication_recovery_retry_count variable was modified while the member was already making a reconnection attempt, the connection attempt could enter an infinite loop.