Bug #105877 group replication node always in offline mode after switch
Submitted: 13 Dec 2021 12:05 Modified: 24 Dec 2021 12:43
Reporter: xichao li (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Group Replication Severity:S3 (Non-critical)
Version: OS:Red Hat
Assigned to: CPU Architecture:Any

[13 Dec 2021 12:05] xichao li
Description:
In the MGR cluster of three nodes, set group_REPLICATION_exit_state_Action = OFFLINE_MODE. After failover occurs, the original active node is set to offline mode as expected. After the fault occurred, the faulty node was repaired manually:
stop group_replication;
start group_replication;
Such operations. However, the original faulty node is in offline mode and is not automatically adjusted. Manual modification is required.

How to repeat:
-- 1. Before switching
+--------------------------------------+-------------+--------+---------+-----------+----------+---------+-------+----------+------------+
| master_uuid                          | member_host | mgrpot | mgrstat | mgrrole   | cfct_qsz | apl_qsz | ckd_t | cfct_dtd | chan_name  |
+--------------------------------------+-------------+--------+---------+-----------+----------+---------+-------+----------+------------+
| c8a7b448-40a3-11ec-a514-0050569be145 | 10.0.135.75 |   3306 | ONLINE  | PRIMARY   |        0 |       1 |     0 |        0 | gr_applier |
| ca26d8f2-40a3-11ec-a3a0-0050569bbf6d | 10.0.135.76 |   3306 | ONLINE  | SECONDARY |        0 |       0 |     0 |        0 | gr_applier |
| cb9a5813-40a3-11ec-a099-0050569bdf16 | 10.0.135.77 |   3306 | ONLINE  | SECONDARY |        0 |       0 |     0 |        0 | gr_applier |
+--------------------------------------+-------------+--------+---------+-----------+----------+---------+-------+----------+------------+

-- The original active node is configured:
-- set global group_replication_exit_state_action='OFFLINE_MODE';
-- set global group_replication_unreachable_majority_timeout=13;
-- set global group_replication_autorejoin_tries=0;

select * from performance_schema.global_variables where
 VARIABLE_NAME in ('group_replication_exit_state_action','read_only','super_read_only','offline_mode');
+-------------------------------------+----------------+
| VARIABLE_NAME                       | VARIABLE_VALUE |
+-------------------------------------+----------------+
| group_replication_exit_state_action | OFFLINE_MODE   |
| offline_mode                        | OFF            |
| read_only                           | OFF            |
| super_read_only                     | OFF            |
+-------------------------------------+----------------+

-- 2. Trigger switch
-- 10.0.135.75
iptables -L

iptables -I INPUT -s 10.0.135.76 -ptcp -j DROP
iptables -I INPUT -s 10.0.135.77 -ptcp -j DROP
date # Mon Dec 13 19:57:17 CST 2021
iptables -L

-- 3. After switching state
--  10.0.135.76(The new master node)
+--------------------------------------+-------------+--------+-------------+-----------+----------+---------+-------+----------+------------+
| master_uuid                          | member_host | mgrpot | mgrstat     | mgrrole   | cfct_qsz | apl_qsz | ckd_t | cfct_dtd | chan_name  |
+--------------------------------------+-------------+--------+-------------+-----------+----------+---------+-------+----------+------------+
| c8a7b448-40a3-11ec-a514-0050569be145 | 10.0.135.75 |   3306 | UNREACHABLE | PRIMARY   |        0 |       0 |     0 |        0 | gr_applier |
| ca26d8f2-40a3-11ec-a3a0-0050569bbf6d | 10.0.135.76 |   3306 | ONLINE      | SECONDARY |        0 |       0 |     0 |        0 | gr_applier |
| cb9a5813-40a3-11ec-a099-0050569bdf16 | 10.0.135.77 |   3306 | ONLINE      | SECONDARY |        0 |       0 |     0 |        0 | gr_applier |
+--------------------------------------+-------------+--------+-------------+-----------+----------+---------+-------+----------+------------+
+--------------------------------------+-------------+--------+---------+-----------+----------+---------+-------+----------+------------+
| master_uuid                          | member_host | mgrpot | mgrstat | mgrrole   | cfct_qsz | apl_qsz | ckd_t | cfct_dtd | chan_name  |
+--------------------------------------+-------------+--------+---------+-----------+----------+---------+-------+----------+------------+
| ca26d8f2-40a3-11ec-a3a0-0050569bbf6d | 10.0.135.76 |   3306 | ONLINE  | PRIMARY   |        0 |       0 |     0 |        0 | gr_applier |
| cb9a5813-40a3-11ec-a099-0050569bdf16 | 10.0.135.77 |   3306 | ONLINE  | SECONDARY |        0 |       0 |     0 |        0 | gr_applier |
+--------------------------------------+-------------+--------+---------+-----------+----------+---------+-------+----------+------------+

-- 10.0.135.75(The owner node)
+--------------------------------------+-------------+--------+-------------+-----------+----------+---------+-------+----------+------------+
| master_uuid                          | member_host | mgrpot | mgrstat     | mgrrole   | cfct_qsz | apl_qsz | ckd_t | cfct_dtd | chan_name  |
+--------------------------------------+-------------+--------+-------------+-----------+----------+---------+-------+----------+------------+
| c8a7b448-40a3-11ec-a514-0050569be145 | 10.0.135.75 |   3306 | ERROR       |           |        0 |       0 |     0 |        0 | gr_applier |
| ca26d8f2-40a3-11ec-a3a0-0050569bbf6d | 10.0.135.76 |   3306 | UNREACHABLE | SECONDARY |        0 |       0 |     0 |        0 | gr_applier |
| cb9a5813-40a3-11ec-a099-0050569bdf16 | 10.0.135.77 |   3306 | UNREACHABLE | SECONDARY |        0 |       0 |     0 |        0 | gr_applier |
+--------------------------------------+-------------+--------+-------------+-----------+----------+---------+-------+----------+------------+
+--------------------------------------+-------------+--------+---------+---------+----------+---------+-------+----------+------------+
| master_uuid                          | member_host | mgrpot | mgrstat | mgrrole | cfct_qsz | apl_qsz | ckd_t | cfct_dtd | chan_name  |
+--------------------------------------+-------------+--------+---------+---------+----------+---------+-------+----------+------------+
| c8a7b448-40a3-11ec-a514-0050569be145 | 10.0.135.75 |   3306 | ERROR   |         |        0 |       0 |     0 |        0 | gr_applier |
+--------------------------------------+-------------+--------+---------+---------+----------+---------+-------+----------+------------+

select * from performance_schema.global_variables where
 VARIABLE_NAME in ('group_replication_exit_state_action','read_only','super_read_only','offline_mode');
+-------------------------------------+----------------+
| VARIABLE_NAME                       | VARIABLE_VALUE |
+-------------------------------------+----------------+
| group_replication_exit_state_action | OFFLINE_MODE   |
| offline_mode                        | ON             |
| read_only                           | ON             |
| super_read_only                     | ON             |
+-------------------------------------+----------------+

(root@127.0.0.1) [performance_schema]> use testdb;
Database changed
(root@127.0.0.1) [testdb]> show tables;
+------------------+
| Tables_in_testdb |
+------------------+
| sbtest1          |
| t1               |
| t2               |
| t3               |
+------------------+
4 rows in set (0.00 sec)

(root@127.0.0.1) [testdb]> insert into t2 values(7);
-- ERROR 1290 (HY000): The MySQL server is running with the --super-read-only option so it cannot execute this statement

[mysql@ygsxgud01 ~]$ mysql -usysbench -psysbench -h10.0.135.75
-- ERROR 3032 (HY000): The server is currently in offline mode

-- 5. Fault Recovery policy
-- 10.0.135.75(The owner node)
iptables -F
iptables -X
date # Mon Dec 13 19:58:54 CST 2021
iptables -L

(root@127.0.0.1) [testdb]> insert into t2 values(7);
-- ERROR 1290 (HY000): The MySQL server is running with the --super-read-only option so it cannot execute this statement

mysql --login-path=dbadmin

start group_replication;
-- ERROR 3093 (HY000): The START GROUP_REPLICATION command failed since the group is already running.
stop group_replication;
start group_replication;

+--------------------------------------+-------------+--------+---------+-----------+----------+---------+-------+----------+------------+
| master_uuid                          | member_host | mgrpot | mgrstat | mgrrole   | cfct_qsz | apl_qsz | ckd_t | cfct_dtd | chan_name  |
+--------------------------------------+-------------+--------+---------+-----------+----------+---------+-------+----------+------------+
| ca26d8f2-40a3-11ec-a3a0-0050569bbf6d | 10.0.135.76 |   3306 | ONLINE  | PRIMARY   |        0 |       1 | 13603 |        0 | gr_applier |
| c8a7b448-40a3-11ec-a514-0050569be145 | 10.0.135.75 |   3306 | ONLINE  | SECONDARY |        0 |       1 |     0 |        0 | gr_applier |
| cb9a5813-40a3-11ec-a099-0050569bdf16 | 10.0.135.77 |   3306 | ONLINE  | SECONDARY |        0 |       1 | 13603 |        0 | gr_applier |
+--------------------------------------+-------------+--------+---------+-----------+----------+---------+-------+----------+------------+

-- Group Replication is still in the Offline state after being restarted and needs to be manually handled
+-------------------------------------+----------------+
| VARIABLE_NAME                       | VARIABLE_VALUE |
+-------------------------------------+----------------+
| group_replication_exit_state_action | OFFLINE_MODE   |
| offline_mode                        | ON             |  
| read_only                           | ON             |
| super_read_only                     | ON             |
+-------------------------------------+----------------+

set global offline_mode=off;

+-------------------------------------+----------------+
| VARIABLE_NAME                       | VARIABLE_VALUE |
+-------------------------------------+----------------+
| group_replication_exit_state_action | OFFLINE_MODE   |
| offline_mode                        | OFF            |
| read_only                           | ON             |
| super_read_only                     | ON             |
+-------------------------------------+----------------+

Suggested fix:
After Group Replication is restarted, the offline_mode mode should be adjusted when Group Replication exits.
[24 Dec 2021 12:43] MySQL Verification Team
Thanks for the report. I reproduced the behavior.