Description:
In the MGR cluster of three nodes, set group_REPLICATION_exit_state_Action = OFFLINE_MODE. After failover occurs, the original active node is set to offline mode as expected. After the fault occurred, the faulty node was repaired manually:
stop group_replication;
start group_replication;
Such operations. However, the original faulty node is in offline mode and is not automatically adjusted. Manual modification is required.
How to repeat:
-- 1. Before switching
+--------------------------------------+-------------+--------+---------+-----------+----------+---------+-------+----------+------------+
| master_uuid | member_host | mgrpot | mgrstat | mgrrole | cfct_qsz | apl_qsz | ckd_t | cfct_dtd | chan_name |
+--------------------------------------+-------------+--------+---------+-----------+----------+---------+-------+----------+------------+
| c8a7b448-40a3-11ec-a514-0050569be145 | 10.0.135.75 | 3306 | ONLINE | PRIMARY | 0 | 1 | 0 | 0 | gr_applier |
| ca26d8f2-40a3-11ec-a3a0-0050569bbf6d | 10.0.135.76 | 3306 | ONLINE | SECONDARY | 0 | 0 | 0 | 0 | gr_applier |
| cb9a5813-40a3-11ec-a099-0050569bdf16 | 10.0.135.77 | 3306 | ONLINE | SECONDARY | 0 | 0 | 0 | 0 | gr_applier |
+--------------------------------------+-------------+--------+---------+-----------+----------+---------+-------+----------+------------+
-- The original active node is configured:
-- set global group_replication_exit_state_action='OFFLINE_MODE';
-- set global group_replication_unreachable_majority_timeout=13;
-- set global group_replication_autorejoin_tries=0;
select * from performance_schema.global_variables where
VARIABLE_NAME in ('group_replication_exit_state_action','read_only','super_read_only','offline_mode');
+-------------------------------------+----------------+
| VARIABLE_NAME | VARIABLE_VALUE |
+-------------------------------------+----------------+
| group_replication_exit_state_action | OFFLINE_MODE |
| offline_mode | OFF |
| read_only | OFF |
| super_read_only | OFF |
+-------------------------------------+----------------+
-- 2. Trigger switch
-- 10.0.135.75
iptables -L
iptables -I INPUT -s 10.0.135.76 -ptcp -j DROP
iptables -I INPUT -s 10.0.135.77 -ptcp -j DROP
date # Mon Dec 13 19:57:17 CST 2021
iptables -L
-- 3. After switching state
-- 10.0.135.76(The new master node)
+--------------------------------------+-------------+--------+-------------+-----------+----------+---------+-------+----------+------------+
| master_uuid | member_host | mgrpot | mgrstat | mgrrole | cfct_qsz | apl_qsz | ckd_t | cfct_dtd | chan_name |
+--------------------------------------+-------------+--------+-------------+-----------+----------+---------+-------+----------+------------+
| c8a7b448-40a3-11ec-a514-0050569be145 | 10.0.135.75 | 3306 | UNREACHABLE | PRIMARY | 0 | 0 | 0 | 0 | gr_applier |
| ca26d8f2-40a3-11ec-a3a0-0050569bbf6d | 10.0.135.76 | 3306 | ONLINE | SECONDARY | 0 | 0 | 0 | 0 | gr_applier |
| cb9a5813-40a3-11ec-a099-0050569bdf16 | 10.0.135.77 | 3306 | ONLINE | SECONDARY | 0 | 0 | 0 | 0 | gr_applier |
+--------------------------------------+-------------+--------+-------------+-----------+----------+---------+-------+----------+------------+
+--------------------------------------+-------------+--------+---------+-----------+----------+---------+-------+----------+------------+
| master_uuid | member_host | mgrpot | mgrstat | mgrrole | cfct_qsz | apl_qsz | ckd_t | cfct_dtd | chan_name |
+--------------------------------------+-------------+--------+---------+-----------+----------+---------+-------+----------+------------+
| ca26d8f2-40a3-11ec-a3a0-0050569bbf6d | 10.0.135.76 | 3306 | ONLINE | PRIMARY | 0 | 0 | 0 | 0 | gr_applier |
| cb9a5813-40a3-11ec-a099-0050569bdf16 | 10.0.135.77 | 3306 | ONLINE | SECONDARY | 0 | 0 | 0 | 0 | gr_applier |
+--------------------------------------+-------------+--------+---------+-----------+----------+---------+-------+----------+------------+
-- 10.0.135.75(The owner node)
+--------------------------------------+-------------+--------+-------------+-----------+----------+---------+-------+----------+------------+
| master_uuid | member_host | mgrpot | mgrstat | mgrrole | cfct_qsz | apl_qsz | ckd_t | cfct_dtd | chan_name |
+--------------------------------------+-------------+--------+-------------+-----------+----------+---------+-------+----------+------------+
| c8a7b448-40a3-11ec-a514-0050569be145 | 10.0.135.75 | 3306 | ERROR | | 0 | 0 | 0 | 0 | gr_applier |
| ca26d8f2-40a3-11ec-a3a0-0050569bbf6d | 10.0.135.76 | 3306 | UNREACHABLE | SECONDARY | 0 | 0 | 0 | 0 | gr_applier |
| cb9a5813-40a3-11ec-a099-0050569bdf16 | 10.0.135.77 | 3306 | UNREACHABLE | SECONDARY | 0 | 0 | 0 | 0 | gr_applier |
+--------------------------------------+-------------+--------+-------------+-----------+----------+---------+-------+----------+------------+
+--------------------------------------+-------------+--------+---------+---------+----------+---------+-------+----------+------------+
| master_uuid | member_host | mgrpot | mgrstat | mgrrole | cfct_qsz | apl_qsz | ckd_t | cfct_dtd | chan_name |
+--------------------------------------+-------------+--------+---------+---------+----------+---------+-------+----------+------------+
| c8a7b448-40a3-11ec-a514-0050569be145 | 10.0.135.75 | 3306 | ERROR | | 0 | 0 | 0 | 0 | gr_applier |
+--------------------------------------+-------------+--------+---------+---------+----------+---------+-------+----------+------------+
select * from performance_schema.global_variables where
VARIABLE_NAME in ('group_replication_exit_state_action','read_only','super_read_only','offline_mode');
+-------------------------------------+----------------+
| VARIABLE_NAME | VARIABLE_VALUE |
+-------------------------------------+----------------+
| group_replication_exit_state_action | OFFLINE_MODE |
| offline_mode | ON |
| read_only | ON |
| super_read_only | ON |
+-------------------------------------+----------------+
(root@127.0.0.1) [performance_schema]> use testdb;
Database changed
(root@127.0.0.1) [testdb]> show tables;
+------------------+
| Tables_in_testdb |
+------------------+
| sbtest1 |
| t1 |
| t2 |
| t3 |
+------------------+
4 rows in set (0.00 sec)
(root@127.0.0.1) [testdb]> insert into t2 values(7);
-- ERROR 1290 (HY000): The MySQL server is running with the --super-read-only option so it cannot execute this statement
[mysql@ygsxgud01 ~]$ mysql -usysbench -psysbench -h10.0.135.75
-- ERROR 3032 (HY000): The server is currently in offline mode
-- 5. Fault Recovery policy
-- 10.0.135.75(The owner node)
iptables -F
iptables -X
date # Mon Dec 13 19:58:54 CST 2021
iptables -L
(root@127.0.0.1) [testdb]> insert into t2 values(7);
-- ERROR 1290 (HY000): The MySQL server is running with the --super-read-only option so it cannot execute this statement
mysql --login-path=dbadmin
start group_replication;
-- ERROR 3093 (HY000): The START GROUP_REPLICATION command failed since the group is already running.
stop group_replication;
start group_replication;
+--------------------------------------+-------------+--------+---------+-----------+----------+---------+-------+----------+------------+
| master_uuid | member_host | mgrpot | mgrstat | mgrrole | cfct_qsz | apl_qsz | ckd_t | cfct_dtd | chan_name |
+--------------------------------------+-------------+--------+---------+-----------+----------+---------+-------+----------+------------+
| ca26d8f2-40a3-11ec-a3a0-0050569bbf6d | 10.0.135.76 | 3306 | ONLINE | PRIMARY | 0 | 1 | 13603 | 0 | gr_applier |
| c8a7b448-40a3-11ec-a514-0050569be145 | 10.0.135.75 | 3306 | ONLINE | SECONDARY | 0 | 1 | 0 | 0 | gr_applier |
| cb9a5813-40a3-11ec-a099-0050569bdf16 | 10.0.135.77 | 3306 | ONLINE | SECONDARY | 0 | 1 | 13603 | 0 | gr_applier |
+--------------------------------------+-------------+--------+---------+-----------+----------+---------+-------+----------+------------+
-- Group Replication is still in the Offline state after being restarted and needs to be manually handled
+-------------------------------------+----------------+
| VARIABLE_NAME | VARIABLE_VALUE |
+-------------------------------------+----------------+
| group_replication_exit_state_action | OFFLINE_MODE |
| offline_mode | ON |
| read_only | ON |
| super_read_only | ON |
+-------------------------------------+----------------+
set global offline_mode=off;
+-------------------------------------+----------------+
| VARIABLE_NAME | VARIABLE_VALUE |
+-------------------------------------+----------------+
| group_replication_exit_state_action | OFFLINE_MODE |
| offline_mode | OFF |
| read_only | ON |
| super_read_only | ON |
+-------------------------------------+----------------+
Suggested fix:
After Group Replication is restarted, the offline_mode mode should be adjusted when Group Replication exits.