Description:
In the MGR cluster of three nodes, set group_REPLICATION_exit_state_Action = OFFLINE_MODE. After failover occurs, the original active node is set to offline mode as expected. After the fault occurred, the faulty node was repaired manually:
stop group_replication;
start group_replication;
Such operations. However, the original faulty node is in offline mode and is not automatically adjusted. Manual modification is required.
How to repeat:
-- 1. Before switching
+--------------------------------------+-------------+--------+---------+-----------+----------+---------+-------+----------+------------+
| master_uuid | member_host | mgrpot | mgrstat | mgrrole | cfct_qsz | apl_qsz | ckd_t | cfct_dtd | chan_name |
+--------------------------------------+-------------+--------+---------+-----------+----------+---------+-------+----------+------------+
| c8a7b448-40a3-11ec-a514-0050569be145 | 10.0.135.75 | 3306 | ONLINE | PRIMARY | 0 | 1 | 0 | 0 | gr_applier |
| ca26d8f2-40a3-11ec-a3a0-0050569bbf6d | 10.0.135.76 | 3306 | ONLINE | SECONDARY | 0 | 0 | 0 | 0 | gr_applier |
| cb9a5813-40a3-11ec-a099-0050569bdf16 | 10.0.135.77 | 3306 | ONLINE | SECONDARY | 0 | 0 | 0 | 0 | gr_applier |
+--------------------------------------+-------------+--------+---------+-----------+----------+---------+-------+----------+------------+
-- The original active node is configured:
-- set global group_replication_exit_state_action='OFFLINE_MODE';
-- set global group_replication_unreachable_majority_timeout=13;
-- set global group_replication_autorejoin_tries=0;
select * from performance_schema.global_variables where
VARIABLE_NAME in ('group_replication_exit_state_action','read_only','super_read_only','offline_mode');
+-------------------------------------+----------------+
| VARIABLE_NAME | VARIABLE_VALUE |
+-------------------------------------+----------------+
| group_replication_exit_state_action | OFFLINE_MODE |
| offline_mode | OFF |
| read_only | OFF |
| super_read_only | OFF |
+-------------------------------------+----------------+
-- 2. Trigger switch
-- 10.0.135.75
iptables -L
iptables -I INPUT -s 10.0.135.76 -ptcp -j DROP
iptables -I INPUT -s 10.0.135.77 -ptcp -j DROP
date # Mon Dec 13 19:57:17 CST 2021
iptables -L
-- 3. After switching state
-- 10.0.135.76(The new master node)
+--------------------------------------+-------------+--------+-------------+-----------+----------+---------+-------+----------+------------+
| master_uuid | member_host | mgrpot | mgrstat | mgrrole | cfct_qsz | apl_qsz | ckd_t | cfct_dtd | chan_name |
+--------------------------------------+-------------+--------+-------------+-----------+----------+---------+-------+----------+------------+
| c8a7b448-40a3-11ec-a514-0050569be145 | 10.0.135.75 | 3306 | UNREACHABLE | PRIMARY | 0 | 0 | 0 | 0 | gr_applier |
| ca26d8f2-40a3-11ec-a3a0-0050569bbf6d | 10.0.135.76 | 3306 | ONLINE | SECONDARY | 0 | 0 | 0 | 0 | gr_applier |
| cb9a5813-40a3-11ec-a099-0050569bdf16 | 10.0.135.77 | 3306 | ONLINE | SECONDARY | 0 | 0 | 0 | 0 | gr_applier |
+--------------------------------------+-------------+--------+-------------+-----------+----------+---------+-------+----------+------------+
+--------------------------------------+-------------+--------+---------+-----------+----------+---------+-------+----------+------------+
| master_uuid | member_host | mgrpot | mgrstat | mgrrole | cfct_qsz | apl_qsz | ckd_t | cfct_dtd | chan_name |
+--------------------------------------+-------------+--------+---------+-----------+----------+---------+-------+----------+------------+
| ca26d8f2-40a3-11ec-a3a0-0050569bbf6d | 10.0.135.76 | 3306 | ONLINE | PRIMARY | 0 | 0 | 0 | 0 | gr_applier |
| cb9a5813-40a3-11ec-a099-0050569bdf16 | 10.0.135.77 | 3306 | ONLINE | SECONDARY | 0 | 0 | 0 | 0 | gr_applier |
+--------------------------------------+-------------+--------+---------+-----------+----------+---------+-------+----------+------------+
-- 10.0.135.75(The owner node)
+--------------------------------------+-------------+--------+-------------+-----------+----------+---------+-------+----------+------------+
| master_uuid | member_host | mgrpot | mgrstat | mgrrole | cfct_qsz | apl_qsz | ckd_t | cfct_dtd | chan_name |
+--------------------------------------+-------------+--------+-------------+-----------+----------+---------+-------+----------+------------+
| c8a7b448-40a3-11ec-a514-0050569be145 | 10.0.135.75 | 3306 | ERROR | | 0 | 0 | 0 | 0 | gr_applier |
| ca26d8f2-40a3-11ec-a3a0-0050569bbf6d | 10.0.135.76 | 3306 | UNREACHABLE | SECONDARY | 0 | 0 | 0 | 0 | gr_applier |
| cb9a5813-40a3-11ec-a099-0050569bdf16 | 10.0.135.77 | 3306 | UNREACHABLE | SECONDARY | 0 | 0 | 0 | 0 | gr_applier |
+--------------------------------------+-------------+--------+-------------+-----------+----------+---------+-------+----------+------------+
+--------------------------------------+-------------+--------+---------+---------+----------+---------+-------+----------+------------+
| master_uuid | member_host | mgrpot | mgrstat | mgrrole | cfct_qsz | apl_qsz | ckd_t | cfct_dtd | chan_name |
+--------------------------------------+-------------+--------+---------+---------+----------+---------+-------+----------+------------+
| c8a7b448-40a3-11ec-a514-0050569be145 | 10.0.135.75 | 3306 | ERROR | | 0 | 0 | 0 | 0 | gr_applier |
+--------------------------------------+-------------+--------+---------+---------+----------+---------+-------+----------+------------+
select * from performance_schema.global_variables where
VARIABLE_NAME in ('group_replication_exit_state_action','read_only','super_read_only','offline_mode');
+-------------------------------------+----------------+
| VARIABLE_NAME | VARIABLE_VALUE |
+-------------------------------------+----------------+
| group_replication_exit_state_action | OFFLINE_MODE |
| offline_mode | ON |
| read_only | ON |
| super_read_only | ON |
+-------------------------------------+----------------+
(root@127.0.0.1) [performance_schema]> use testdb;
Database changed
(root@127.0.0.1) [testdb]> show tables;
+------------------+
| Tables_in_testdb |
+------------------+
| sbtest1 |
| t1 |
| t2 |
| t3 |
+------------------+
4 rows in set (0.00 sec)
(root@127.0.0.1) [testdb]> insert into t2 values(7);
-- ERROR 1290 (HY000): The MySQL server is running with the --super-read-only option so it cannot execute this statement
[mysql@ygsxgud01 ~]$ mysql -usysbench -psysbench -h10.0.135.75
-- ERROR 3032 (HY000): The server is currently in offline mode
-- 5. Fault Recovery policy
-- 10.0.135.75(The owner node)
iptables -F
iptables -X
date # Mon Dec 13 19:58:54 CST 2021
iptables -L
(root@127.0.0.1) [testdb]> insert into t2 values(7);
-- ERROR 1290 (HY000): The MySQL server is running with the --super-read-only option so it cannot execute this statement
mysql --login-path=dbadmin
start group_replication;
-- ERROR 3093 (HY000): The START GROUP_REPLICATION command failed since the group is already running.
stop group_replication;
start group_replication;
+--------------------------------------+-------------+--------+---------+-----------+----------+---------+-------+----------+------------+
| master_uuid | member_host | mgrpot | mgrstat | mgrrole | cfct_qsz | apl_qsz | ckd_t | cfct_dtd | chan_name |
+--------------------------------------+-------------+--------+---------+-----------+----------+---------+-------+----------+------------+
| ca26d8f2-40a3-11ec-a3a0-0050569bbf6d | 10.0.135.76 | 3306 | ONLINE | PRIMARY | 0 | 1 | 13603 | 0 | gr_applier |
| c8a7b448-40a3-11ec-a514-0050569be145 | 10.0.135.75 | 3306 | ONLINE | SECONDARY | 0 | 1 | 0 | 0 | gr_applier |
| cb9a5813-40a3-11ec-a099-0050569bdf16 | 10.0.135.77 | 3306 | ONLINE | SECONDARY | 0 | 1 | 13603 | 0 | gr_applier |
+--------------------------------------+-------------+--------+---------+-----------+----------+---------+-------+----------+------------+
-- Group Replication is still in the Offline state after being restarted and needs to be manually handled
+-------------------------------------+----------------+
| VARIABLE_NAME | VARIABLE_VALUE |
+-------------------------------------+----------------+
| group_replication_exit_state_action | OFFLINE_MODE |
| offline_mode | ON |
| read_only | ON |
| super_read_only | ON |
+-------------------------------------+----------------+
set global offline_mode=off;
+-------------------------------------+----------------+
| VARIABLE_NAME | VARIABLE_VALUE |
+-------------------------------------+----------------+
| group_replication_exit_state_action | OFFLINE_MODE |
| offline_mode | OFF |
| read_only | ON |
| super_read_only | ON |
+-------------------------------------+----------------+
Suggested fix:
After Group Replication is restarted, the offline_mode mode should be adjusted when Group Replication exits.
Description: In the MGR cluster of three nodes, set group_REPLICATION_exit_state_Action = OFFLINE_MODE. After failover occurs, the original active node is set to offline mode as expected. After the fault occurred, the faulty node was repaired manually: stop group_replication; start group_replication; Such operations. However, the original faulty node is in offline mode and is not automatically adjusted. Manual modification is required. How to repeat: -- 1. Before switching +--------------------------------------+-------------+--------+---------+-----------+----------+---------+-------+----------+------------+ | master_uuid | member_host | mgrpot | mgrstat | mgrrole | cfct_qsz | apl_qsz | ckd_t | cfct_dtd | chan_name | +--------------------------------------+-------------+--------+---------+-----------+----------+---------+-------+----------+------------+ | c8a7b448-40a3-11ec-a514-0050569be145 | 10.0.135.75 | 3306 | ONLINE | PRIMARY | 0 | 1 | 0 | 0 | gr_applier | | ca26d8f2-40a3-11ec-a3a0-0050569bbf6d | 10.0.135.76 | 3306 | ONLINE | SECONDARY | 0 | 0 | 0 | 0 | gr_applier | | cb9a5813-40a3-11ec-a099-0050569bdf16 | 10.0.135.77 | 3306 | ONLINE | SECONDARY | 0 | 0 | 0 | 0 | gr_applier | +--------------------------------------+-------------+--------+---------+-----------+----------+---------+-------+----------+------------+ -- The original active node is configured: -- set global group_replication_exit_state_action='OFFLINE_MODE'; -- set global group_replication_unreachable_majority_timeout=13; -- set global group_replication_autorejoin_tries=0; select * from performance_schema.global_variables where VARIABLE_NAME in ('group_replication_exit_state_action','read_only','super_read_only','offline_mode'); +-------------------------------------+----------------+ | VARIABLE_NAME | VARIABLE_VALUE | +-------------------------------------+----------------+ | group_replication_exit_state_action | OFFLINE_MODE | | offline_mode | OFF | | read_only | OFF | | super_read_only | OFF | +-------------------------------------+----------------+ -- 2. Trigger switch -- 10.0.135.75 iptables -L iptables -I INPUT -s 10.0.135.76 -ptcp -j DROP iptables -I INPUT -s 10.0.135.77 -ptcp -j DROP date # Mon Dec 13 19:57:17 CST 2021 iptables -L -- 3. After switching state -- 10.0.135.76(The new master node) +--------------------------------------+-------------+--------+-------------+-----------+----------+---------+-------+----------+------------+ | master_uuid | member_host | mgrpot | mgrstat | mgrrole | cfct_qsz | apl_qsz | ckd_t | cfct_dtd | chan_name | +--------------------------------------+-------------+--------+-------------+-----------+----------+---------+-------+----------+------------+ | c8a7b448-40a3-11ec-a514-0050569be145 | 10.0.135.75 | 3306 | UNREACHABLE | PRIMARY | 0 | 0 | 0 | 0 | gr_applier | | ca26d8f2-40a3-11ec-a3a0-0050569bbf6d | 10.0.135.76 | 3306 | ONLINE | SECONDARY | 0 | 0 | 0 | 0 | gr_applier | | cb9a5813-40a3-11ec-a099-0050569bdf16 | 10.0.135.77 | 3306 | ONLINE | SECONDARY | 0 | 0 | 0 | 0 | gr_applier | +--------------------------------------+-------------+--------+-------------+-----------+----------+---------+-------+----------+------------+ +--------------------------------------+-------------+--------+---------+-----------+----------+---------+-------+----------+------------+ | master_uuid | member_host | mgrpot | mgrstat | mgrrole | cfct_qsz | apl_qsz | ckd_t | cfct_dtd | chan_name | +--------------------------------------+-------------+--------+---------+-----------+----------+---------+-------+----------+------------+ | ca26d8f2-40a3-11ec-a3a0-0050569bbf6d | 10.0.135.76 | 3306 | ONLINE | PRIMARY | 0 | 0 | 0 | 0 | gr_applier | | cb9a5813-40a3-11ec-a099-0050569bdf16 | 10.0.135.77 | 3306 | ONLINE | SECONDARY | 0 | 0 | 0 | 0 | gr_applier | +--------------------------------------+-------------+--------+---------+-----------+----------+---------+-------+----------+------------+ -- 10.0.135.75(The owner node) +--------------------------------------+-------------+--------+-------------+-----------+----------+---------+-------+----------+------------+ | master_uuid | member_host | mgrpot | mgrstat | mgrrole | cfct_qsz | apl_qsz | ckd_t | cfct_dtd | chan_name | +--------------------------------------+-------------+--------+-------------+-----------+----------+---------+-------+----------+------------+ | c8a7b448-40a3-11ec-a514-0050569be145 | 10.0.135.75 | 3306 | ERROR | | 0 | 0 | 0 | 0 | gr_applier | | ca26d8f2-40a3-11ec-a3a0-0050569bbf6d | 10.0.135.76 | 3306 | UNREACHABLE | SECONDARY | 0 | 0 | 0 | 0 | gr_applier | | cb9a5813-40a3-11ec-a099-0050569bdf16 | 10.0.135.77 | 3306 | UNREACHABLE | SECONDARY | 0 | 0 | 0 | 0 | gr_applier | +--------------------------------------+-------------+--------+-------------+-----------+----------+---------+-------+----------+------------+ +--------------------------------------+-------------+--------+---------+---------+----------+---------+-------+----------+------------+ | master_uuid | member_host | mgrpot | mgrstat | mgrrole | cfct_qsz | apl_qsz | ckd_t | cfct_dtd | chan_name | +--------------------------------------+-------------+--------+---------+---------+----------+---------+-------+----------+------------+ | c8a7b448-40a3-11ec-a514-0050569be145 | 10.0.135.75 | 3306 | ERROR | | 0 | 0 | 0 | 0 | gr_applier | +--------------------------------------+-------------+--------+---------+---------+----------+---------+-------+----------+------------+ select * from performance_schema.global_variables where VARIABLE_NAME in ('group_replication_exit_state_action','read_only','super_read_only','offline_mode'); +-------------------------------------+----------------+ | VARIABLE_NAME | VARIABLE_VALUE | +-------------------------------------+----------------+ | group_replication_exit_state_action | OFFLINE_MODE | | offline_mode | ON | | read_only | ON | | super_read_only | ON | +-------------------------------------+----------------+ (root@127.0.0.1) [performance_schema]> use testdb; Database changed (root@127.0.0.1) [testdb]> show tables; +------------------+ | Tables_in_testdb | +------------------+ | sbtest1 | | t1 | | t2 | | t3 | +------------------+ 4 rows in set (0.00 sec) (root@127.0.0.1) [testdb]> insert into t2 values(7); -- ERROR 1290 (HY000): The MySQL server is running with the --super-read-only option so it cannot execute this statement [mysql@ygsxgud01 ~]$ mysql -usysbench -psysbench -h10.0.135.75 -- ERROR 3032 (HY000): The server is currently in offline mode -- 5. Fault Recovery policy -- 10.0.135.75(The owner node) iptables -F iptables -X date # Mon Dec 13 19:58:54 CST 2021 iptables -L (root@127.0.0.1) [testdb]> insert into t2 values(7); -- ERROR 1290 (HY000): The MySQL server is running with the --super-read-only option so it cannot execute this statement mysql --login-path=dbadmin start group_replication; -- ERROR 3093 (HY000): The START GROUP_REPLICATION command failed since the group is already running. stop group_replication; start group_replication; +--------------------------------------+-------------+--------+---------+-----------+----------+---------+-------+----------+------------+ | master_uuid | member_host | mgrpot | mgrstat | mgrrole | cfct_qsz | apl_qsz | ckd_t | cfct_dtd | chan_name | +--------------------------------------+-------------+--------+---------+-----------+----------+---------+-------+----------+------------+ | ca26d8f2-40a3-11ec-a3a0-0050569bbf6d | 10.0.135.76 | 3306 | ONLINE | PRIMARY | 0 | 1 | 13603 | 0 | gr_applier | | c8a7b448-40a3-11ec-a514-0050569be145 | 10.0.135.75 | 3306 | ONLINE | SECONDARY | 0 | 1 | 0 | 0 | gr_applier | | cb9a5813-40a3-11ec-a099-0050569bdf16 | 10.0.135.77 | 3306 | ONLINE | SECONDARY | 0 | 1 | 13603 | 0 | gr_applier | +--------------------------------------+-------------+--------+---------+-----------+----------+---------+-------+----------+------------+ -- Group Replication is still in the Offline state after being restarted and needs to be manually handled +-------------------------------------+----------------+ | VARIABLE_NAME | VARIABLE_VALUE | +-------------------------------------+----------------+ | group_replication_exit_state_action | OFFLINE_MODE | | offline_mode | ON | | read_only | ON | | super_read_only | ON | +-------------------------------------+----------------+ set global offline_mode=off; +-------------------------------------+----------------+ | VARIABLE_NAME | VARIABLE_VALUE | +-------------------------------------+----------------+ | group_replication_exit_state_action | OFFLINE_MODE | | offline_mode | OFF | | read_only | ON | | super_read_only | ON | +-------------------------------------+----------------+ Suggested fix: After Group Replication is restarted, the offline_mode mode should be adjusted when Group Replication exits.