Why is group_replication_consistency = AFTER crashing a Group Replication cluster?
===================================================================================

Scenario: two data centers, with DC1 as production and DC2 as Disaster Recovery. DC2 replicates from DC1 using asynchronous replication with Asynchronous Connection Failover. ProxySQL is used to route requests to the active node in DC1 only. sysbench acts as the application, providing some traffic.

The following is the specific group replication configuration:

######################################
#Group Replication
######################################
plugin_load_add ='group_replication.so'
plugin-load-add ='mysql_clone.so'
group_replication_start_on_boot =off
group_replication_group_name ="dc1aaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"
group_replication_local_address = "10.0.0.14:33061"
group_replication_group_seeds = "10.0.0.14:33061,10.0.0.36:33061,10.0.0.81:33061"
group_replication_bootstrap_group = off
group_replication_ip_allowlist = "10.0.0.0/24,192.168.0.0/24"
# from 8.0.27
group_replication_paxos_single_leader = on
group_replication_auto_increment_increment = 1
group_replication_communication_max_message_size = 10485760
group_replication_autorejoin_tries = 10
group_replication_consistency = AFTER
group_replication_flow_control_period = 10
group_replication_flow_control_hold_percent = 25
group_replication_flow_control_release_percent = 50
group_replication_member_expel_timeout = 20

What happens
------------

While running a minimal load (2 threads executing R/W operations), we shut down the running Primary with a normal shutdown and wait for production to move to the new Primary. After waiting a few minutes we restart the node we stopped and, once it is up and running, we start group_replication on it again. The node fails to join, reporting an error, and sometimes the whole cluster fails. Let's see:

node1-DC1 [((none))]>select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+--------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST  | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+--------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 21c596e8-34c9-11ec-b466-06bac859e226 | ip-10-0-0-14 |        3307 | ONLINE       | PRIMARY     | 8.0.27         | XCom                       |
| group_replication_applier | 52ac8665-417a-11ec-a4a3-0e5b19796bba | ip-10-0-0-81 |        3307 | ONLINE       | SECONDARY   | 8.0.27         | XCom                       |
| group_replication_applier | 8cdddc65-4179-11ec-a51a-0af0a299c25a | ip-10-0-0-36 |        3307 | ONLINE       | SECONDARY   | 8.0.27         | XCom                       |
+---------------------------+--------------------------------------+--------------+-------------+--------------+-------------+----------------+----------------------------+
3 rows in set (0.00 sec)
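In outline, the test is just the following sequence (a sketch of the commands rather than the literal capture; the clean shutdown could equally be done with mysqladmin shutdown or the service manager, and the sysbench load keeps running through ProxySQL the whole time):

-- on the current Primary (node1): clean shutdown
SHUTDOWN;

-- from a surviving member: wait for the group to elect a new Primary
SELECT MEMBER_HOST, MEMBER_STATE, MEMBER_ROLE
  FROM performance_schema.replication_group_members;

-- after a few minutes, restart mysqld on the stopped node, then rejoin it
START GROUP_REPLICATION;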
On Stop:

2021-11-30T11:07:37.982774Z 0 [System] [MY-013172] [Server] Received SHUTDOWN from user . Shutting down mysqld (Version: 8.0.27).
2021-11-30T11:07:41.876261Z 0 [System] [MY-011504] [Repl] Plugin group_replication reported: 'Group membership changed: This member has left the group.'
2021-11-30T11:07:51.891885Z 0 [Warning] [MY-010909] [Server] /opt/mysql_templates/mysql-8P/bin/mysqld: Forcing close of thread 584 user: 'app_test'.
2021-11-30T11:07:51.891960Z 0 [Warning] [MY-010909] [Server] /opt/mysql_templates/mysql-8P/bin/mysqld: Forcing close of thread 107494 user: 'app_test'.
2021-11-30T11:08:42.895440Z 0 [Warning] [MY-011630] [Repl] Plugin group_replication reported: 'Due to a plugin error, some transactions were unable to be certified and will now rollback.'
2021-11-30T11:08:42.895520Z 584 [ERROR] [MY-011615] [Repl] Plugin group_replication reported: 'Error while waiting for conflict detection procedure to finish on session 584'
2021-11-30T11:08:42.895553Z 584 [ERROR] [MY-010207] [Repl] Run function 'before_commit' in plugin 'group_replication' failed
2021-11-30T11:08:42.895562Z 107494 [ERROR] [MY-011615] [Repl] Plugin group_replication reported: 'Error while waiting for conflict detection procedure to finish on session 107494'
2021-11-30T11:08:42.895599Z 107494 [ERROR] [MY-010207] [Repl] Run function 'before_commit' in plugin 'group_replication' failed
2021-11-30T11:08:42.896674Z 0 [System] [MY-011651] [Repl] Plugin group_replication reported: 'Plugin 'group_replication' has been stopped.'
2021-11-30T11:09:17.860333Z 0 [System] [MY-010910] [Server] /opt/mysql_templates/mysql-8P/bin/mysqld: Shutdown complete (mysqld 8.0.27) MySQL Community Server - GPL.

So we already have errors at shutdown.

Restart
--------

No error in the log.

node1-DC1 [((none))]>select * from performance_schema.replication_group_members;
+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier |           |             |        NULL | OFFLINE      |             |                |                            |
+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+----------------------------+
1 row in set (0.00 sec)

node1-DC1 [((none))]>SHOW GLOBAL VARIABLES LIKE 'group_replication_consistency';
+--------------------------------+-------+
| Variable_name                  | Value |
+--------------------------------+-------+
| group_replication_consistency  | AFTER |
+--------------------------------+-------+
1 row in set (0.00 sec)

node1-DC1 [((none))]>show global variables like 'gtid_executed';
+---------------+-------------------------------------------------------------------------------------------------------------+
| Variable_name | Value                                                                                                       |
+---------------+-------------------------------------------------------------------------------------------------------------+
| gtid_executed | 21c596e8-34c9-11ec-b466-06bac859e226:1-32865, dc1aaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-214379:1039352-1092732 |
+---------------+-------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
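Once we have the gtid_executed of a surviving member, GTID_SUBTRACT tells us exactly which transactions the restarted node is missing (a generic sketch; both placeholders have to be replaced with the real sets):

SELECT GTID_SUBTRACT('<gtid_executed of a surviving member>',
                     '<gtid_executed of node1>') AS missing_on_node1;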
Who is now the Primary?

node3-DC1 [((none))]>select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+--------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST  | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+--------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 52ac8665-417a-11ec-a4a3-0e5b19796bba | ip-10-0-0-81 |        3307 | ONLINE       | PRIMARY     | 8.0.27         | XCom                       |
| group_replication_applier | 8cdddc65-4179-11ec-a51a-0af0a299c25a | ip-10-0-0-36 |        3307 | ONLINE       | SECONDARY   | 8.0.27         | XCom                       |
+---------------------------+--------------------------------------+--------------+-------------+--------------+-------------+----------------+----------------------------+
2 rows in set (0.00 sec)

node3-DC1 [((none))]>show global variables like 'gtid_executed';
+---------------+-------------------------------------------------------------------------------------------------------------+
| Variable_name | Value                                                                                                       |
+---------------+-------------------------------------------------------------------------------------------------------------+
| gtid_executed | 21c596e8-34c9-11ec-b466-06bac859e226:1-32865, dc1aaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-214381:1039352-1098085 |
+---------------+-------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

So node3 (ip-10-0-0-81) is the new Primary, and its gtid_executed is ahead of node1's.

Now start group_replication on node1:

node2-DC1 [((none))]>select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+--------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST  | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+--------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 21c596e8-34c9-11ec-b466-06bac859e226 | ip-10-0-0-14 |        3307 | RECOVERING   | SECONDARY   | 8.0.27         | XCom                       |
| group_replication_applier | 52ac8665-417a-11ec-a4a3-0e5b19796bba | ip-10-0-0-81 |        3307 | ONLINE       | PRIMARY     | 8.0.27         | XCom                       |
| group_replication_applier | 8cdddc65-4179-11ec-a51a-0af0a299c25a | ip-10-0-0-36 |        3307 | ONLINE       | SECONDARY   | 8.0.27         | XCom                       |
+---------------------------+--------------------------------------+--------------+-------------+--------------+-------------+----------------+----------------------------+
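While the joiner sits in RECOVERING, the distributed recovery channel can also be inspected directly; one possible check (not part of the original capture) is:

SELECT CHANNEL_NAME, SERVICE_STATE, LAST_ERROR_NUMBER, LAST_ERROR_MESSAGE
  FROM performance_schema.replication_connection_status
 WHERE CHANNEL_NAME = 'group_replication_recovery';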
But on Node1:

2021-11-30T11:19:18.911366Z 62 [System] [MY-010562] [Repl] Slave I/O thread for channel 'group_replication_recovery': connected to master 'replica@ip-10-0-0-36:3307',replication started in log 'FIRST' at position 4
2021-11-30T11:20:49.064735Z 61 [System] [MY-010597] [Repl] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='ip-10-0-0-36', master_port= 3307, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
2021-11-30T11:21:00.378775Z 0 [System] [MY-011490] [Repl] Plugin group_replication reported: 'This server was declared online within the replication group.'
2021-11-30T11:21:00.390737Z 46 [ERROR] [MY-013309] [Repl] Plugin group_replication reported: 'Transaction '2:215270' does not exist on Group Replication consistency manager while receiving remote transaction prepare.'
2021-11-30T11:21:00.390792Z 46 [ERROR] [MY-011452] [Repl] Plugin group_replication reported: 'Fatal error during execution on the Applier process of Group Replication. The server will now leave the group.'
2021-11-30T11:21:00.390857Z 46 [ERROR] [MY-011712] [Repl] Plugin group_replication reported: 'The server was automatically set into read only mode after an error was detected.'
2021-11-30T11:21:04.220312Z 0 [System] [MY-011504] [Repl] Plugin group_replication reported: 'Group membership changed: This member has left the group.'

The node failed to join.

node1-DC1 [((none))]>stop group_replication;
Query OK, 0 rows affected (1.00 sec)

Then I changed the consistency level:

node1-DC1 [((none))]>SET GLOBAL group_replication_consistency='EVENTUAL';
Query OK, 0 rows affected (0.00 sec)

And restarted group_replication:

node1-DC1 [((none))]>start group_replication;
Query OK, 0 rows affected (8.19 sec)

node1-DC1 [((none))]>select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+--------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST  | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+--------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 21c596e8-34c9-11ec-b466-06bac859e226 | ip-10-0-0-14 |        3307 | ONLINE       | SECONDARY   | 8.0.27         | XCom                       |
| group_replication_applier | 52ac8665-417a-11ec-a4a3-0e5b19796bba | ip-10-0-0-81 |        3307 | ONLINE       | PRIMARY     | 8.0.27         | XCom                       |
| group_replication_applier | 8cdddc65-4179-11ec-a51a-0af0a299c25a | ip-10-0-0-36 |        3307 | ONLINE       | SECONDARY   | 8.0.27         | XCom                       |
+---------------------------+--------------------------------------+--------------+-------------+--------------+-------------+----------------+----------------------------+
3 rows in set (0.00 sec)

In node1's error log we can follow the whole rejoin:

2021-11-30T11:21:04.220312Z 0 [System] [MY-011504] [Repl] Plugin group_replication reported: 'Group membership changed: This member has left the group.'
2021-11-30T11:22:43.729651Z 12 [System] [MY-011651] [Repl] Plugin group_replication reported: 'Plugin 'group_replication' has been stopped.'
2021-11-30T11:23:13.217884Z 12 [System] [MY-013587] [Repl] Plugin group_replication reported: 'Plugin 'group_replication' is starting.'
2021-11-30T11:23:13.219367Z 12 [Warning] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Automatically adding IPv4 localhost address to the allowlist. It is mandatory that it is added.'
2021-11-30T11:23:13.219395Z 12 [Warning] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Automatically adding IPv6 localhost address to the allowlist. It is mandatory that it is added.'
2021-11-30T11:23:13.220106Z 105 [System] [MY-010597] [Repl] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_applier' executed'. Previous state master_host='', master_port= 0, master_log_file='', master_log_pos= 1058802, master_bind=''. New state master_host='', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
2021-11-30T11:23:20.219959Z 0 [Warning] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Shutting down an outgoing connection. This happens because something might be wrong on a bi-directional connection to node 10.0.0.81:33061. Please check the connection status to this member'
2021-11-30T11:23:20.400974Z 12 [System] [MY-011511] [Repl] Plugin group_replication reported: 'This server is working as secondary member with primary member address ip-10-0-0-81:3307.'
2021-11-30T11:23:21.401347Z 0 [System] [MY-013471] [Repl] Plugin group_replication reported: 'Distributed recovery will transfer data using: Incremental recovery from a group donor'
2021-11-30T11:23:21.401552Z 0 [System] [MY-011503] [Repl] Plugin group_replication reported: 'Group membership changed to ip-10-0-0-14:3307, ip-10-0-0-81:3307, ip-10-0-0-36:3307 on view 16378602376394822:11.'
2021-11-30T11:23:21.426280Z 120 [System] [MY-010597] [Repl] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='ip-10-0-0-81', master_port= 3307, master_log_file='', master_log_pos= 4, master_bind=''.
2021-11-30T11:23:21.447367Z 121 [Warning] [MY-010897] [Repl] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2021-11-30T11:23:21.449767Z 121 [System] [MY-010562] [Repl] Slave I/O thread for channel 'group_replication_recovery': connected to master 'replica@ip-10-0-0-81:3307',replication started in log 'FIRST' at position 4
2021-11-30T11:23:35.272964Z 120 [System] [MY-010597] [Repl] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='ip-10-0-0-81', master_port= 3307, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
2021-11-30T11:23:36.712986Z 0 [System] [MY-011490] [Repl] Plugin group_replication reported: 'This server was declared online within the replication group.'

This time the node successfully joins the cluster.

What are the expectations?
--------------------------

1) When the node is stopped, there should be no issue closing the pending threads and certifications. Why then do we get this?
   [ERROR] [MY-010207] [Repl] Run function 'before_commit' in plugin 'group_replication' failed
2) When started again, the node should not fail to rejoin the group if group_replication_consistency=AFTER.
3) If group_replication_consistency=AFTER is not supported while joining the cluster, the node should automatically shift to a supported level and, once joined, move back to the level declared in the configuration (today this has to be done by hand, as sketched right after this list).
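Until something like point 3 exists, the manual workaround is the sequence used above, run on the joining node (a sketch; the last statement assumes you want to go back to the level declared in the configuration once the member is ONLINE):

STOP GROUP_REPLICATION;
SET GLOBAL group_replication_consistency = 'EVENTUAL';
START GROUP_REPLICATION;
-- wait until the member reports ONLINE in performance_schema.replication_group_members,
-- then restore the configured consistency level
SET GLOBAL group_replication_consistency = 'AFTER';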