Description:
I have a 3-node MySQL Group Replication cluster running with AFTER consistency.
When a node is abruptly shut down or disconnected from the cluster, ongoing transactions on the primary are rolled back due to certification failure; however, the same transactions are applied on the secondary nodes. No writes are performed on the secondary nodes directly. This leads to data inconsistency: a client connected to the primary sees its connection terminated (if the primary goes down) and assumes the transaction failed, but the data is actually present in the cluster because the secondary nodes applied the transaction.
I verified this for the following scenarios:
1. one of secondary nodes is shut down abruptly
2. primary is shut down abruptly
In both cases, the secondary nodes had more transactions than the primary. I verified this using gtid_executed. E.g.:
Old: efa6f74a-73f0-11ee-8925-0a67e5184ce8:1,
fa4e6db0-3475-46c9-8e9c-2fab646ed636:1-1584925:2492207-2511313
New: efa6f74a-73f0-11ee-8925-0a67e5184ce8:1,
fa4e6db0-3475-46c9-8e9c-2fab646ed636:1-1584935:2492207-2511313
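For reference, the difference between the two sets above can be computed with MySQL's GTID_SUBTRACT function (values copied verbatim from the output above):

```sql
-- Which GTIDs does the "New" set (surviving secondaries / new primary)
-- contain that the "Old" set (old primary) does not?
SELECT GTID_SUBTRACT(
  'efa6f74a-73f0-11ee-8925-0a67e5184ce8:1,fa4e6db0-3475-46c9-8e9c-2fab646ed636:1-1584935:2492207-2511313',  -- New
  'efa6f74a-73f0-11ee-8925-0a67e5184ce8:1,fa4e6db0-3475-46c9-8e9c-2fab646ed636:1-1584925:2492207-2511313'   -- Old
) AS extra_on_secondaries;
-- Returns fa4e6db0-3475-46c9-8e9c-2fab646ed636:1584926-1584935,
-- i.e. 10 transactions applied on the secondaries but rolled back on the primary.
```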
Configuration:
| group_replication_advertise_recovery_endpoints | DEFAULT |
| group_replication_allow_local_lower_version_join | OFF |
| group_replication_auto_increment_increment | 7 |
| group_replication_autorejoin_tries | 3 |
| group_replication_bootstrap_group | OFF |
| group_replication_clone_threshold | 9223372036854775807 |
| group_replication_communication_debug_options | GCS_DEBUG_NONE |
| group_replication_communication_max_message_size | 10485760 |
| group_replication_components_stop_timeout | 31536000 |
| group_replication_compression_threshold | 1000000 |
| group_replication_consistency | AFTER |
| group_replication_enforce_update_everywhere_checks | OFF |
| group_replication_exit_state_action | READ_ONLY |
| group_replication_flow_control_applier_threshold | 25000 |
| group_replication_flow_control_certifier_threshold | 25000 |
| group_replication_flow_control_hold_percent | 10 |
| group_replication_flow_control_max_quota | 0 |
| group_replication_flow_control_member_quota_percent | 0 |
| group_replication_flow_control_min_quota | 0 |
| group_replication_flow_control_min_recovery_quota | 0 |
| group_replication_flow_control_mode | QUOTA |
| group_replication_flow_control_period | 1 |
| group_replication_flow_control_release_percent | 50 |
| group_replication_force_members | |
| group_replication_group_name | fa4e6db0-3475-46c9-8e9c-2fab646ed636 |
| group_replication_group_seeds | 10.83.54.xx:33061,10.83.57.xx:33061,10.83.38.xx:33061 |
| group_replication_gtid_assignment_block_size | 1000000 |
| group_replication_ip_allowlist | 10.0.0.0/8 |
| group_replication_ip_whitelist | 10.0.0.0/8 |
| group_replication_local_address | 10.83.54.xx:33061 |
| group_replication_member_expel_timeout | 5 |
| group_replication_member_weight | 70 |
| group_replication_message_cache_size | 1073741824 |
| group_replication_poll_spin_loops | 0 |
| group_replication_recovery_complete_at | TRANSACTIONS_APPLIED |
| group_replication_recovery_compression_algorithms | uncompressed |
| group_replication_recovery_get_public_key | OFF |
| group_replication_recovery_public_key_path | |
| group_replication_recovery_reconnect_interval | 60 |
| group_replication_recovery_retry_count | 10 |
| group_replication_recovery_ssl_ca | |
| group_replication_recovery_ssl_capath | |
| group_replication_recovery_ssl_cert | |
| group_replication_recovery_ssl_cipher | |
| group_replication_recovery_ssl_crl | |
| group_replication_recovery_ssl_crlpath | |
| group_replication_recovery_ssl_key | |
| group_replication_recovery_ssl_verify_server_cert | OFF |
| group_replication_recovery_tls_ciphersuites | |
| group_replication_recovery_tls_version | TLSv1,TLSv1.1,TLSv1.2,TLSv1.3 |
| group_replication_recovery_use_ssl | OFF |
| group_replication_recovery_zstd_compression_level | 3 |
| group_replication_single_primary_mode | ON |
| group_replication_ssl_mode | DISABLED |
| group_replication_start_on_boot | OFF |
| group_replication_tls_source | MYSQL_MAIN |
| group_replication_transaction_size_limit | 150000000 |
| group_replication_unreachable_majority_timeout | 0 |
| innodb_replication_delay | 0 |
| replication_optimize_for_static_plugin_config | OFF |
| replication_sender_observe_commit_only | OFF |
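(A snapshot like the table above can be taken on each node with a query along these lines:)

```sql
-- Dump all Group Replication settings on the local node
SHOW GLOBAL VARIABLES LIKE 'group_replication%';
```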
How to repeat:
1. Set up 3 node group replication cluster
2. set group_replication_consistency=AFTER on all nodes
3. kill mysql process on primary
4. check gtid_executed on old primary and new primary
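The steps above can be sketched as follows (the kill step is done from the OS; the workload and node roles are assumptions for illustration):

```sql
-- Step 2: on ALL nodes, before starting the write workload
SET GLOBAL group_replication_consistency = 'AFTER';

-- Step 3: while a write workload runs against the primary,
-- kill -9 the mysqld process of the primary from the OS shell.

-- Step 4: on the old primary (after restart, before it rejoins)
-- and on the new primary, compare:
SELECT @@GLOBAL.gtid_executed;
```

In the failing case, the new primary's gtid_executed is a superset of the old primary's, even though the old primary's clients saw those transactions fail.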