| Bug #112368 | Mismatched SQL for the same GTID transaction across primary and secondary | ||
|---|---|---|---|
| Submitted: | 17 Sep 2023 14:39 | Modified: | 21 Sep 2023 1:44 |
| Reporter: | huang rong | Email Updates: | |
| Status: | Can't repeat | Impact on me: | |
| Category: | MySQL Server: Group Replication | Severity: | S1 (Critical) |
| Version: | 5.7.36 | OS: | Debian (Debian GNU/Linux 10 (buster)) |
| Assigned to: | MySQL Verification Team | CPU Architecture: | Any |
| Tags: | GTID mismatch sql, InnoDB Cluster | ||
[17 Sep 2023 14:40]
huang rong
On further inspection, I observed that the transaction with GTID '82e0638a-e968-11ec-9a76-9cdc71bc2950:774243853' had different content on db7 compared to db8 and db9. db7 showed an INSERT operation on the stick.data_platform table. ------------ db7 -------------------- # at 7732 #230911 3:45:28 server id 10131714 end_log_pos 7793 GTID last_committed=1 sequence_number=16 rbr_only=yes /*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/; SET @@SESSION.GTID_NEXT= '82e0638a-e968-11ec-9a76-9cdc71bc2950:774243853'/*!*/; # at 7793 #230911 3:45:28 server id 10131714 end_log_pos 7871 Query thread_id=112 exec_time=0 error_code=0 SET TIMESTAMP=1694418328/*!*/; BEGIN /*!*/; # at 7871 #230911 3:45:28 server id 10131714 end_log_pos 7952 Table_map: `stick`.`data_platform` mapped to number 125 # at 7952 #230911 3:45:28 server id 10131714 end_log_pos 8048 Write_rows: table id 125 flags: STMT_END_F BINLOG ' mMX+ZBMCmZoAUQAAABAfAAAAAH0AAAAAAAEACXN0aWNreWFkcwAVZGF0YV9wbGF0Zm9ybV9tZXRy aWNzAA0D/gEDAwMDCAgICBEDA/cBAAAA mMX+ZB4CmZoAYAAAAHAfAAAAAH0AAAAAAAEAAgAN//8A4AEJAAAGAQEAAAAEAAAAAQAAAAMAAAAD AAAAAAAAAAAAAAAAAAAAAwAAAAAAAAAAAAAAAAAAAGT+xZiJOm0Y '/*!*/; ### INSERT INTO `stick`.`data_platform` ### SET ### @1=2305 /* INT meta=0 nullable=0 is_null=0 */ ### @2=6 /* ENUM(1 byte) meta=63233 nullable=0 is_null=0 */ ### @3=1 /* TINYINT meta=0 nullable=0 is_null=0 */ ### @4=1 /* INT meta=0 nullable=0 is_null=0 */ ### @5=4 /* INT meta=0 nullable=0 is_null=0 */ ### @6=1 /* INT meta=0 nullable=0 is_null=0 */ ### @7=3 /* INT meta=0 nullable=0 is_null=0 */ ### @8=3 /* LONGINT meta=0 nullable=0 is_null=0 */ ### @9=0 /* LONGINT meta=0 nullable=0 is_null=0 */ ### @10=3 /* LONGINT meta=0 nullable=0 is_null=0 */ ### @11=0 /* LONGINT meta=0 nullable=0 is_null=0 */ ### @12=1694418328 /* TIMESTAMP(0) meta=0 nullable=0 is_null=0 */ ### @13=409811593 /* INT meta=0 nullable=0 is_null=0 */ # at 8048 #230911 3:45:28 server id 10131714 end_log_pos 8075 Xid = 771 COMMIT/*!*/; In contrast, db8 showed an INSERT operation on the stick.external table. ----------------- db8 ------------------- # at 211879926 #230911 3:27:59 server id 10131714 end_log_pos 211879987 GTID last_committed=65000 sequence_number=65004 rbr_only=yes /*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/; SET @@SESSION.GTID_NEXT= '82e0638a-e968-11ec-9a76-9cdc71bc2950:774243853'/*!*/; # at 211879987 #230911 3:27:59 server id 10131714 end_log_pos 211880051 Query thread_id=1696379532 exec_time=166 error_code=0 SET TIMESTAMP=1694417279/*!*/; /*!\C utf8mb4 *//*!*/; SET @@session.character_set_client=45,@@session.collation_connection=45,@@session.collation_server=8/*!*/; BEGIN /*!*/; # at 211880051 #230911 3:27:59 server id 10131714 end_log_pos 211880121 Table_map: `stick`.`external` mapped to number 110 # at 211880121 #230911 3:27:59 server id 10131714 end_log_pos 211880180 Write_rows: table id 110 flags: STMT_END_F BINLOG ' f8H+ZBMCmZoARgAAALkIoQwAAG4AAAAAAAEACXN0aWNreWFkcwALZXh0ZXJuYWxfYWQACQ/8/AEJ ARL8AQb8AwMDAAPGAQ== f8H+ZB4CmZoAOwAAAPQIoQwAAG4AAAAAAAEAAgAJ///G/xIAMTIzNzQxMF8yNzYzMDAxNjE2AJIW AAA= '/*!*/; ### INSERT INTO `stick`.`external` ### SET ### @1='1237410_2763001616' /* VARSTRING(1020) meta=1020 nullable=0 is_null=0 */ ### @2=NULL /* MEDIUMBLOB/MEDIUMTEXT meta=3 nullable=1 is_null=1 */ ### @3=NULL /* MEDIUMBLOB/MEDIUMTEXT meta=3 nullable=1 is_null=1 */ ### @4=0 /* TINYINT meta=0 nullable=0 is_null=0 */ ### @5=5778 /* MEDIUMINT meta=0 nullable=0 is_null=0 */ ### @6=0 /* TINYINT meta=0 nullable=0 is_null=0 */ ### @7=NULL /* DATETIME(0) meta=0 nullable=1 is_null=1 */ ### @8=NULL /* MEDIUMBLOB/MEDIUMTEXT meta=3 nullable=1 is_null=1 */ ### @9=NULL /* TINYINT meta=0 nullable=1 is_null=1 */ # at 211880180 #230911 3:27:59 server id 10131714 end_log_pos 211880207 Xid = 5027513433 COMMIT/*!*/;
[17 Sep 2023 14:59]
huang rong
error.log of db9
Attachment: error.log.db9 (application/octet-stream, text), 56.75 KiB.
[17 Sep 2023 14:59]
huang rong
error.log of db8
Attachment: error.log.db8 (application/octet-stream, text), 59.01 KiB.
[17 Sep 2023 15:00]
huang rong
error.log of db7
Attachment: error.log.db7.sql (application/octet-stream, text), 55.75 KiB.
[17 Sep 2023 15:02]
huang rong
my.cnf of db7 ------------- # # Ansible managed # [mysql_safe] pid_file = /var/run/mysqld/mysqld.pid socket = /var/run/mysqld/mysqld.sock nice = 0 [client] port = 3306 socket = /var/run/mysqld/mysqld.sock [mysqld] query_cache_type = 1 query_cache_size = 134217728 default_password_lifetime=0 relay-log=db7-fw-us-east-relay-bin disabled_storage_engines="MyISAM,BLACKHOLE,FEDERATED,ARCHIVE,MEMORY" lower-case-table-names=1 innodb_flush_log_at_trx_commit = 1 sync_binlog=1 skip-slave-start slave_parallel_workers=4 slave_preserve_commit_order=1 slave_parallel_type=LOGICAL_CLOCK tmp_table_size = 67108864 thread_stack = 192K skip-name-resolve = 1 innodb_undo_tablespaces = 3 server-id = 10131714 gtid-mode = ON innodb_temp_data_file_path = ibtmp1:12M:autoextend:max:50G read_rnd_buffer_size = 8388608 max_binlog_size = 500M innodb_page_cleaners = 16 innodb_write_io_threads = 16 port = 3306 tmpdir = /var/tmp max_connections = 1500 read_only = 1 innodb_buffer_pool_size = 200GB log-bin = mysql-bin #binlog-ignore-db = mysql binlog_transaction_dependency_tracking=WRITESET innodb_flush_method = O_DIRECT basedir = /usr bind-address = 10.1.1.14 max_allowed_packet = 16M datadir = /var/lib/mysql auto-increment-increment = 1 enforce-gtid-consistency = 1 range_optimizer_max_mem_size = 0 max_connect_errors = 3600 table_open_cache = 6000 slow-query-log = 1 innodb_undo_log_truncate = 1 back_log = 200 explicit_defaults_for_timestamp = 1 innodb_log_file_size = 1024M sql_mode = "STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION" innodb_read_io_threads = 16 innodb_status_output_locks = 1 group-concat-max-len = 1024000 read_buffer_size = 4194304 binlog-format = row user = mysql #replicate-do-db = stickyads pid-file = /var/run/mysqld/mysqld.pid join_buffer_size = 8388608 thread_cache_size = 8 innodb_flush_log_at_trx_commit = 1 auto-increment-offset = 2 log-bin-index = mysql-bin.index innodb_buffer_pool_dump_at_shutdown = 0 socket = /var/run/mysqld/mysqld.sock slow-query-log-file = /var/log/mysql/mysqld-slow.log log-error = /var/log/mysql/error.log sort_buffer_size = 4194304 log-slave-updates = 1 innodb_buffer_pool_instances = 16 skip-external-locking = 1 expire_logs_days = 33 net_read_timeout = 28800 net_write_timeout = 28800 super_read_only = ON innodb_buffer_pool_load_at_startup = 0 long_query_time = 1 innodb_undo_logs = 128 innodb_io_capacity = 5000 innodb_max_undo_log_size = 2G key_buffer_size = 128M innodb_io_capacity_max = 10000 binlog_checksum = NONE master_info_repository = TABLE relay_log_info_repository = TABLE transaction_write_set_extraction = XXHASH64 loose_group_replication_allow_local_disjoint_gtids_join = OFF loose_group_replication_allow_local_lower_version_join = OFF loose_group_replication_auto_increment_increment = 7 loose_group_replication_bootstrap_group = OFF #loose_group_replication_bootstrap_group = ON loose_group_replication_components_stop_timeout = 31536000 loose_group_replication_compression_threshold = 1000000 loose_group_replication_enforce_update_everywhere_checks = OFF loose_group_replication_exit_state_action = READ_ONLY loose_group_replication_flow_control_applier_threshold = 25000 loose_group_replication_flow_control_certifier_threshold = 25000 loose_group_replication_flow_control_mode = QUOTA loose_group_replication_force_members = loose_group_replication_group_name = 82e0638a-e968-11ec-9a76-9cdc71bc2950 loose_group_replication_group_seeds = db8:33061,db9:33061 loose_group_replication_gtid_assignment_block_size = 1000000 loose_group_replication_ip_whitelist = AUTOMATIC loose_group_replication_local_address = db7:33061 loose_group_replication_member_weight = 80 loose_group_replication_poll_spin_loops = 0 loose_group_replication_recovery_complete_at = TRANSACTIONS_APPLIED loose_group_replication_recovery_reconnect_interval = 60 loose_group_replication_recovery_retry_count = 10 loose_group_replication_recovery_ssl_ca = loose_group_replication_recovery_ssl_capath = loose_group_replication_recovery_ssl_cert = loose_group_replication_recovery_ssl_cipher = loose_group_replication_recovery_ssl_crl = loose_group_replication_recovery_ssl_crlpath = loose_group_replication_recovery_ssl_key = loose_group_replication_recovery_ssl_verify_server_cert = OFF loose_group_replication_recovery_use_ssl = ON loose_group_replication_single_primary_mode = ON loose_group_replication_ssl_mode = REQUIRED loose_group_replication_start_on_boot = ON loose_group_replication_transaction_size_limit = 0 loose_group_replication_unreachable_majority_timeout = 0 [mysqldump] quick = 1 quote-names = 1 max_allowed_packet = 64M # * IMPORTANT: Additional settings that can override those from this file! # The files must end with '.cnf', otherwise they'll be ignored.
[17 Sep 2023 15:02]
huang rong
my.cnf of db8 ------------------ # # Ansible managed # [mysql_safe] pid_file = /var/run/mysqld/mysqld.pid socket = /var/run/mysqld/mysqld.sock nice = 0 [client] port = 3306 socket = /var/run/mysqld/mysqld.sock [mysqld] query_cache_type = 1 query_cache_size = 134217728 default_password_lifetime=0 relay-log=db8-fw-us-east-relay-bin disabled_storage_engines="MyISAM,BLACKHOLE,FEDERATED,ARCHIVE,MEMORY" skip-slave-start slave_parallel_workers=4 slave_preserve_commit_order=1 slave_parallel_type=LOGICAL_CLOCK lower-case-table-names=1 innodb_flush_log_at_trx_commit = 1 sync_binlog=1 tmp_table_size = 67108864 thread_stack = 192K skip-name-resolve = 1 innodb_undo_tablespaces = 3 server-id = 10131711 gtid-mode = ON innodb_temp_data_file_path = ibtmp1:12M:autoextend:max:50G #binlog-do-db = stickyads read_rnd_buffer_size = 8388608 max_binlog_size = 500M innodb_page_cleaners = 16 innodb_write_io_threads = 16 port = 3306 tmpdir = /var/tmp max_connections = 1500 read_only = 1 innodb_buffer_pool_size = 200GB log-bin = mysql-bin #binlog-ignore-db = mysql binlog_transaction_dependency_tracking=WRITESET innodb_flush_method = O_DIRECT basedir = /usr bind-address = 10.1.1.11 max_allowed_packet = 16M datadir = /var/lib/mysql auto-increment-increment = 1 enforce-gtid-consistency = 1 range_optimizer_max_mem_size = 0 max_connect_errors = 3600 table_open_cache = 6000 slow-query-log = 1 innodb_undo_log_truncate = 1 back_log = 200 explicit_defaults_for_timestamp = 1 innodb_log_file_size = 1024M sql_mode = "STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION" innodb_read_io_threads = 16 innodb_status_output_locks = 1 group-concat-max-len = 1024000 read_buffer_size = 4194304 binlog-format = row user = mysql #replicate-do-db = stickyads pid-file = /var/run/mysqld/mysqld.pid join_buffer_size = 8388608 thread_cache_size = 8 innodb_flush_log_at_trx_commit = 1 auto-increment-offset = 2 log-bin-index = mysql-bin.index innodb_buffer_pool_dump_at_shutdown = 0 socket = /var/run/mysqld/mysqld.sock slow-query-log-file = /var/log/mysql/mysqld-slow.log log-error = /var/log/mysql/error.log sort_buffer_size = 4194304 log-slave-updates = 1 innodb_buffer_pool_instances = 16 skip-external-locking = 1 expire_logs_days = 30 net_read_timeout = 28800 net_write_timeout = 28800 super_read_only = ON innodb_buffer_pool_load_at_startup = 0 long_query_time = 1 innodb_undo_logs = 128 innodb_io_capacity = 5000 innodb_max_undo_log_size = 2G key_buffer_size = 128M innodb_io_capacity_max = 10000 binlog_checksum = NONE master_info_repository = TABLE relay_log_info_repository = TABLE transaction_write_set_extraction = XXHASH64 loose_group_replication_allow_local_disjoint_gtids_join = OFF loose_group_replication_allow_local_lower_version_join = OFF loose_group_replication_auto_increment_increment = 7 loose_group_replication_bootstrap_group = OFF loose_group_replication_components_stop_timeout = 31536000 loose_group_replication_compression_threshold = 1000000 loose_group_replication_enforce_update_everywhere_checks = OFF loose_group_replication_exit_state_action = READ_ONLY loose_group_replication_flow_control_applier_threshold = 25000 loose_group_replication_flow_control_certifier_threshold = 25000 loose_group_replication_flow_control_mode = QUOTA loose_group_replication_force_members = loose_group_replication_group_name = 82e0638a-e968-11ec-9a76-9cdc71bc2950 loose_group_replication_group_seeds = db7:33061,db9:33061 loose_group_replication_gtid_assignment_block_size = 1000000 loose_group_replication_ip_whitelist = AUTOMATIC loose_group_replication_local_address = db8:33061 loose_group_replication_member_weight = 70 loose_group_replication_poll_spin_loops = 0 loose_group_replication_recovery_complete_at = TRANSACTIONS_APPLIED loose_group_replication_recovery_reconnect_interval = 60 loose_group_replication_recovery_retry_count = 10 loose_group_replication_recovery_ssl_ca = loose_group_replication_recovery_ssl_capath = loose_group_replication_recovery_ssl_cert = loose_group_replication_recovery_ssl_cipher = loose_group_replication_recovery_ssl_crl = loose_group_replication_recovery_ssl_crlpath = loose_group_replication_recovery_ssl_key = loose_group_replication_recovery_ssl_verify_server_cert = OFF loose_group_replication_recovery_use_ssl = ON loose_group_replication_single_primary_mode = ON loose_group_replication_ssl_mode = REQUIRED loose_group_replication_start_on_boot = ON loose_group_replication_transaction_size_limit = 0 loose_group_replication_unreachable_majority_timeout = 0 [mysqldump] quick = 1 quote-names = 1 max_allowed_packet = 64M # * IMPORTANT: Additional settings that can override those from this file! # The files must end with '.cnf', otherwise they'll be ignored. #
[17 Sep 2023 15:03]
huang rong
my.cnf of db9 ------------------------- # # Ansible managed # [mysql_safe] pid_file = /var/run/mysqld/mysqld.pid socket = /var/run/mysqld/mysqld.sock nice = 0 [client] port = 3306 socket = /var/run/mysqld/mysqld.sock [mysqld] query_cache_type = 1 query_cache_size = 134217728 default_password_lifetime=0 relay-log=db9-fw-us-east-relay-bin disabled_storage_engines="MyISAM,BLACKHOLE,FEDERATED,ARCHIVE,MEMORY" skip-slave-start slave_parallel_workers=4 slave_preserve_commit_order=1 slave_parallel_type=LOGICAL_CLOCK lower-case-table-names=1 innodb_flush_log_at_trx_commit = 1 sync_binlog=1 tmp_table_size = 67108864 thread_stack = 192K skip-name-resolve = 1 innodb_undo_tablespaces = 3 server-id = 10131712 gtid-mode = ON innodb_temp_data_file_path = ibtmp1:12M:autoextend:max:50G #binlog-do-db = stickyads read_rnd_buffer_size = 8388608 max_binlog_size = 500M innodb_page_cleaners = 16 innodb_write_io_threads = 16 port = 3306 tmpdir = /var/tmp max_connections = 1500 read_only = 1 innodb_buffer_pool_size = 200GB log-bin = mysql-bin #binlog-ignore-db = mysql binlog_transaction_dependency_tracking=WRITESET innodb_flush_method = O_DIRECT basedir = /usr bind-address = 10.1.1.12 max_allowed_packet = 16M datadir = /var/lib/mysql auto-increment-increment = 1 enforce-gtid-consistency = 1 range_optimizer_max_mem_size = 0 max_connect_errors = 3600 table_open_cache = 6000 slow-query-log = 1 innodb_undo_log_truncate = 1 back_log = 200 explicit_defaults_for_timestamp = 1 innodb_log_file_size = 1024M sql_mode = "STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION" innodb_read_io_threads = 16 innodb_status_output_locks = 1 group-concat-max-len = 1024000 read_buffer_size = 4194304 binlog-format = row user = mysql #replicate-do-db = stickyads pid-file = /var/run/mysqld/mysqld.pid join_buffer_size = 8388608 thread_cache_size = 8 innodb_flush_log_at_trx_commit = 1 auto-increment-offset = 2 log-bin-index = mysql-bin.index innodb_buffer_pool_dump_at_shutdown = 0 socket = /var/run/mysqld/mysqld.sock slow-query-log-file = /var/log/mysql/mysqld-slow.log log-error = /var/log/mysql/error.log sort_buffer_size = 4194304 log-slave-updates = 1 innodb_buffer_pool_instances = 16 skip-external-locking = 1 expire_logs_days = 33 net_read_timeout = 28800 net_write_timeout = 28800 super_read_only = ON innodb_buffer_pool_load_at_startup = 0 long_query_time = 1 innodb_undo_logs = 128 innodb_io_capacity = 5000 innodb_max_undo_log_size = 2G key_buffer_size = 128M innodb_io_capacity_max = 10000 binlog_checksum = NONE master_info_repository = TABLE relay_log_info_repository = TABLE transaction_write_set_extraction = XXHASH64 loose_group_replication_allow_local_disjoint_gtids_join = OFF loose_group_replication_allow_local_lower_version_join = OFF loose_group_replication_auto_increment_increment = 7 loose_group_replication_bootstrap_group = OFF loose_group_replication_components_stop_timeout = 31536000 loose_group_replication_compression_threshold = 1000000 loose_group_replication_enforce_update_everywhere_checks = OFF loose_group_replication_exit_state_action = READ_ONLY loose_group_replication_flow_control_applier_threshold = 25000 loose_group_replication_flow_control_certifier_threshold = 25000 loose_group_replication_flow_control_mode = QUOTA loose_group_replication_force_members = loose_group_replication_group_name = 82e0638a-e968-11ec-9a76-9cdc71bc2950 loose_group_replication_group_seeds = db7:33061,db8:33061 loose_group_replication_gtid_assignment_block_size = 1000000 loose_group_replication_ip_whitelist = AUTOMATIC loose_group_replication_local_address = db9:33061 loose_group_replication_member_weight = 60 loose_group_replication_poll_spin_loops = 0 loose_group_replication_recovery_complete_at = TRANSACTIONS_APPLIED loose_group_replication_recovery_reconnect_interval = 60 loose_group_replication_recovery_retry_count = 10 loose_group_replication_recovery_ssl_ca = loose_group_replication_recovery_ssl_capath = loose_group_replication_recovery_ssl_cert = loose_group_replication_recovery_ssl_cipher = loose_group_replication_recovery_ssl_crl = loose_group_replication_recovery_ssl_crlpath = loose_group_replication_recovery_ssl_key = loose_group_replication_recovery_ssl_verify_server_cert = OFF loose_group_replication_recovery_use_ssl = ON loose_group_replication_single_primary_mode = ON loose_group_replication_ssl_mode = REQUIRED loose_group_replication_start_on_boot = ON loose_group_replication_transaction_size_limit = 0 loose_group_replication_unreachable_majority_timeout = 0 [mysqldump] quick = 1 quote-names = 1 max_allowed_packet = 64M # * IMPORTANT: Additional settings that can override those from this file! # The files must end with '.cnf', otherwise they'll be ignored. #
[20 Sep 2023 19:53]
MySQL Verification Team
Hi, I spent a while working on this bug report but I am not able to reproduce this. I see in data provided that your system did show a problem but I need a way to properly reproduce this and I cannot.
[21 Sep 2023 1:44]
huang rong
Thank you for the time you've invested in this bug. Concurrently, I've been fervently attempting to reproduce this phenomenon, albeit to no avail thus far. From my comprehension, in a single-primary mode MGR, the primary node, upon initiating a transaction, doesn't undergo conflict detection. Prior to the completion of the transaction, it logs to binlog. Post the binlog inscription on the primary, it then synchronizes to the secondary nodes. Even if the log's two-phase commit remains unfulfilled and the primary node reboots for crash recovery, it merely reapplies the logs rather than generating a fresh binlog. This encapsulates my grasp of the MySQL mechanism. Yet, judging by the content I've shared, transactions at the same GTID point between primary and secondary nodes aren't consistent, which strikes me as profoundly perplexing.
[21 Sep 2023 2:29]
MySQL Verification Team
Thanks. Data looks weird to say the least. Replication team will take a look but since we cannot reproduce this it could be a temporary hw error or some weird glitch that will never happen again and on top it is 5.7.36 that is not very modern release.

Description: I am utilizing an InnoDB cluster with three nodes set up with MySQL Group Replication (MGR) in single-primary mode. I encountered a data inconsistency issue after the MGR was restarted. To elaborate on the issue: On 2023-09-11T07:29:21.328364Z, a large transaction was submitted on the primary node, which took around 3 minutes to complete. During this process, the primary node(db7) detected that the other two secondary(db8/db9) nodes were unreachable and vice versa. ------- db7 ------------- 2023-09-11T07:29:21.328364Z 0 [Warning] Plugin group_replication reported: 'Member with address db9:3306 has beco me unreachable.' able again.' 2023-09-11T07:29:27.409471Z 0 [Warning] Plugin group_replication reported: 'Member with address db8:3306 has beco me unreachable.' 2023-09-11T07:29:33.306208Z 0 [ERROR] Plugin group_replication reported: 'Member was expelled from the group due to network failures, cha nging member status to ERROR.' 2023-09-11T07:29:33.928083Z 1696404486 [ERROR] Plugin group_replication reported: 'Transaction cannot be executed while Group Replication is on ERROR state. Check for errors and restart the plugin' 2023-09-11T07:29:33.928119Z 1696404486 [ERROR] Run function 'before_commit' in plugin 'group_replication' failed 2023-09-11T07:29:33.975833Z 1696404486 [ERROR] Plugin group_replication reported: 'Transaction cannot be executed while Group Replication is on ERROR state. Check for errors and restart the plugin' ------- db8 ------------- 2023-09-11T07:30:03.744547Z 0 [Warning] Plugin group_replication reported: 'Member with address db7:3306 has beco me unreachable.' 2023-09-11T07:30:03.744598Z 0 [Warning] Plugin group_replication reported: 'Member with address db7:3306 is reach able again.' 2023-09-11T07:30:03.744644Z 0 [Note] Plugin group_replication reported: '[GCS] Removing members that have failed while processing new vie w.' 2023-09-11T07:30:03.746467Z 0 [Warning] Plugin group_replication reported: 'Member with address db7:3306 has beco me unreachable.' 2023-09-11T07:30:03.746500Z 0 [Note] Plugin group_replication reported: '[GCS] Removing members that have failed while processing new vie w.' 2023-09-11T07:30:20.118593Z 0 [Warning] Plugin group_replication reported: 'Member with address db7:3306 is reach able again.' 2023-09-11T07:30:20.119125Z 0 [ERROR] Plugin group_replication reported: 'Member was expelled from the group due to network failures, cha nging member status to ERROR.' 2023-09-11T07:30:20.134095Z 0 [Note] Plugin group_replication reported: 'Going to wait for view modification' 2023-09-11T07:30:20.134159Z 0 [ERROR] Plugin group_replication reported: 'Message received while the plugin is not ready, message discard ed' When viewing from the InnoDB cluster's Admin end, each node displayed "status": "ERROR". After restarting the MySQL group_replication for all three nodes, the cluster did not recover. I had to shut down the MySQL instances on all nodes and examined their GTID sets. I found that db7 had the most up-to-date GTID set, whereas db8 and db9 were consistent but lagging behind db7. Consequently, I restarted the MGR cluster using db7 as the primary node, which restored read/write capabilities. However, upon trying to add the other two nodes back to the MGR, they reported a primary key conflict (Duplicate entry error). "db8:3306": { "address": "db8:3306", "instanceErrors": [ "ERROR: GR Recovery channel applier stopped with an error: Worker 1 failed executing transaction '82e0638a-e968-11ec-9a76-9cdc71bc2950:774245300' at master log mysql-bin.004136, end_log_pos 535344; Could not execute Write_rows event on table stick.campaigns; Duplicate entry '2761231785' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 535344 (1062) at 2023-09-11 03:51:35" ], "memberRole": "SECONDARY", "mode": "R/O", "readReplicas": {}, "recovery": { "applierError": "Worker 1 failed executing transaction '82e0638a-e968-11ec-9a76-9cdc71bc2950:774245300' at master log mysql-bin.004136, end_log_pos 535344; Could not execute Write_rows event on table stick.campaigns; Duplicate entry '2761231785' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 535344", "applierErrorNumber": 1062, "state": "APPLIER_ERROR" }, "recoveryStatusText": "Distributed recovery in progress", "role": "HA", "status": "RECOVERING", "version": "5.7.36" }, On further inspection, I observed that the transaction with GTID '82e0638a-e968-11ec-9a76-9cdc71bc2950:774243853' had different content on db7 compared to db8 and db9. db7 showed an INSERT operation on the stick.data_platform table In contrast, db8 showed an INSERT operation on the stick.external table. The mismatched transaction content for the same GTID across nodes is perplexing and suggests an issue with the internal operations of MySQL, possibly a bug. How to repeat: In the testing environment, I continually conducted read-write operations with sysbench. Concurrently, I executed a substantial transaction, endeavoring to replicate the previous circumstances. However, my replication merely resulted in the primary node being evicted from MGR, falling short of fully re-enacting the issue at hand. # load 2m data time sysbench /usr/share/sysbench/oltp_read_write.lua --mysql-host=10.1.1.1 --mysql-port=6446 --mysql-user=root --mysql-password=111111 --mysql-db=sbtest --tables=4 --table-size=4000000 prepare # r/w testing sysbench /usr/share/sysbench/oltp_read_write.lua --mysql-host=10.1.1.1 --mysql-port=6446 --mysql-user=root --mysql-password=111111 --mysql-db=sbtest --tables=2 --table-size=4000000 --time=600 --threads=5 run # clean up sysbench /usr/share/sysbench/oltp_read_write.lua --mysql-host=10.1.1.1 --mysql-port=6446 --mysql-user=root --mysql-password=111111 --mysql-db=sbtest --tables=2 cleanup # big transaction; select @@hostname; start transaction; update sbtest.sbtest1 set k=11+1 where 1>0; commit; # big transaction; select @@hostname; start transaction; update sbtest.sbtest1 t1,sbtest.sbtest2 t2 set t1.k=11 where t2.id=t1.id; commit;