Bug #117687 The mgr secondary node causes transaction submission timeout on the primary node
Submitted: 12 Mar 14:59 Modified: 12 Mar 15:55
Reporter: xichao li (OCA) Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Group Replication Severity:S3 (Non-critical)
Version:8.0.28 OS:Any
Assigned to: MySQL Verification Team CPU Architecture:Any

[12 Mar 14:59] xichao li
Description:
Recently, we encountering an issue in a 3-node MySQL 8.0.27 MGR cluster (single-primary mode) where one of the slave nodes experienced server anomalies for several seconds due to infrastructure issues. Following the failure of the slave node, it was observed that many transactions on the primary node were experiencing longer commit times during this period, affecting transactions.

After identifying the issue, we recreated the problem in a test environment. It was found that for the aforementioned MGR cluster, if a node fails due to network issues (simulated during tests using iptables to mimic network interruptions) or server-related reasons, it severely impacts the transaction commits on the primary node (the achievable TPS seems to drop to around 4, almost causing all transactions to halt).

How to repeat:
Observation:
Method 1: Simulate a failure of one slave node and execute a large number of transactions on the primary node, finding that its TPS can only reach 4.
The simulation steps are as follows:

1). mgr config
### MGR-Base ###
group_replication_start_on_boot = off
group_replication_bootstrap_group = off
group_replication_group_name = "e6d778cc-5d7d-11ec-8820-0050569be145"
group_replication_local_address = "10.211.55.4:23306"
group_replication_group_seeds = "10.211.55.4:23308,10.211.55.4:23306"
group_replication_single_primary_mode = true
group_replication_enforce_update_everywhere_checks = false
group_replication_ip_allowlist= "127.0.0.1,10.211.55.4"
report_host="10.211.55.4"
### MGR-Tune ###
group_replication_consistency = EVENTUAL
group_replication_flow_control_mode = DISABLED
group_replication_exit_state_action = OFFLINE_MODE
group_replication_unreachable_majority_timeout = 13
group_replication_autorejoin_tries = 0
super_read_only = on
group_replication_poll_spin_loops=200
group_replication_member_expel_timeout = 5
internal_tmp_mem_storage_engine=memory
port=3306

2) Preparing the test data
create table tb3(id bigint primary key auto_increment not null,name varchar(100),dt datetime);
alter table tb3 add index idx_dt(dt);

delimiter //
drop procedure batch_insert_1 //
create procedure batch_insert_1()
begin
declare num int;
set num=1;
while num < 5000001 do
insert into tb3(name,dt) values(concat("AbcDE", num),now());
commit;
select max(dt) from tb3;
set num=num+1;
end while;
end
//
delimiter ;

3). Loop execution to test concurrency reduction:
Start-up test:
mysql -uroot -S /tmp/mysql.sock db1 -e "call batch_insert_1()" >>/tmp/run_`date +%Y%m%d_%H%M%S`.log

// Observe test results
cd /tmp/;tail -300f `ls -tr | grep run_ | tail -n 1` | grep 2025

// Simulates a short - time network interruption
date;
iptables -I INPUT -s 10.211.55.25 -p tcp -j DROP
iptables -I INPUT -s 10.211.55.25 -p udp -j DROP
sleep 8
date;
iptables -F
iptables -X
iptables -L

4). When performed manually, the test delay increases
Alternatively, you can simulate a slave node failure, manually execute the sql, and find that its execution actually takes about 0.25 seconds. The steps are as follows:

date;
iptables -I INPUT -s 10.211.55.25 -p tcp -j DROP
iptables -I INPUT -s 10.211.55.25 -p udp -j DROP
sleep 8
date;
iptables -F
iptables -X
iptables -L

select current_timestamp(6); insert into tb3(name,dt) values('AbcDE',now());select current_timestamp(6); 

Suggested fix:
Repair Suggestions:

Based on the analysis of the code, the preliminary findings are as follows:

1). According to debug logs, it can be confirmed that the delay mainly occurs during the execution of the group_replication_trans_before_commit() function. Therefore, it can be determined that the slow execution is primarily caused by the MGR layer.

2). The entire process of the Paxos protocol is divided into operations such as PREPARE, PROPOSE, ACCEPT, LEARN, READ, and EXECUTE. By setting appropriate log markers in the corresponding tasks, the time consumption during the PREPARE to LEARN operations is not affected by failures and can be completed quickly.

3). In subsequent executor_task() operations, due to the need to satisfy the characteristic of "ordered message delivery," nodes wait in the rv queue for operations with smaller message sequence numbers to complete. A failure of one slave node leads to delays in processing messages initiated by that node. During the failure period, messages will not reach the failed node and require multiple attempts in find_value(), with consistency being achieved through push_msg_3p() on the fourth attempt.
Our analysis shows that there is a waiting period during each attempt, calculated by the wakeup_delay() function, with the first four waiting times being approximately 0 / 0.1 / 1.41 / 1.98 seconds respectively.
Therefore, if it requires reaching the third attempt, a wait time of 0+0.1+1.41 = 1.51 seconds may be needed, which aligns with the observed concurrency of about 4 from multiple tests.

4). In summary, the root cause of the timeout issue in transaction commits on the primary node of the MGR cluster, caused by a slave node failure, is the Paxos protocol's requirement for "ordered message delivery," leading to waiting for preceding messages to complete. However, from the perspective of practical business impact, especially in single-primary mode, a failure in one slave node should not affect transaction commits on the primary node.

Therefore, it is suggested to parameterize the logic of wakeup_delay(), replacing the currently fixed coding of minimum_threshold (0.1) with a configurable value. For instance, in an internal network environment, a lower initial delay value (such as 0.01) could be set.
[12 Mar 15:55] MySQL Verification Team
This is not a bug but intended behavior (that can be partially configurable too).

Please open the SR with our support team if you need help configuring your GR to behave more like you want. If you do not have access to MySQL support you can get some help on https://forums.mysql.com/

Thank you for using MySQL