MySQL Bugs: #99370: Semi-synchronous replication executes un-acknowledged transactions

Bug #99370	Semi-synchronous replication executes un-acknowledged transactions
Submitted:	27 Apr 2020 15:12	Modified:	19 May 2020 16:40
Reporter:	Erol Guven (OCA)
Status:	Closed
Category:	MySQL Server: Documentation	Severity:	S2 (Serious)
Version:	5.7.29	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
If the master node goes down before it receives a replication acknowledgement from the slave for a transaction, it still applies that transaction when the server restarts.

This problem is also documented in the "MySQL and InnoDB Crash Recovery" section of this article: https://www.percona.com/community-blog/2018/08/23/question-about-semi-synchronous-replicat...

Some relevant configs from my.cnf

# master is configured to wait "forever" until it receives at least one slave replicating
rpl_semi_sync_master_enabled=1
rpl_semi_sync_master_timeout=604800000

gtid-mode=ON
enforce-gtid-consistency
sync_binlog=1
innodb_flush_log_at_trx_commit=1

How to repeat:
1. Setup master-slave semi-synchronous cluster with at least one slave
2. Stop slave thread on the slave node(s)
3. Write a transaction on master. It will wait until the transaction is replicated. The transaction is not yet committed.
4. Stop and restart master node.
5. Upon restart, notice that the transaction from step#3 is now applied on the database.
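
The steps above can be illustrated with a small toy model. This is only a sketch, not MySQL's actual implementation: the class and method names are hypothetical, and it assumes the commonly described ordering in which the transaction is prepared in the storage engine and written to the binary log before the master waits for an ACK, while the ACK state itself lives only in memory.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMaster:
    """Toy model of a semi-sync master (hypothetical, for illustration only)."""
    binlog: list = field(default_factory=list)    # durable, as with sync_binlog=1
    prepared: set = field(default_factory=set)    # durable engine prepare records
    committed: set = field(default_factory=set)   # durable engine commits
    acked: set = field(default_factory=set)       # in memory only, lost on restart

    def begin_commit(self, trx):
        # Prepare in the engine, then write the transaction to the binary log.
        self.prepared.add(trx)
        self.binlog.append(trx)

    def receive_ack(self, trx):
        # Only after a slave ACK is the commit completed and the client answered.
        self.acked.add(trx)
        self.prepared.discard(trx)
        self.committed.add(trx)

    def crash_and_recover(self):
        # ACK bookkeeping was never persisted, so recovery cannot consult it.
        self.acked = set()
        for trx in list(self.prepared):
            if trx in self.binlog:
                self.committed.add(trx)  # present in the binary log: commit
            # a prepared transaction missing from the binary log is rolled back
            self.prepared.discard(trx)


master = ToyMaster()
master.begin_commit("trx-1")        # step 3: slave thread stopped, no ACK arrives
assert "trx-1" not in master.committed
master.crash_and_recover()          # steps 4 and 5
print("trx-1" in master.committed)  # prints True
```

In this model, recovery decides purely from the durable state (prepare records and the binary log), which is why the un-acknowledged transaction shows up as applied after the restart.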

Desired behavior:
The transaction from step#3 is not applied. MySQL should have recorded that the transaction was never acknowledged as committed, thus it should not have been applied.

Suggested fix:
Record acknowledged transactions and replay only those from the binary logs. Purge the un-acknowledged transactions from the binary logs.

Hi Erol,

This is expected behavior, a limitation that applies to the whole concept of semi-sync. It is not properly documented, so I'll convert this into a documentation bug so our doc team can improve the details in the documentation.

When using semi-sync, the correct way to recover after a master failure while preserving losslessness is to *discard* the master and fail over to the most up-to-date slave. This is consistent because transactions for which the master did not receive an ACK have neither been acknowledged to the committing client nor externalized to other clients, so we simply lose them.
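
As a rough sketch of "fail over to the most up-to-date slave", the selection can be thought of as picking the slave that has received the most transactions from the failed master. The function name, slave names, and sequence numbers below are hypothetical, and the model assumes a single source whose transactions are numbered without gaps. A real deployment would compare GTID sets on each slave (for example with MySQL's `GTID_SUBSET()` function) rather than single integers.

```python
def pick_failover_target(received):
    """Return the name of the slave with the highest received transaction number.

    `received` maps a hypothetical slave name to the highest transaction
    sequence number that slave has received from the failed master.
    """
    if not received:
        raise ValueError("no slaves available for failover")
    return max(received, key=received.get)


# Hypothetical state at the moment the master fails.
slaves = {"slave-a": 1041, "slave-b": 1042, "slave-c": 1039}
target = pick_failover_target(slaves)
print(target)  # prints slave-b
```

The failed master is deliberately absent from the candidates: per the explanation above, it is discarded rather than compared against the slaves.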

all best
Bogdan

p.s.
You might want to switch to Group Replication, where this problem does not exist.

Posted by developer:
 
Thanks for raising this issue. I've added a note to
https://dev.mysql.com/doc/refman/8.0/en/replication-semisync.html 
and earlier release versions:

Important:
 With semisynchronous replication, if the master crashes and a failover to a slave is carried out, the failed master should not be reused as the replication master, and should be discarded. It could have transactions that were not acknowledged by any slave, which were therefore not committed before the failover. 
 If your goal is to implement a fault-tolerant replication topology where all the servers receive the same transactions in the same order, and a server that crashes can rejoin the group and be brought up to date automatically, you can use Group Replication to achieve this.