Bug #99370 Semi-synchronous replication executes un-acknowledged transactions
Submitted: 27 Apr 2020 15:12 Modified: 19 May 2020 16:40
Reporter: Erol Guven (OCA)
Status: Closed
Category: MySQL Server: Documentation
Severity: S2 (Serious)
Version: 5.7.29
OS: Any
CPU Architecture: Any

[27 Apr 2020 15:12] Erol Guven
Description:
If the master node goes down before it receives a replication acknowledgement from a slave, the transaction is still committed when the server restarts.

This problem is also documented in the "MySQL and InnoDB Crash Recovery" section of this article: https://www.percona.com/community-blog/2018/08/23/question-about-semi-synchronous-replicat...

Some relevant settings from my.cnf:

# the master is configured to wait "forever" (604800000 ms, i.e. 7 days) for at least one slave to acknowledge each transaction
rpl_semi_sync_master_enabled=1
rpl_semi_sync_master_timeout=604800000

gtid-mode=ON
enforce-gtid-consistency
sync_binlog=1
innodb_flush_log_at_trx_commit=1
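rpl_semi_sync_master_timeout is specified in milliseconds, so the "forever" value above is seven days; if no slave acknowledges within that time, the master falls back to asynchronous replication. The minimal Python sketch below reads the quoted settings with the standard `configparser` module and converts the timeout. The `[mysqld]` section header and the helper name `semi_sync_timeout_days` are assumptions for illustration; the report only shows the individual options.

```python
import configparser

# Excerpt of the my.cnf settings quoted above. The [mysqld] header is an
# assumption: the report lists the options without their section.
MY_CNF = """
[mysqld]
rpl_semi_sync_master_enabled=1
rpl_semi_sync_master_timeout=604800000
gtid-mode=ON
enforce-gtid-consistency
sync_binlog=1
innodb_flush_log_at_trx_commit=1
"""


def semi_sync_timeout_days(cnf_text: str) -> float:
    """Hypothetical helper: rpl_semi_sync_master_timeout (ms) in days."""
    # allow_no_value lets bare flags such as enforce-gtid-consistency parse.
    parser = configparser.ConfigParser(allow_no_value=True)
    parser.read_string(cnf_text)
    timeout_ms = parser.getint("mysqld", "rpl_semi_sync_master_timeout")
    return timeout_ms / 1000 / 86400  # ms -> s -> days


if __name__ == "__main__":
    print(semi_sync_timeout_days(MY_CNF))
```

In other words, the master does not truly wait forever; it waits a week and then stops waiting for acknowledgements.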

How to repeat:
1. Set up a master-slave semi-synchronous cluster with at least one slave.
2. Stop the slave threads (STOP SLAVE) on the slave node(s).
3. Write a transaction on the master. The commit waits until the transaction is replicated, so the transaction is not yet committed.
4. Stop and restart the master node.
5. Upon restart, notice that the transaction from step 3 has been applied to the database.
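Steps 3 to 5 can be modeled with a small hypothetical Python toy (not MySQL source code). It assumes the default 5.7 wait point, rpl_semi_sync_master_wait_point=AFTER_SYNC, where a transaction is prepared in InnoDB and written and synced to the binary log before the master waits for a slave ACK, and is committed in the engine only after the ACK arrives. Crash recovery commits prepared transactions whose events are in the binary log, and nothing durable records whether a slave acknowledged them, so the blocked transaction comes back committed. The class `ToyMaster` and its methods are illustrative names only.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ToyMaster:
    """Hypothetical model of a semi-sync master (AFTER_SYNC); not MySQL code."""
    binlog: List[str] = field(default_factory=list)   # durable: sync_binlog=1
    prepared: Set[str] = field(default_factory=set)   # durable: InnoDB prepare
    committed: Set[str] = field(default_factory=set)  # durable: InnoDB commit

    def commit(self, trx: str, slave_ack: bool) -> bool:
        """Return True once the client would receive its OK."""
        self.prepared.add(trx)      # 1. InnoDB prepare (redo log flushed)
        self.binlog.append(trx)     # 2. write and fsync the binary log
        if not slave_ack:           # 3. AFTER_SYNC: wait for a slave ACK
            return False            #    no ACK, so the client is still blocked
        self.prepared.discard(trx)  # 4. InnoDB commit
        self.committed.add(trx)
        return True                 # 5. OK sent to the client

    def crash_and_recover(self) -> None:
        """Crash while a commit is waiting, then run crash recovery."""
        # Recovery decides only from the binary log: prepared transactions
        # found there are committed, the rest are rolled back. No ACK state
        # is consulted, because none was persisted.
        for trx in sorted(self.prepared):
            self.prepared.discard(trx)
            if trx in self.binlog:
                self.committed.add(trx)


if __name__ == "__main__":
    master = ToyMaster()
    client_got_ok = master.commit("trx-step3", slave_ack=False)  # step 3
    master.crash_and_recover()                                   # step 4
    print(client_got_ok, "trx-step3" in master.committed)        # step 5
```

The client never received an OK for "trx-step3", yet after recovery the transaction is committed, which matches the behavior described in step 5.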

Desired behavior:
The transaction from step 3 is not applied. MySQL should have recorded that the transaction was never acknowledged, and therefore should not apply it.

Suggested fix:
Record acknowledged transactions and replay only those from the binary logs. Purge the un-acknowledged transactions from the binary logs.
[5 May 2020 11:40] MySQL Verification Team
Hi Erol,

This is expected behavior, a limitation that applies to the whole concept of semi-sync. It is not properly documented, so I'll convert this into a documentation bug so our docs team can improve the details in the documentation.

When using semi-sync, the correct way to recover from a master failure while preserving losslessness is to *discard* the master and fail over to the most up-to-date slave. This is consistent because transactions for which the master did not receive an ACK have neither been acknowledged to the committing client nor externalized to other clients, so we simply lose them.
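The failover described above can be sketched in Python by treating each server's GTID set as a plain set of `uuid:seqno` strings. This is a simplification: real GTID sets use interval notation, and MySQL provides GTID_SUBTRACT() for comparing them. The helper names and the example UUID are hypothetical. The sketch assumes every slave's received set is a prefix of the master's history (a single master, no gaps), so the largest set is the most up to date; whatever only the failed master holds is the un-acknowledged work that is lost when that master is discarded.

```python
from typing import Dict, Set

# Hypothetical example UUID; each GTID is written "uuid:seqno".
UUID = "3E11FA47-71CA-11E1-9E33-C80AA9429562"


def pick_new_master(slaves: Dict[str, Set[str]]) -> str:
    """Return the name of the slave that has received the most transactions.

    Assumes each slave's set is a prefix of the old master's history,
    so "largest" means "most up to date".
    """
    return max(slaves, key=lambda name: len(slaves[name]))


def unacknowledged(old_master: Set[str], new_master: Set[str]) -> Set[str]:
    """Transactions the failed master has that the promoted slave lacks."""
    return old_master - new_master


if __name__ == "__main__":
    old = {f"{UUID}:{n}" for n in range(1, 101)}           # master had 1-100
    slaves = {
        "slave1": {f"{UUID}:{n}" for n in range(1, 100)},  # received 1-99
        "slave2": {f"{UUID}:{n}" for n in range(1, 98)},   # received 1-97
    }
    new = pick_new_master(slaves)
    print(new, sorted(unacknowledged(old, slaves[new])))
```

Here slave1 is promoted and transaction 100 exists only on the failed master; since it was never acknowledged, no client was told it committed, and it is dropped along with the old master.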

all best
Bogdan

P.S.
You might want to switch to Group Replication, where this problem does not exist.
[19 May 2020 16:40] Margaret Fisher
Posted by developer:
 
Thanks for raising this issue. I've added a note to
https://dev.mysql.com/doc/refman/8.0/en/replication-semisync.html
and the corresponding pages for earlier release versions:

Important:
 With semisynchronous replication, if the master crashes and a failover to a slave is carried out, the failed master should not be reused as the replication master, and should be discarded. It could have transactions that were not acknowledged by any slave, which were therefore not committed before the failover. 
 If your goal is to implement a fault-tolerant replication topology where all the servers receive the same transactions in the same order, and a server that crashes can rejoin the group and be brought up to date automatically, you can use Group Replication to achieve this.