Bug #87385 Partial external XA transactions are not rolled back correctly
Submitted: 11 Aug 2017 6:32 Modified: 11 Aug 2017 20:49
Reporter: Wei Zhao (OCA)
Status: Verified
Category: MySQL Server: Replication    Severity: S2 (Serious)
Version: mysql-5.7.17    OS: Any
CPU Architecture: Any
Tags: replication, ROLLBACK partial, xa

[11 Aug 2017 6:32] Wei Zhao
The function coord_handle_partial_binlogged_transaction() injects a 'ROLLBACK' query log event to roll back a partial transaction when the slave I/O thread reconnects and receives a FormatDescriptionEvent from the master.

However, if the partial transaction is an external XA transaction branch, it must instead be rolled back with two query log events containing XA END 'xid' and XA ROLLBACK 'xid' respectively (or only the latter, if XA END has already been applied). A plain 'ROLLBACK' does not work for an XA transaction.
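The choice of rollback statements described above can be sketched as follows. This is a minimal, hypothetical C++ illustration, not code from the MySQL source tree or from fde-xa.diff: the names PartialTrxState and rollback_statements are invented here, and the sketch assumes the slave already knows whether the cut-off transaction was a plain transaction, an XA branch still in the ACTIVE state (XA END not yet applied), or an XA branch in the IDLE state (XA END applied, XA PREPARE not yet reached).

```cpp
#include <string>
#include <vector>

// Hypothetical classification of where the relay log was cut off.
// These names are illustrative only; they are not MySQL identifiers.
enum class PartialTrxState {
  kOrdinary,  // a normal (non-XA) transaction was cut off
  kXaActive,  // cut off after XA START but before XA END
  kXaIdle     // cut off after XA END but before XA PREPARE
};

// Returns the statements needed to undo the partial transaction.
// `xid` is the XA transaction identifier exactly as it appeared in the
// original XA START statement (including any quotes).
std::vector<std::string> rollback_statements(PartialTrxState state,
                                             const std::string& xid) {
  switch (state) {
    case PartialTrxState::kXaActive:
      // An ACTIVE branch must be ended before it can be rolled back.
      return {"XA END " + xid, "XA ROLLBACK " + xid};
    case PartialTrxState::kXaIdle:
      // XA END was already applied, so only XA ROLLBACK is needed.
      return {"XA ROLLBACK " + xid};
    case PartialTrxState::kOrdinary:
    default:
      // A plain ROLLBACK is sufficient for a non-XA transaction.
      return {"ROLLBACK"};
  }
}
```

For example, rollback_statements(PartialTrxState::kXaActive, "'trx1'") yields XA END 'trx1' followed by XA ROLLBACK 'trx1', while the IDLE case yields only XA ROLLBACK 'trx1'.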

I have made a fix, and it works correctly in my testing. I extracted the relevant changes from several scattered commits to form this patch, so it may need minor adjustments, but it should generally work.

How to repeat:
Set up master-slave replication, run a heavy XA transaction workload on the master, and frequently run 'STOP SLAVE; START SLAVE' on the slave and/or kill the slave's mysqld process.

Suggested fix:
See my patch.
[11 Aug 2017 6:33] Wei Zhao
This patch fixes the bug.

Attachment: fde-xa.diff (application/octet-stream, text), 11.45 KiB.

[11 Aug 2017 7:33] Wei Zhao
Adding the MySQL version that I used to find and fix the bug.
[11 Aug 2017 20:49] Bogdan Kecman

This was not an easy one to reproduce, but I managed to do it.
Thanks for both bug submission and the patch :)

all best
[17 Nov 2017 1:55] Wei Zhao
Adding the patch as a contribution.

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: fde-xa.diff (application/octet-stream, text), 11.45 KiB.