Bug #99045 deadlock during XA cause slave SQL thread XAER_RMFAIL error
Submitted: 24 Mar 2020 3:30
Modified: 26 Mar 2020 6:36
Reporter: Jie Zhou
Status: Verified
Category: MySQL Server: XA transactions
Severity: S2 (Serious)
Version: 5.7, 8.0, 5.7.29
OS: Any
Assigned to:
CPU Architecture: Any

[24 Mar 2020 3:30] Jie Zhou
Description:
1. Execute another DML statement after a deadlock occurs inside an XA transaction.
2. The master writes an illegal binlog sequence:
    ----
    SET @@SESSION.GTID_NEXT= 'ANONYMOUS'
    XA START X'32',X'',1 
    table_id: 114 (test.t)  
    table_id: 114 flags: STMT_END_F  
    COMMIT
    ----
3. The slave SQL thread reports the error: 'XAER_RMFAIL: The command cannot be executed when global transaction is in the ACTIVE state' on query.

This can be reproduced in the latest versions of 5.7 and 8.0.

my.cnf:
binlog-format=ROW

How to repeat:
connection 1:
use test; 
CREATE TABLE t (i INT) ENGINE = InnoDB; 
INSERT INTO t (i) VALUES(1);
xa start '1';
SELECT * FROM t WHERE i = 1 LOCK IN SHARE MODE;  # try to make a deadlock

connection 2:
use test;
xa start '2';
update t set i=i+1 where i=1;  # blocked by connection 1

connection 1:
update t set i=i+1 where i=1; # causes connection 2 to report a deadlock
xa end '1';xa prepare '1';xa commit '1'; 

connection 2:
insert into t values (5); # illegal binlog is written
xa end '2';
ERROR 1614 (XA102): XA_RBDEADLOCK: Transaction branch was rolled back: deadlock was detected

show binlog events in 'mysql-bin.000001';
...
| master-bin.000001 | 2457 | Anonymous_Gtid           |         1 |          79 | SET @@SESSION.GTID_NEXT= 'ANONYMOUS'                                                  |
| master-bin.000001 | 2536 | Query                    |         1 |         169 | XA START X'32',X'',1                                                                  |
| master-bin.000001 | 2626 | Table_map                |         1 |         216 | table_id: 114 (test.t)                                                                |
| master-bin.000001 | 2673 | Write_rows               |         1 |         256 | table_id: 114 flags: STMT_END_F                                                       |
| master-bin.000001 | 2713 | Query                    |         1 |         332 | COMMIT                                                                                |

The slave SQL thread reports an error when it finds a COMMIT after XA START.
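For anyone checking their own binlogs for this pattern, here is a minimal sketch of the check, not a real binlog parser. It assumes the input is the Info column text from SHOW BINLOG EVENTS, one string per event. The function name `find_commit_inside_open_xa` is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy checker: looks only at the "Info" text of each event, in order.
// Returns the index of the first plain COMMIT seen while an XA START is
// still open (no XA END or XA ROLLBACK seen yet), or -1 if there is none.
int find_commit_inside_open_xa(const std::vector<std::string>& infos) {
  bool xa_open = false;
  for (std::size_t i = 0; i < infos.size(); ++i) {
    const std::string& info = infos[i];
    // rfind(prefix, 0) == 0 is a "starts with" test.
    if (info.rfind("XA START", 0) == 0) {
      xa_open = true;
    } else if (info.rfind("XA END", 0) == 0 ||
               info.rfind("XA ROLLBACK", 0) == 0) {
      xa_open = false;
    } else if (xa_open && info.rfind("COMMIT", 0) == 0) {
      // "XA COMMIT ..." does not start with "COMMIT", so it is not flagged.
      return static_cast<int>(i);
    }
  }
  return -1;
}
```

Fed the five Info values from the event list above, it flags the final COMMIT. A normal XA START / XA END / XA PREPARE / XA COMMIT sequence is not flagged.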
[24 Mar 2020 6:34] MySQL Verification Team
Hello Jie Zhou,

Thank you for the report and test case.
Verified as described with 5.7.29 build.

Thanks,
Umesh
[26 Mar 2020 6:36] Jie Zhou
I did some debugging and found the problem.
The deadlock path runs into the function `trans_rollback_implicit`, which clears the OPTION_BEGIN flag.
As a result, the subsequent DML statement does not know it belongs to the XA transaction and starts a new autocommit transaction.

My suggested fix:
--- a/sql/transaction.cc
+++ b/sql/transaction.cc
@@ -450,7 +450,8 @@ bool trans_rollback_implicit(THD *thd)
     ~(SERVER_STATUS_IN_TRANS | SERVER_STATUS_IN_TRANS_READONLY);
   DBUG_PRINT("info", ("clearing SERVER_STATUS_IN_TRANS"));
   res= ha_rollback_trans(thd, true);
-  thd->variables.option_bits&= ~OPTION_BEGIN;
+  if (thd->get_transaction()->xid_state()->has_state(XID_STATE::XA_NOTR))
+    thd->variables.option_bits&= ~OPTION_BEGIN;
   thd->get_transaction()->reset_unsafe_rollback_flags(
     Transaction_ctx::SESSION);
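
The intent of the change can be illustrated with a toy model. This is a simplified sketch and not server code: `ToySession` and the `toy_*` functions are hypothetical, and the model assumes that XA START sets the OPTION_BEGIN bit and that the branch is not in the XA_NOTR state when the deadlock rollback happens.

```cpp
#include <cassert>

// Only two XA states are modelled; the real XID_STATE has more.
enum class ToyXaState { XA_NOTR, XA_ACTIVE };

struct ToySession {
  bool option_begin = false;                 // stands in for OPTION_BEGIN
  ToyXaState xa_state = ToyXaState::XA_NOTR;
};

// Assumption: starting an XA branch marks the session as being inside an
// explicit transaction.
void toy_xa_start(ToySession& s) {
  assert(s.xa_state == ToyXaState::XA_NOTR);
  s.xa_state = ToyXaState::XA_ACTIVE;
  s.option_begin = true;
}

// Models the implicit rollback taken on deadlock.
// keep_flag_in_xa == false: the flag is always cleared (reported behaviour).
// keep_flag_in_xa == true:  the flag is cleared only outside XA (suggested fix).
void toy_rollback_implicit(ToySession& s, bool keep_flag_in_xa) {
  if (!keep_flag_in_xa || s.xa_state == ToyXaState::XA_NOTR) {
    s.option_begin = false;
  }
}

// The next statement is treated as autocommit when the flag is not set.
bool toy_next_dml_is_autocommit(const ToySession& s) {
  return !s.option_begin;
}
```

In this model, after `toy_xa_start` and an unconditional clear, the next DML is treated as autocommit, which matches the stray COMMIT in the binlog. With the guarded clear it stays inside the XA branch, while a non-XA session still has its flag cleared as before.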
[3 Jun 2021 10:55] Jay Chu
On MySQL Community 5.7.34 -> Community 5.7.34 (replica instance), the problem still persists.
[21 Feb 2023 16:54] MySQL Verification Team
Bug #110151 marked as duplicate of this one