Bug #101351 Replication failure(errno 1399) on dml in XA tr after rollback on timeout
Submitted: 28 Oct 2020 7:28 Modified: 10 Nov 2020 12:22
Reporter: ting du Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:5.7, 5.7.32 OS:Linux
Assigned to: CPU Architecture:x86

[28 Oct 2020 7:28] ting du
Description:
If set innodb_rollback_on_timeout=on, replication may break upon executing a dml immediately after a transaction rollback due to timeout while in an XA transaction.

Regardless of whether or not the offending update is considered valid on the existing connection, it is expected that replication should still continue to work despite any errors due to transaction rollback.

How to repeat:
1. set up two MySQL instance, and set innodb_rollback_on_timeout=on.

2. set up replication from one(master) to the other(slave).

3. prepare data(executed on master node):
create database test;
use test;
create table Table1 (id bigint auto_increment primary key not null, str varchar(50));
insert into Table1(str) values("a");
insert into Table1(str) values("b");

4. create two connections on master node(connection 1 and connection 2).

5. connection 1:
use test;
xa start 'xat1';
select * from Table1 where id = 2 for update;
update Table1 set str = (select uuid()) where id = 2;

6. connection 2:
use test;
xa start 'xat2';
select * from Table1 where id = 1 for update;
update Table1 set str = (select uuid()) where id = 1;

7. connection 1:
select * from Table1 where id = 1 for update;
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
this select will fail.(innodb_lock_wait_timeout default value is 50, you can set it to a smaller value, such as 5)

8. connection 2:
xa end 'xat2';
xa prepare 'xat2';
xa commit 'xat2';

9. connection 1:
select * from Table1;
insert into Table1(str) values("c");  // Upon running this, MySQL replication slave shows error.
select * from Table1;
xa end 'xat1';
xa rollback 'xat1';   // after execute xa end, transaction is in ROLLBACK ONLY state. only can execute xa rollback.

10. connection 2:
select * from Table1;  // "c" is inserted.
[28 Oct 2020 7:39] ting du
When transaction is in ACTIVE state, if there any failure of any dml cause the transaction to rollback, dml after rollback(by server) execute will lead replication failue.
the binlog is:
# at 1572
#201028 15:24:39 server id 1  end_log_pos 1633  GTID    last_committed=6        sequence_number=7       rbr_only=yes
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
SET @@SESSION.GTID_NEXT= 'eb147b26-18ed-11eb-8eb1-6805caa65bc1:7'/*!*/;
# at 1633
#201028 15:24:39 server id 1  end_log_pos 1722  Query   thread_id=3     exec_time=0     error_code=0
SET TIMESTAMP=1603869879/*!*/;
XA START X'78617431',X'',1
/*!*/;
# at 1722
#201028 15:24:39 server id 1  end_log_pos 1770  Table_map: `test`.`Table1` mapped to number 112
# at 1770
#201028 15:24:39 server id 1  end_log_pos 1812  Write_rows: table id 112 flags: STMT_END_F
### INSERT INTO `test`.`Table1`
### SET
###   @1=3 /* LONGINT meta=0 nullable=0 is_null=0 */
###   @2='c' /* VARSTRING(50) meta=50 nullable=1 is_null=0 */
# at 1812
#201028 15:24:39 server id 1  end_log_pos 1881  Query   thread_id=3     exec_time=0     error_code=0
SET TIMESTAMP=1603869879/*!*/;
COMMIT
/*!*/;
dml after rollback(by server) is begin with "xa start", and end with "commit" in binlog.

the cause of this bug is same with bug 91633, but has different scenario.
[10 Nov 2020 11:47] MySQL Verification Team
Hello ting du,

Thank you for the report and feedback.

regards,
Umesh
[10 Nov 2020 12:22] ting du
Thank you for your reply. Do you have any plans to fix this bug?