Bug #94814 slave replication lock wait timeout because of wrong trx order in binlog file
Submitted: 28 Mar 11:12 Modified: 15 Apr 10:04
Reporter: Zhenghu Wen (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: XA transactions Severity:S2 (Serious)
Version:5.7, 8.0, 5.7.25 OS:Any
Assigned to: CPU Architecture:Any
Tags: Group Commit, xa trx

[28 Mar 11:12] Zhenghu Wen
Description:
1. a table has a unique index(idx_a) and two record with idx_a value x,z

2. do xa prepare trx1:
delete record a(with idx_a value x), and insert a* (has the same idx_a value x).

trx1 will lock another record(idx_a vlaue z), where z>x,and just behind x. (unique check failed by delete record)

3. if we then insert a record b(with idx_a value y, x < y < z).  
this insert will blocked. (trx2)

4.if we exit the trx1 session (the xa trx obj of server will be detached), and relogin , then exec xa commit(trx3) . 

5. in binlog file, trx2(insert sql) will come before trx3, which is wrong order. it will break slave replication.  

How to repeat:
setup a mysqld with binlog enabled(row)

1. create a table and insert 2 records
CREATE TABLE `mytable` (
  `Id` bigint(20) NOT NULL,
  `TicketId` bigint(20) DEFAULT '0',
  `AttrId` bigint(20) DEFAULT '0',
  `AttrValueId` bigint(20) DEFAULT '0',
  `Position` int(10) DEFAULT '0',
  `db_create_time` timestamp NOT NULL DEFAULT '2000-01-01 00:00:00',
  `db_update_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`Id`),
  UNIQUE KEY `idx_ticketid_attrid` (`TicketId`,`AttrId`),
  KEY `idx_ticketid_pos` (`TicketId`,`Position`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

insert into `mytable` values (506612498,90374047,2,506612498,1,'2019-03-27 19:04:50','2019-03-27 19:04:50');

insert into `mytable` values (506623656,90375007,1,506623656,0,'2019-03-27 19:08:15','2019-03-27 19:08:15');

2. in session a
node1-wzh>xa start '1';delete from  `mytable` where id=506612498;insert into `mytable` values (506698389,90374047,2,506698389,1,'2019-03-27 20:17:15','2019-03-27 20:17:15');xa end '1';xa prepare '1';
Query OK, 0 rows affected (0.00 sec)

Query OK, 1 row affected (0.00 sec)

Query OK, 1 row affected (0.01 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

3. in session b 
set session innodb_lock_wait_timeout=5000;
node1-wzh>insert into `Music_ConcertTicketSKUAttr` values (506698396,90375003,1,506698396,0,'2019-03-27 20:17:15','2019-03-27 20:17:15');

the insert sql will be blocked by wait lock which is aquire by xa prepare trx;

mysql> select * from INNODB_LOCK_WAITS\G
*************************** 1. row ***************************
requesting_trx_id: 2558007194
requested_lock_id: 2558007194:388:3406:130
  blocking_trx_id: 2558004747
 blocking_lock_id: 2558004747:388:3406:130
1 row in set, 1 warning (0.00 sec)

mysql> select * from INNODB_LOCKS\G
*************************** 1. row ***************************
    lock_id: 2558007194:388:3406:130
lock_trx_id: 2558007194
  lock_mode: X,GAP
  lock_type: RECORD
 lock_table: `topic4`.`Music_ConcertTicketSKUAttr`
 lock_index: idx_ticketid_attrid
 lock_space: 388
  lock_page: 3406
   lock_rec: 130
  lock_data: 90375007, 1
*************************** 2. row ***************************
    lock_id: 2558004747:388:3406:130
lock_trx_id: 2558004747
  lock_mode: S
  lock_type: RECORD
 lock_table: `topic4`.`Music_ConcertTicketSKUAttr`
 lock_index: idx_ticketid_attrid
 lock_space: 388
  lock_page: 3406
   lock_rec: 130
  lock_data: 90375007, 1
2 rows in set, 1 warning (0.00 sec)

4. gdb attach the mysqld process and set 2 break 
gdb) b innobase_commit_by_xid
Breakpoint 1 at 0x10e2710: file xxxx.
(gdb) b transaction_cache_delete
Breakpoint 2 at 0xe90650: file xxxx.
(gdb) c
Continuing.
[Switching to Thread 0x7fa1b9a3d700 (LWP 1006125)]

5. in session a
exit the session and relogin, then exec xa commit
node1-wzh>xa recover;
+----------+--------------+--------------+------+
| formatID | gtrid_length | bqual_length | data |
+----------+--------------+--------------+------+
|        1 |            1 |            0 | 1    |
+----------+--------------+--------------+------+
1 row in set (0.00 sec)

node1-wzh>exit
Bye

node1-performance_schema>xa commit '1';

6. gdb will break in engine commit, we just continuing
Breakpoint 1, innobase_commit_by_xid (hton=0xc392320, xid=0x1adf2998) at xxxx	{
(gdb) c
Continuing.

7. in session b, the insert sql finished
node1-wzh>insert into `mytable` values (506698396,90375003,1,506698396,0,'2019-03-27 20:17:15','2019-03-27 20:17:15');
Query OK, 1 row affected (4 min 34.91 sec)

8. do show binlog events or mysqlbinlog:

#190328 18:39:02 server id 1  end_log_pos 12098         GTID    last_committed=26       sequence_number=27      rbr_only=yes
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
SET @@SESSION.GTID_NEXT= 'e1e67e3b-513f-11e9-9a44-c81f66e48c6e:27'/*!*/;
# at 12098
#190328 18:39:02 server id 1  end_log_pos 12173         Query   thread_id=11    exec_time=275   error_code=0
SET TIMESTAMP=1553769542/*!*/;
BEGIN
/*!*/;
# at 12173
# at 12300
#190328 18:39:02 server id 1  end_log_pos 12353         Table_map: `wzh`.`mytable` mapped to number 225
# at 12353
#190328 18:39:02 server id 1  end_log_pos 12429         Write_rows: table id 225 flags: STMT_END_F

BINLOG '
RqScXBMBAAAANQAAAEEwAAAAAOEAAAAAAAEAA3d6aAAHbXl0YWJsZQAHCAgICAMREQIAAB4=
RqScXB4BAAAATAAAAI0wAAAAAOEAAAAAAAEAAgAH/4CcmjMeAAAAAFsDYwUAAAAAAQAAAAAAAACc
mjMeAAAAAAAAAABcm2nLXJtpyw==
'/*!*/;
### INSERT INTO `wzh`.`mytable`
### SET
###   @1=506698396
###   @2=90375003
###   @3=1
###   @4=506698396
###   @5=0
###   @6=1553689035
###   @7=1553689035
# at 12429
#190328 18:39:02 server id 1  end_log_pos 12456         Xid = 619
COMMIT/*!*/;
# at 12456
#190328 18:40:19 server id 1  end_log_pos 12517         GTID    last_committed=26       sequence_number=28      rbr_only=no
SET @@SESSION.GTID_NEXT= 'e1e67e3b-513f-11e9-9a44-c81f66e48c6e:28'/*!*/;
# at 12517
#190328 18:40:19 server id 1  end_log_pos 12615         Query   thread_id=7     exec_time=198   error_code=0
SET TIMESTAMP=1553769619/*!*/;
XA COMMIT X'31',X'',1
/*!*/;

9. the insert sql  comes before xa commit, it is wrong order. if slave replay binlog by this order, the insert sql will lock wait timeout.
[28 Mar 11:18] Zhenghu Wen
Music_ConcertTicketSKUAttr should be mytable
[29 Mar 1:37] Zhenghu Wen
example based on 5.7.20
[15 Apr 10:04] Umesh Shastry
Hello Zhenghu,

Thank you for the report.
Verified as described with 5.7.25 build(standalone, no repl).

Thanks,
Umesh