Bug #98139 Committing a XA transaction causes a wrong sequence of events in binlog
Submitted: 7 Jan 2020 9:04 Modified: 17 Jan 2020 2:58
Reporter: Dehao Wang (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Documentation Severity:S3 (Non-critical)
Version:8.0.16 OS:Any
Assigned to: CPU Architecture:Any

[7 Jan 2020 9:04] Dehao Wang
Description:
When the XA COMMIT and XA PREPARE of same transaction are executed in different sessions, XA COMMIT will enter ha_commit_or_rollback_by_xid:
static void ha_commit_or_rollback_by_xid(THD *thd, XID *xid, bool commit)
{
  plugin_foreach(NULL, commit ? xacommit_handlerton : xarollback_handlerton,
                 MYSQL_STORAGE_ENGINE_PLUGIN, xid);
}

However,InnoBD commits before binlog in this function. As a result, the row lock is released before the COMMIT event is written to the binlog.This gap should not exist! When another XA transaction obtains the row lock,updates the same row and writes UPDATES and PREPARE events to the binlog right before last transaction's COMMIT event is written, a wrong sequence of binlog events appears. Just looks like this(txnB and txnA update the same row):
Line 10924: txnB XA PREPARE 
Line 15371: txnA XA PREPARE
Line 15382: txnB XA ROLLBACK
Line 19852: txnA XA ROLLBACK

When the slave tries to replay "txnA XA PREPARE" ,txnA can not obtain the row lock because txnB holds it. The replay thread will wait forever.

How to repeat:
We use our own stress test tool to repeat this bug. Let multiple XA Transactions update the same row, and send XA COMMIT using another seesion.

Writing a mtr case with debug-sync can repeat this bug, too.

Suggested fix:
let binlog commit before InnoDB in ha_commit_or_rollback_by_xid:

static void ha_commit_or_rollback_by_xid(THD *thd, XID *xid, bool commit)
{
  if (binlog_hton) {
    if (commit) {
        binlog_hton->commit_by_xid(binlog_hton, xid);
      } else {
        binlog_hton->rollback_by_xid(binlog_hton, xid);
      }
    }
    plugin_foreach_without_binlog(NULL, commit ? xacommit_handlerton : 
                                  xarollback_handlerton,
                                  MYSQL_STORAGE_ENGINE_PLUGIN, xid);
}
[8 Jan 2020 13:19] MySQL Verification Team
Hi Mr. Wang,

Thank you for your bug report.

There is a problem, however, with your report. XA transactions are invented for managing transactions over N different servers. However, the standard stipulates that management of the transaction for each one of the servers involved is executed through a single connection to that particular server.

Therefore, we do not support XA management of the same server via more then one single connection.

Hence, I do not see how we can accommodate your request ......
[10 Jan 2020 10:16] Dehao Wang
Hi, Mr. Milivojevic

Thank you for replying.

Usually, an application server sends XA PREPARE and XA COMMIT through one connection. However, if the application server is down after sending XA PREPARE but before sending XA COMMIT,this transaction is hanging there. After the application server restarts, it has to rollback or commit this transaction by using another connection. 

MySQL allows users to commit/rollback a XA transaction by using another connection if the former connection is closed or killed. I think there is still a big risk.
[10 Jan 2020 13:05] MySQL Verification Team
Hi Mr. Wang,

Yes, MySQL allows XA ROLLBACK from another connection, but XA COMMIT (from another connection) is not supported, nor are there any plans to .....
[16 Jan 2020 3:32] Dehao Wang
Hi, Mr. Milivojevic

Are you sure that XA COMMIT (from another connection) is not supported? I do some test,MySQL version:8.0.16

mysql> create table xa_test (id int primary key, c1 int) engine=innodb;
Query OK, 0 rows affected (2.65 sec)

mysql> xa begin "abc";
Query OK, 0 rows affected (0.00 sec)

mysql> insert into xa_test values (1,1);
Query OK, 1 row affected (0.01 sec)

mysql> insert into xa_test values (2,2);
Query OK, 1 row affected (0.00 sec)

mysql> xa end "abc";
Query OK, 0 rows affected (0.00 sec)

mysql> xa prepare "abc";
Query OK, 0 rows affected (0.00 sec)

mysql> xa recover;
+----------+--------------+--------------+------+
| formatID | gtrid_length | bqual_length | data |
+----------+--------------+--------------+------+
|        1 |            3 |            0 | abc  |
+----------+--------------+--------------+------+
1 row in set (0.00 sec)

mysql> quit

another session:
 
mysql> xa recover;
+----------+--------------+--------------+------+
| formatID | gtrid_length | bqual_length | data |
+----------+--------------+--------------+------+
|        1 |            3 |            0 | abc  |
+----------+--------------+--------------+------+
1 row in set (0.00 sec)

mysql> use test
Database changed
mysql> select * from xa_test;
Empty set (0.00 sec)

mysql> xa commit "abc";
Query OK, 0 rows affected (0.00 sec)

mysql> select * from xa_test;
+----+------+
| id | c1   |
+----+------+
|  1 |    1 |
|  2 |    2 |
+----+------+
2 rows in set (0.00 sec)

I guess MySQL allows user to commit in another connection...

Anyway, either XA CCOMMIT or XA ROLLBACK in another connection will trigger this problem.
[16 Jan 2020 13:18] MySQL Verification Team
Yes, I am quite sure about that.

This is fully documented in our 8.0 Reference Manual, chapter 13.3.8.
[16 Jan 2020 13:18] MySQL Verification Team
Yes, I am quite sure about that.

This is fully documented in our 8.0 Reference Manual, chapter 13.3.8.
[16 Jan 2020 14:46] MySQL Verification Team
13.3.8.2.

It is quite clear from that sub-chapter.
[16 Jan 2020 14:50] MySQL Verification Team
Since execution in a single connection is mentioned, but not very explicitly, I can accept this as a documentation bug.
[17 Jan 2020 2:58] Dehao Wang
Alright,XA COMMIT is not allowed to executed in another connection(although a MySQL instance will succeed to do it without showing any warnings.). What about XA ROLLBACK? You mentioned it is allowed. 

When a XA transaction is rollbacked in another connection, A wrong sequence of events will be written to binlog(txnA and txnB updated the same row):
Line 10924: txnB XA PREPARE 
Line 15371: txnA XA PREPARE
Line 15382: txnB XA ROLLBACK
Line 19852: txnA XA ROLLBACK 

This really happened in our test environment,because of this function:

static void ha_commit_or_rollback_by_xid(THD *thd, XID *xid, bool commit)
{
  plugin_foreach(NULL, commit ? xacommit_handlerton : xarollback_handlerton,
                 MYSQL_STORAGE_ENGINE_PLUGIN, xid);
}

If txnA and txnB updated the same row, the replay thread of the slave will wait util timeout. When the slave tries to replay "txnA XA PREPARE" ,txnA can not obtain the row lock because txnB holds it.