Bug #84499 Transaction branch in inconsistent state after xa commit lock wait timeout
Submitted: 13 Jan 2017 7:40 Modified: 26 Jan 2017 6:16
Reporter: Wei Zhao (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: XA transactions Severity:S3 (Non-critical)
Version:5.7.16 OS:Any
Assigned to: CPU Architecture:Any
Tags: FTWRL, XA COMMIT

[13 Jan 2017 7:40] Wei Zhao
Description:

If 'xa commit' is blocked by FTWRL, after lock timeout, the transaction branch is in inconsistent states within the master node, and in slave nodes:
1. the prepared transaction branch is rolled back in innodb
2. 'xa recover' no longer lists this prepared transaction branch on master. 
3. in binlog it's not binlogged as 'aborted', so still in 'prepared' state, and this prepared transaction is replicated to slaves and becomes prepared on slaves.
4. A slave node replicating the binlog has a prepared transaction branch and it can be listed by 'xa recover'. and you can do 'xa commit' to commit the transaction branch on the slave, so that the slave and the master have inconsistent data.

How to repeat:
On master:

mysql> create table t1 (a int);
Query OK, 0 rows affected (0.03 sec)

mysql> show create table t1;
+-------+---------------------------------------------------------------------------------------+
| Table | Create Table                                                                          |
+-------+---------------------------------------------------------------------------------------+
| t1    | CREATE TABLE `t1` (
  `a` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+-------+---------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

mysql> xa start '123';
Query OK, 0 rows affected (0.00 sec)

mysql> insert into t1 values(1),(2);
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> xa end '123';
Query OK, 0 rows affected (0.00 sec)

mysql> xa prepare '123';
Query OK, 0 rows affected (0.00 sec)

-- Here start another mysql session to connect to the same DB instance, and run -- 'flush table with read lock'.

mysql> xa commit '123';
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

mysql> xa recover;
Empty set (0.00 sec)

mysql> show binary logs;
+---------------------------+-----------+
| Log_name                  | File_size |
+---------------------------+-----------+
| mysql-instance-dzw.000001 |      1351 |
+---------------------------+-----------+
1 row in set (0.00 sec)

mysql>  show binlog events in 'mysql-instance-dzw.000001';
-- some omitted rows ...
| mysql-instance-dzw.000001 |  398 | Gtid           |        19 |         463 | SET @@SESSION.GTID_NEXT= '7c792bd1-d95d-11e6-8a5e-6d4619a90727:2'                                                    |
| mysql-instance-dzw.000001 |  463 | Query          |        19 |         557 | create database test                                                                                                 |
| mysql-instance-dzw.000001 |  557 | Gtid           |        19 |         622 | SET @@SESSION.GTID_NEXT= '7c792bd1-d95d-11e6-8a5e-6d4619a90727:3'                                                    |
| mysql-instance-dzw.000001 |  622 | Query          |        19 |         719 | use `test`; create table t1 (a int)                                                                                  |
| mysql-instance-dzw.000001 |  719 | Gtid           |        19 |         784 | SET @@SESSION.GTID_NEXT= '7c792bd1-d95d-11e6-8a5e-6d4619a90727:4'                                                    |
| mysql-instance-dzw.000001 |  784 | Query          |        19 |         875 | XA START X'313233',X'',1                                                                                             |
| mysql-instance-dzw.000001 |  875 | Table_map      |        19 |         920 | table_id: 108 (test.t1)                                                                                              |
| mysql-instance-dzw.000001 |  920 | Write_rows     |        19 |         965 | table_id: 108 flags: STMT_END_F                                                                                      |
| mysql-instance-dzw.000001 |  965 | Query          |        19 |        1054 | XA END X'313233',X'',1                                                                                               |
| mysql-instance-dzw.000001 | 1054 | XA_prepare     |        19 |        1093 | XA PREPARE X'313233',X'',1

On slave:

mysql> select*from t1;
Empty set (0.00 sec)

mysql> xa recover;
+----------+--------------+--------------+------+
| formatID | gtrid_length | bqual_length | data |
+----------+--------------+--------------+------+
|        1 |            3 |            0 | 123  |
+----------+--------------+--------------+------+
1 row in set (0.00 sec)

mysql> xa commit '123';
Query OK, 0 rows affected (0.01 sec)

mysql> select*from t1;
+------+
| a    |
+------+
|    1 |
|    2 |
+------+
2 rows in set (0.00 sec)
[13 Jan 2017 14:31] MySQL Verification Team
This bug is actually a duplicate. Duplicate of:

https://bugs.mysql.com/bug.php?id=84323

Beside that, InnoDB does not work with FTWRL like other storage engines. All that InnoDB does is that it stops the second phase of the commit, until UNLOCK is issued or until timeout strikes.

Hence, XA transactions should never been supposed to be used with FTWRL.

As this is not properly documented, our manual must be appended for this misuse.
[18 Jan 2017 18:48] Daniel Price
Posted by developer:
 
The following documentation was updated:

http://dev.mysql.com/doc/refman/5.7/en/flush.html
http://dev.mysql.com/doc/refman/5.7/en/xa-restrictions.html

The changes should appear online soon.

Thank you for the bug report.
[18 Jan 2017 19:07] Daniel Price
Posted by developer:
 
Documentation update:

"FLUSH TABLES WITH READ LOCK is not compatible with XA transactions."
[19 Jan 2017 14:56] MySQL Verification Team
Latest analysis has proved that it is necessary to add XA_ROLLBACK in the cases like this. Then, XA consistency in the replication will be maintained.
[19 Jul 2018 20:17] Paul DuBois
Fixed by Bug#84442 in 5.7.19.