Bug #75204 MySQL/InnoDB forgets prepared xa transaction on restart
Submitted: 13 Dec 2014 16:28 Modified: 14 May 2015 12:45
Reporter: Daniël van Eeden (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: XA transactions Severity:S3 (Non-critical)
Version:5.7.5-m15, 5.6.22 OS:Any
Assigned to: CPU Architecture:Any
Tags: recover, recovery, restart, xa, xa_recover

[13 Dec 2014 16:28] Daniël van Eeden
Description:
Source:
Distributed Transaction Processing: The XA Specification
ISBN: 1 872630 24 3
X/Open Document Number: XO/CAE/91/300
'7.2 Resource Manager Requirements', 'Support for Recovery' (page 67)
http://pubs.opengroup.org/onlinepubs/009680699/toc.pdf

"An RM must track the status of all transaction branches in which it is involved.
After responding affirmatively to the TM’s xa_prepare() call, the RM cannot erase its
knowledge of the branch, or of the work it has done in support of that branch, until
it receives and successfully performs the TM’s order to commit or roll back the
branch. If an RM heuristically completes a branch, it cannot forget the branch until
explicitly permitted to by the TM. On command, an RM must return a list of all its
branches that it has prepared to commit or has heuristically completed.

When an RM recovers from its own failure, it recovers prepared and heuristically
completed transaction branches. It forgets all other branches."

This lets me believe that after a crash or restart of mysqld the transaction must be recoverable. This works if I do a "kill -9 <mysqld_pid>", but doesn't work if I restart the server.

This is related to: Bug #12161 	Xa recovery and client disconnection

How to repeat:
mysql> XA RECOVER;
Empty set (0.00 sec)

mysql> XA START 'foo-3';
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO t2 VALUES(3);
Query OK, 1 row affected (0.00 sec)

mysql> XA END 'foo-3';
Query OK, 0 rows affected (0.00 sec)

mysql> XA PREPARE 'foo-3';
Query OK, 0 rows affected (0.00 sec)

mysql> XA RECOVER;
+----------+--------------+--------------+-------+
| formatID | gtrid_length | bqual_length | data  |
+----------+--------------+--------------+-------+
|        1 |            5 |            0 | foo-3 |
+----------+--------------+--------------+-------+
1 row in set (0.00 sec)

mysql> \! ./restart
.. sandbox server started
mysql> XA RECOVER;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id:    1
Current database: test

Empty set (0.02 sec)

Suggested fix:
Make sure prepared transactions are recoverable after a restart.
[13 Dec 2014 16:55] Daniël van Eeden
This behaviour is documentad both for this bug and Bug #12161. So apparently this is a missing feature which is required by the standard.

http://dev.mysql.com/doc/refman/5.7/en/xa-restrictions.html
[13 Dec 2014 17:40] Daniël van Eeden
This event happens when I set innodb_fast_shutdown=2 to force InnoDB recovery on startup.

It seems that the best way to restart a master with XA transactions is to do a kill -9 and then re-sync the slaves (xa commit doesn't write binlog after recovery).
[13 Dec 2014 22:06] Daniël van Eeden
With PostgreSQL 9.4beta3:

postgres=# create table t1 (id int);
CREATE TABLE
postgres=# begin;
BEGIN
postgres=# insert into t1 values(1);
INSERT 0 1
postgres=# PREPARE TRANSACTION 'foobar';
PREPARE TRANSACTION
postgres=# select * from pg_prepared_xacts;
 transaction |  gid   |           prepared            |  owner   | database 
-------------+--------+-------------------------------+----------+----------
         687 | foobar | 2014-12-13 21:52:49.724968+00 | postgres | postgres
(1 row)

Then a fast shutdown from another session...

postgres=# select * from pg_prepared_xacts;
FATAL:  terminating connection due to administrator command
FATAL:  terminating connection due to administrator command
The connection to the server was lost. Attempting reset: Succeeded.
postgres=# select * from pg_prepared_xacts;
 transaction |  gid   |           prepared            |  owner   | database 
-------------+--------+-------------------------------+----------+----------
         687 | foobar | 2014-12-13 21:52:49.724968+00 | postgres | postgres
(1 row)
[15 Dec 2014 13:27] MySQL Verification Team
Hello Daniël,

Thank you for the report and test case.
Observed this behavior with 5.6.22.

Thanks,
Umesh
[15 Dec 2014 13:27] MySQL Verification Team
// 5.6.22

[(none)]> show variables like '%version%';
+-------------------------+------------------------------+
| Variable_name           | Value                        |
+-------------------------+------------------------------+
| innodb_version          | 5.6.22                       |
| protocol_version        | 10                           |
| slave_type_conversions  |                              |
| version                 | 5.6.22                       |
| version_comment         | MySQL Community Server (GPL) |
| version_compile_machine | x86_64                       |
| version_compile_os      | Linux                        |
+-------------------------+------------------------------+
7 rows in set (0.00 sec)

[(none)]> create database if not exists test;
Query OK, 1 row affected, 1 warning (0.00 sec)

[(none)]> use test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
[test]> create table t2(id int)engine=innodb;
Query OK, 0 rows affected (0.07 sec)

[test]> XA RECOVER;
Empty set (0.00 sec)

[test]> XA START 'foo-3';
Query OK, 0 rows affected (0.00 sec)

[test]> INSERT INTO t2 VALUES(3);
Query OK, 1 row affected (0.01 sec)

[test]>  XA END 'foo-3';
Query OK, 0 rows affected (0.00 sec)

[test]> XA PREPARE 'foo-3';
Query OK, 0 rows affected (0.00 sec)

[test]> XA RECOVER;
+----------+--------------+--------------+-------+
| formatID | gtrid_length | bqual_length | data  |
+----------+--------------+--------------+-------+
|        1 |            5 |            0 | foo-3 |
+----------+--------------+--------------+-------+
1 row in set (0.00 sec)

[test]>
[test]> \! /etc/init.d/mysqld restart
Stopping mysqld:                                           [  OK  ]
Starting mysqld:                                           [  OK  ]
[test]>
[test]> XA RECOVER;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id:    2
Current database: test

Empty set (0.00 sec)

[test]> XA RECOVER;
Empty set (0.00 sec)
[15 Dec 2014 13:32] MySQL Verification Team
// with kill -9 - XA recovered

[root@cluster-repo ~]# mysql -u root -p -S /var/lib/mysql/mysql.sock
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 6
Server version: 5.6.22 MySQL Community Server (GPL)

Copyright (c) 2000, 2014, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

[(none)]> use test
Database changed
[test]> create table t2(id int)engine=innodb;
Query OK, 0 rows affected (0.02 sec)

[test]> XA RECOVER;
Empty set (0.00 sec)

[test]> XA START 'foo-3';
Query OK, 0 rows affected (0.00 sec)

[test]> INSERT INTO t2 VALUES(3);
Query OK, 1 row affected (0.00 sec)

[test]>  XA END 'foo-3';
Query OK, 0 rows affected (0.00 sec)

[test]> XA PREPARE 'foo-3';
Query OK, 0 rows affected (0.00 sec)

[test]>  XA RECOVER;
+----------+--------------+--------------+-------+
| formatID | gtrid_length | bqual_length | data  |
+----------+--------------+--------------+-------+
|        1 |            5 |            0 | foo-3 |
+----------+--------------+--------------+-------+
1 row in set (0.00 sec)

[test]>
[test]> \! ps aux|grep mysqld
root     21287  0.0  0.0 106228  1380 pts/1    S    20:25   0:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --socket=/var/lib/mysql/mysql.sock --pid-file=/var/lib/mysql/mysqld.pid --basedir=/usr --user=mysql
mysql    21503  0.1  7.4 1242164 456536 pts/1  Sl   20:25   0:01 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --user=mysql --log-error=/var/lib/mysql/mysqld.log --pid-file=/var/lib/mysql/mysqld.pid --socket=/var/lib/mysql/mysql.sock
root     21567  0.0  0.0 106096  1136 pts/1    S+   20:36   0:00 sh -c  ps aux|grep mysqld
root     21569  0.0  0.0 103252   828 pts/1    S+   20:36   0:00 grep mysqld
[test]>
[test]> \! kill -9 21287 21503
[test]>
[test]>
[test]> \! /etc/init.d/mysqld start
Starting mysqld:                                           [  OK  ]
[test]>
[test]>  XA RECOVER;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id:    2
Current database: test

+----------+--------------+--------------+-------+
| formatID | gtrid_length | bqual_length | data  |
+----------+--------------+--------------+-------+
|        1 |            5 |            0 | foo-3 |
+----------+--------------+--------------+-------+
1 row in set (0.00 sec)
[20 Dec 2014 9:57] Daniël van Eeden
This also happens with TokuDB in 5.5.40-tokudb-7.5.3
https://tokutek.atlassian.net/browse/DB-784
[14 May 2015 12:45] Paul DuBois
Fixed in 5.7.7. See changelog entry in Bug#12161.