Bug #83295 replication error occurs, use xa transaction(one phase)
Submitted: 7 Oct 2016 2:47 Modified: 27 Sep 2017 13:24
Reporter: Hiroyuki Itoh Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:5.7.15 OS:Any
Assigned to: CPU Architecture:Any
Tags: replication, XA Transaction

[7 Oct 2016 2:47] Hiroyuki Itoh
Description:
Use xa transaction and execute DELETE or UPDATE statement. 

If `0 rows affected`,  replication error occurs. 
But over 1 rows affected, replication error not occurs.
And if use `XA PREPARE` (not use `ONE PHASE`), replication error not occurs.

How to repeat:
o MySQL Master(3306 port)
o MySQL Slave (3307 port)

execute query as below on Master.
--------------------------------------------
CREATE DATABASE /*! IF NOT EXISTS*/ xatest;
DROP TABLE IF EXISTS `xatest`;
CREATE TABLE `xatest` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;
XA START 'xatest';
DELETE FROM xatest where id=1;
XA END 'xatest';
XA COMMIT 'xatest' ONE PHASE;
--------------------------------------------

replication error message:
Last_SQL_Error: Error 'XAER_RMFAIL: The command cannot be executed when global transaction is in the  ACTIVE state' on query. Default database: 'xatest'. Query: 'COMMIT'

general_log(Master):
mysql> select event_time,argument from mysql.general_log where event_time between '2016-10-07 11:34:28' and '2016-10-07 11:34:32';
+----------------------------+------------------------------------------------------------------------------------------------------+
| event_time                 | argument                                                                                             |
+----------------------------+------------------------------------------------------------------------------------------------------+
| 2016-10-07 11:34:30.715193 | CREATE DATABASE /*! IF NOT EXISTS*/ xatest                                                           |
| 2016-10-07 11:34:30.718987 | DROP TABLE IF EXISTS `xatest`                                                                        |
| 2016-10-07 11:34:30.775041 | CREATE TABLE `xatest` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB |
| 2016-10-07 11:34:30.795014 | XA START 'xatest'                                                                                    |
| 2016-10-07 11:34:30.815122 | DELETE FROM xatest where id=1                                                                        |
| 2016-10-07 11:34:30.825208 | XA END 'xatest'                                                                                      |
| 2016-10-07 11:34:31.181188 | XA COMMIT 'xatest' ONE PHASE                                                                         |
+----------------------------+------------------------------------------------------------------------------------------------------+
7 rows in set (0.00 sec)

mysql>

general_log(Slave):
mysql> select event_time,argument from mysql.general_log where event_time between '2016-10-07 11:34:28' and '2016-10-07 11:34:32';
+----------------------------+------------------------------------------------------------------------------------------------------+
| event_time                 | argument                                                                                             |
+----------------------------+------------------------------------------------------------------------------------------------------+
| 2016-10-07 11:34:30.717733 | CREATE DATABASE /*! IF NOT EXISTS*/ xatest                                                           |
| 2016-10-07 11:34:30.784941 | DROP TABLE IF EXISTS `xatest` /* generated by server */                                              |
| 2016-10-07 11:34:30.808585 | CREATE TABLE `xatest` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB |
| 2016-10-07 11:34:31.183254 | XA START X'786174657374',X'',1                                                                       |
| 2016-10-07 11:34:31.183766 | DELETE FROM xatest where id=1                                                                        |
| 2016-10-07 11:34:31.183842 | COMMIT                                                                                               |
+----------------------------+------------------------------------------------------------------------------------------------------+
6 rows in set (0.01 sec)

mysql>

Suggested fix:
replication error not occurs.
[7 Oct 2016 4:07] Hiroyuki Itoh
I forget `use` clause.

Please execute query as below:
--------------------------------------------
CREATE DATABASE /*! IF NOT EXISTS*/ xatest;
USE xatest;
DROP TABLE IF EXISTS `xatest`;
CREATE TABLE `xatest` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;
XA START 'xatest';
DELETE FROM xatest where id=1;
XA END 'xatest';
XA COMMIT 'xatest' ONE PHASE;
--------------------------------------------
[7 Oct 2016 4:50] Hiroyuki Itoh
Set binlog_format = ROW, replication error not occurs.
Set binlog_format = STATEMENT or MIXED, replication error occurs.

So I change Severity S2 to S3.
[7 Oct 2016 6:05] MySQL Verification Team
Hello Hiroyuki Itoh,

Thank you for the report and test case.
Observed this with 5.7.15 build.

Thanks,
Umesh
[26 Oct 2016 16:00] chandra nanjegowda
I am seeing this issue in row based replication also.
[26 Oct 2016 16:05] chandra nanjegowda
Any work around for this issue? was this issue introduce in 5.7.15 or pervious  versions also have this issue? 

will it get fixed in the next release? 

Thank you.
[1 Nov 2016 17:14] Goran Opacic
ROW based replication - 5.7.15
I have the same bug
[31 Jan 2017 16:01] Matthew Pitts
I am seeing similar behavior after upgrading from 5.6 to 5.7 in a test environment that uses XA and MIXED replication. I can not yet confirm that the statements producing the replication errors are the same 'zero rows affected' types of queries described herein, but I will try to determine this if possible.

I will try to test the binlog_format=ROW as a workaround, but IMO this seems like a pretty serious issue even if that works.
[31 Jan 2017 16:02] Matthew Pitts
Sorry, I should have specified I'm using 5.7.16.
[31 Jan 2017 19:09] Matthew Pitts
I can confirm that the aforementioned SQL statements do indeed break replication with my 5.7.16 setup using MIXED binlog_format.
[31 Jan 2017 19:22] Matthew Pitts
I can also confirm that switching to binlog_format=ROW in my 5.7.16 setup does allow the aforementioned statements to run and NOT produce a replication error.
[13 Mar 2017 13:43] Kristofer Grahn
I have the same issue on 5.7.17
Using binlog_format = statement
Delete with 0 rows affected breaks replication.
[14 Mar 2017 12:28] Andreas Wederbrand
We're seeing something similar after upgrading from 5.6 to 5.7

When we look at the binlog it looks like the master writes
XA START;
... some statements ...
SET @@session.sql_mode=1075838976/*!*/;
COMMIT; -- wrong, this should be XA END and so on

This causes the slave to die but the master never complained. 

It looks like a 5.7 master writes the transaction in a bad way, ie with COMMIT instead of XA END. 

Is this what all of you are seeing as well?
[16 Mar 2017 11:41] Andreas Wederbrand
I've written a GIST on how to reproduce this using the docker image and what to look for in the binlog and general log
(https://gist.github.com/wederbrand/72b0afee7a8e0216ca6c3c2c23e71a2a/)

The vital parts is this in the general log:
 2017-03-16T11:38:23.447000Z	    3 Query	XA START 'xatest'
 2017-03-16T11:38:23.447092Z	    3 Query	DELETE FROM xatest where id=2
 2017-03-16T11:38:23.447229Z	    3 Query	XA END 'xatest'
 2017-03-16T11:38:23.447304Z	    3 Query	XA COMMIT 'xatest' ONE PHASE

that becomes this in the binlog

 # at 327
 #170316 11:38:23 server id 1  end_log_pos 434 CRC32 0x350b6fa0 	Query	thread_id=3	exec_time=0	error_code=0 
 use `xatest`/*!*/;
 SET TIMESTAMP=1489664303/*!*/;
 DELETE FROM xatest where id=2
 /*!*/;
 # at 434
 #170316 11:38:23 server id 1  end_log_pos 509 CRC32 0x834a74d5 	Query	thread_id=3	exec_time=0	error_code=0
 SET TIMESTAMP=1489664303/*!*/;
 COMMIT
 /*!*/;
[16 Mar 2017 12:18] Matthew Pitts
Running with ROW format for a while I have not seen any more XAER_RMFAIL errors causing replication failure. However, replication is still stopping due to some type of error indicated with text like [1]. 

I'm having a hard time correlating this back to the binlog entries as I am not very experienced with that tool, but I do see some 'ROLLBACK' statements that seem to occur near the position where replication stopped. I'm wondering if there's an issue with XA replication and ROLLBACKs. I also see transactions on the SLAVE in 'xa recover' which seem to be 'stuck'. Since the only writes on the slave come via replication, perhaps this could indicate that some transactions that ran via replication did not COMMIT or ROLLBACK properly.

Is there a replication statement timeout option or anything like that that would cause a ROLLBACK attempt to just give up?

This may all be a separate issue from the reported bug too.

Thanks for any support/help.

[1] Slave SQL thread retried transaction 10 time(s) in vain, giving up. Consider raising the value of the slave_transaction_retries variable.
[16 May 2017 8:19] Tibor Korocz
Hi, 

Is there any news about this bug?I know it is not critical bug but in terms that this bug blocks all the MySQL upgrade to 5.7 where XA transactions are in used I would say it is critical. 

Thanks.
[1 Jun 2017 4:56] Jesper wisborg Krogh
Posted by developer:
 
If you encounter this bug in 5.7.9-5.7.16, then another possibility is Bug 83588 (internal Bug 24966941 which in turn is a duplicate of Bug 24764800). This bug happens when the replication slave encounters a log wait timeout or a deadlock. This bug is fixed in 5.7.17:

----------------------------------------------------------

Replication: When using XA transactions, if a lock wait timeout or deadlock occurred for the applier (SQL) thread on a replication slave, the automatic retry did not work. The cause was that while the SQL thread would do a rollback, it would not roll the XA transaction back. This meant that when the transaction was retried, the first event was XA START which was invalid as the XA transaction was already in progress, leading to an XAER_RMFAIL error. (Bug #24764800)

References: See also: Bug #24923091, Bug #24966941.

----------------------------------------------------------

In all cases, statement based events should not be used with XA transactions in 5.7 and later. The reason is that XA transactions are now logged to the binary log when they are prepared, and the commit is logged in a second event. So multiple XA transactions can get interleaved. Statement based events will cause more locks than row based events making it more likely for a deadlock between two transactions to occur. See also Bug 85639/25786490.

Note: Bug 83588 and Bug 85639 were still private at the time this was written. They should become public at some point.
[14 Jul 2017 10:20] MySQL Verification Team
Bug #87008 marked as duplicate of this
[27 Sep 2017 13:24] Margaret Fisher
Posted by developer:
 
Changelog entry added for MySQL 8.0.4 and 5.7.21:

With statement-based replication in use, an UPDATE or DELETE statement inside an XA transaction that did not affect any rows caused a replication error. An XA END statement was not written to the binary log, so slave servers identified the XA transaction as still being active at the time of the commit request. The required XA END statement is now written even if no rows were affected by the transaction.