Description:
If the server is killed during XA ROLLBACK, crash recovery will wrongly consider the transaction to still be in XA PREPARE state, even though it may have been partially rolled back.
How to repeat:
--source include/have_innodb.inc
--source include/have_debug.inc
--source include/have_debug_sync.inc
# Embedded server does not support restarting
--source include/not_embedded.inc
create table t(a serial, b char(255) unique) engine=innodb;
connect (con1,localhost,root);
XA START 'zombie';
insert into t(a) values (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0);
insert into t(a) select 0 from t;
insert into t(a) select 0 from t;
insert into t(a) select 0 from t;
insert into t(a) select 0 from t;
update t set b=a;
SELECT COUNT(*) FROM t;
XA END 'zombie';
XA PREPARE 'zombie';
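# Pause XA ROLLBACK at the new sync point, just before the undo log is applied
SET DEBUG_SYNC='trx_xa_rollback SIGNAL s1 WAIT_FOR s2';
--send XA ROLLBACK 'zombie'
connection default;
SET DEBUG_SYNC='now WAIT_FOR s1';
# Force a checkpoint while XA ROLLBACK is paused, then kill the server
SET GLOBAL innodb_log_checkpoint_now=ON;
--source include/kill_and_restart_mysqld.inc
disconnect con1;
# Expected: this fails, because XA ROLLBACK had already started. Actual: it succeeds.
XA COMMIT 'zombie';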
SELECT COUNT(*) FROM t;
DROP TABLE t;
The test case above requires the following patch against MySQL 5.7, which adds a DEBUG_SYNC point:
diff --git a/storage/innobase/trx/trx0roll.cc b/storage/innobase/trx/trx0roll.cc
index feaadff..b230f1a 100644
--- a/storage/innobase/trx/trx0roll.cc
+++ b/storage/innobase/trx/trx0roll.cc
@@ -222,6 +222,7 @@ trx_rollback_for_mysql(
 	case TRX_STATE_PREPARED:
 		ut_ad(!trx_is_autocommit_non_locking(trx));
+		DEBUG_SYNC_C("trx_xa_rollback");
 		return(trx_rollback_for_mysql_low(trx));
 	case TRX_STATE_COMMITTED_IN_MEMORY:
The XA COMMIT after the restart should fail, because XA ROLLBACK had already started before the server was killed. Instead, it succeeds and marks the transaction as committed. Without the DEBUG_SYNC point holding execution, trx_rollback_for_mysql_low(trx) could have partially applied the undo log by the time of the kill, so committing the transaction could leave some indexes inconsistent with each other (causing CHECK TABLE to fail).
This differs from a kill and restart during non-XA transaction processing. For example, if the server crashed during a non-XA ROLLBACK, the transaction would remain in ACTIVE state, and after restart it would eventually be picked up and rolled back by the background thread.
Suggested fix:
Change the persistent state of the insert_undo and update_undo logs back to ACTIVE before entering trx_rollback_for_mysql_low(trx), so that crash recovery will roll the transaction back if the server is killed before XA ROLLBACK completes.
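
To illustrate why the ordering matters, here is a minimal toy model in C++. It is not InnoDB code: ToyTrx, UndoState, RecoveryAction, recover() and xa_rollback() are hypothetical names invented for this sketch, and the model assumes that crash recovery decides what to do with a transaction purely from the state persisted in its undo logs.

```cpp
#include <cassert>

// Hypothetical stand-in for the state persisted in the undo log headers.
enum class UndoState { ACTIVE, PREPARED };

// Hypothetical outcome of crash recovery for one recovered transaction.
enum class RecoveryAction { ROLLBACK_IN_BACKGROUND, KEEP_PREPARED };

struct ToyTrx {
    UndoState persisted_state = UndoState::ACTIVE;  // survives a kill
    int undo_records = 0;                           // records not yet undone
};

// Assumption of this model: recovery looks only at the persisted state.
// PREPARED transactions are kept, waiting for XA COMMIT or XA ROLLBACK;
// ACTIVE transactions are rolled back by the background thread.
RecoveryAction recover(const ToyTrx& trx) {
    return trx.persisted_state == UndoState::PREPARED
               ? RecoveryAction::KEEP_PREPARED
               : RecoveryAction::ROLLBACK_IN_BACKGROUND;
}

// Models XA ROLLBACK of a prepared transaction.
// reset_state_first: apply the suggested fix (persist ACTIVE before undoing).
// crash_after: the server is killed once this many undo records have been
//              applied; pass a negative value for "no crash".
// Returns true if the rollback ran to completion, false if it was interrupted.
bool xa_rollback(ToyTrx& trx, bool reset_state_first, int crash_after) {
    assert(trx.undo_records >= 0);
    if (reset_state_first) {
        trx.persisted_state = UndoState::ACTIVE;
    }
    int applied = 0;
    while (trx.undo_records > 0) {
        if (applied == crash_after) {
            return false;  // killed mid-rollback; undo only partially applied
        }
        --trx.undo_records;
        ++applied;
    }
    return true;
}
```

In this model, interrupting xa_rollback() with reset_state_first=false leaves the persisted state as PREPARED, so recover() returns KEEP_PREPARED and a subsequent XA COMMIT would be accepted on a partially rolled-back transaction. With reset_state_first=true, the same interruption makes recover() return ROLLBACK_IN_BACKGROUND, which corresponds to the behaviour of a non-XA ROLLBACK described above.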