Bug #59440 Race condition in XA ROLLBACK and XA COMMIT after server restart
Submitted: 12 Jan 2011 9:12
Modified: 10 Feb 2011 23:38
Reporter: Marko Mäkelä
Status: Closed
Category: MySQL Server: InnoDB storage engine
Severity: S3 (Non-critical)
Version: 5.1+
OS: Any
Assigned to: Marko Mäkelä
CPU Architecture: Any
Tags: 2PC, race condition, xa, XA COMMIT, XA RECOVER, XA ROLLBACK

[12 Jan 2011 9:12] Marko Mäkelä
Description:
I noticed this when reviewing the locking rules of a feature. Normally, transactions are only committed or rolled back by the thread that is associated with the client connection and the transaction. The recovery of XA transactions that are in the PREPARED state after server restart is an exception to this: they can be committed or rolled back by any thread that knows the XID.

Because the server releases its locks between mapping the XID to an InnoDB transaction and actually rolling back or committing that transaction, multiple connections can initiate an XA COMMIT or XA ROLLBACK on the same XID, and therefore on the same transaction, at the same time.

How to repeat:
create table t(a int)engine=innodb;
xa start 'c0de';
insert into t values(42);
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
xa end 'c0de';
xa prepare 'c0de';

-- restart the server
-- execute the following concurrently on two connections
xa rollback 'c0de';

To reproduce the race more reliably, add a 10-second sleep and a printout to innobase_rollback_by_xid() in storage/innobase/handler/ha_innodb.cc, just before the innobase_rollback_trx(trx) call:

		fprintf(stderr, "XA rollback sleep\n");
		os_thread_sleep(10000000);
		fprintf(stderr, "XA rollback sleep end\n");

Suggested fix:
In trx_get_trx_by_xid(xid), while holding the lock, remove the xid from the lookup table, e.g., by zeroing out trx->xid.
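
The idea of the suggested fix can be sketched roughly as follows. This is only a minimal, self-contained illustration and not the actual InnoDB code: the names trx_sketch_t, trx_list, trx_list_mutex, trx_sketch_add() and trx_sketch_get_and_claim_by_xid() are hypothetical stand-ins, and a plain pthread mutex replaces whatever lock the real lookup holds. The point is that the lookup and the removal of the XID happen under the same lock, so only one caller can ever obtain a given transaction.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the InnoDB structures. */
#define XID_MAX 64
#define N_TRX   4

typedef struct {
    char xid[XID_MAX]; /* all zeros: no XID, or already claimed */
    int  prepared;     /* nonzero if the transaction is PREPARED */
} trx_sketch_t;

static pthread_mutex_t trx_list_mutex = PTHREAD_MUTEX_INITIALIZER;
static trx_sketch_t    trx_list[N_TRX];

/* Register a recovered PREPARED transaction under the given XID.
   Returns 0 on success, -1 if the XID is too long or the list is full. */
int trx_sketch_add(const char *xid)
{
    int rc = -1;

    assert(xid != NULL && xid[0] != '\0');
    if (strlen(xid) >= XID_MAX) {
        return -1;
    }

    pthread_mutex_lock(&trx_list_mutex);
    for (size_t i = 0; i < N_TRX; i++) {
        if (!trx_list[i].prepared) {
            memset(trx_list[i].xid, 0, sizeof trx_list[i].xid);
            strcpy(trx_list[i].xid, xid); /* length checked above */
            trx_list[i].prepared = 1;
            rc = 0;
            break;
        }
    }
    pthread_mutex_unlock(&trx_list_mutex);
    return rc;
}

/* Look up a PREPARED transaction by XID and, while still holding the
   lock, zero out its XID so that no later lookup can find it again.
   Returns the transaction to exactly one caller, NULL to all others. */
trx_sketch_t *trx_sketch_get_and_claim_by_xid(const char *xid)
{
    trx_sketch_t *found = NULL;

    assert(xid != NULL && xid[0] != '\0');

    pthread_mutex_lock(&trx_list_mutex);
    for (size_t i = 0; i < N_TRX; i++) {
        trx_sketch_t *trx = &trx_list[i];

        if (trx->prepared && strcmp(trx->xid, xid) == 0) {
            /* Claim: remove the XID from the lookup table. */
            memset(trx->xid, 0, sizeof trx->xid);
            found = trx;
            break;
        }
    }
    pthread_mutex_unlock(&trx_list_mutex);
    return found;
}
```

In this sketch, the first connection that issues XA ROLLBACK 'c0de' would get the transaction back from the lookup, and a second connection racing on the same XID would get NULL and could report an unknown XID instead of operating on the same transaction.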
[27 Jan 2011 11:28] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/129738
[27 Jan 2011 11:28] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/129739
[30 Jan 2011 16:58] Bugs System
Pushed into mysql-5.1 5.1.56 (revid:vasil.dimov@oracle.com-20110130164158-1q99a41kb2wvkw3a) (version source revid:vasil.dimov@oracle.com-20110130164158-1q99a41kb2wvkw3a) (merge vers: 5.1.56) (pib:24)
[30 Jan 2011 16:59] Bugs System
Pushed into mysql-trunk 5.6.2 (revid:vasil.dimov@oracle.com-20110130165639-1pr3opz839b98q5j) (version source revid:vasil.dimov@oracle.com-20110130165522-m0o6al0pn5ig9kv3) (merge vers: 5.6.2) (pib:24)
[30 Jan 2011 17:00] Bugs System
Pushed into mysql-5.5 5.5.10 (revid:vasil.dimov@oracle.com-20110130165343-he9art47agq1a3gr) (version source revid:vasil.dimov@oracle.com-20110130165137-5lvzsq9j29j0hp1s) (merge vers: 5.5.10) (pib:24)
[4 Feb 2011 9:35] MySQL Verification Team
Marko, Sunny, was it intentional that you left the debugging code:

fprintf(stderr, "XA rollback sleep\n");
os_thread_sleep(10000000);
fprintf(stderr, "XA rollback sleep end\n");

in mysql-trunk?
[4 Feb 2011 22:20] Sunny Bains
Shane, I couldn't find the debug code that you mention in any of
the InnoDB files. I grepped for 'os_thread_sleep(10000000)' and only
found one instance in os0file.c and no instances of 'XA rollback sleep'.