Bug #46944 Internal prepared XA transaction XIDs are not removed if server_id changes
Submitted: 26 Aug 21:16 Modified: 17 Nov 17:41
Reporter: Harrison Fisk
Status: Closed
Category:Server Severity:S3 (Non-critical)
Version:5.0 OS:Any
Assigned to: Kristofer Pettersson Target Version:5.1+
Tags: xa, xid
Triage: Triaged: D3 (Medium)

[26 Aug 21:16] Harrison Fisk
Description:
When MySQL crashes (or a snapshot is taken, which simulates a crash), internal XA
transactions (used to synchronize the binary log and InnoDB) can be left in a PREPARED
state when they should be rolled back. This happens when the server_id changes before
the restart.

This can leave rows locked in InnoDB, and those locks persist across a restart.

This most commonly occurs when you take a snapshot to set up another slave. It can also
occur on an ordinary system after a crash if the server_id is changed before the
restart.

See the following URL for more details:

http://harrison-fisk.blogspot.com/2009/01/xa-and-persistent-innodb-locks.html

You can then run XA RECOVER to see the internal XID and roll it back manually.
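
A minimal C++ sketch of that manual cleanup, assuming the raw bytes of the internal XID have been copied from the data column of XA RECOVER (which holds the gtrid followed by the bqual). Because an internal gtrid embeds binary server_id and query_id bytes, the sketch passes it back as a hex literal. The helper name xa_rollback_statement is hypothetical; it omits bqual and formatID so the server defaults ('' and 1) apply, which is assumed here to match internal XIDs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: builds an XA ROLLBACK statement for a prepared XID.
// `data` holds the raw bytes of the XA RECOVER data column (gtrid then bqual);
// only the first gtrid_length bytes belong to the gtrid. The gtrid is emitted
// as a hex literal because internal XIDs contain non-printable bytes.
// bqual and formatID are omitted, so the defaults ('' and 1) apply.
std::string xa_rollback_statement(const std::string& data, std::size_t gtrid_length) {
  static const char kHex[] = "0123456789ABCDEF";
  std::string stmt = "XA ROLLBACK X'";
  for (std::size_t i = 0; i < gtrid_length && i < data.size(); ++i) {
    unsigned char byte = static_cast<unsigned char>(data[i]);
    stmt += kHex[byte >> 4];    // high nibble
    stmt += kHex[byte & 0x0F];  // low nibble
  }
  stmt += "'";
  return stmt;
}
```

The generated statement is meant to be reviewed and run by hand from a client, one per stranded XID listed by XA RECOVER.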

How to repeat:
1.  Run many transactions in quick succession with the binary log enabled.
2.  Take a snapshot of the system.
3.  Change the server_id and restart.
4.  Notice that the prepared transactions are still present.

Suggested fix:
The internal XID is generated by concatenating the prefix "MySQLXid", the server_id, and
the query_id.
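
The layout can be sketched in C++ as below. This is a simplified model, not the server's code: it assumes server_id is a 4-byte integer (in the server its width is sizeof(server_id), which depends on the build), copies integers in host byte order the way memcpy does, and uses hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Simplified layout of an internal XID gtrid:
//   "MySQLXid" (8 bytes) | server_id | query_id (8 bytes)
// ASSUMPTION: server_id is modeled as a 4-byte uint32_t for illustration.
constexpr char kXidPrefix[] = "MySQLXid";
constexpr std::size_t kPrefixLen = 8;
constexpr std::size_t kOffset = kPrefixLen + sizeof(std::uint32_t);
constexpr std::size_t kGtridLen = kOffset + sizeof(std::uint64_t);

// Hypothetical helper: concatenates prefix, server_id and query_id
// (integers are copied in host byte order).
std::string make_internal_gtrid(std::uint32_t server_id, std::uint64_t query_id) {
  std::string gtrid(kGtridLen, '\0');
  std::memcpy(&gtrid[0], kXidPrefix, kPrefixLen);
  std::memcpy(&gtrid[kPrefixLen], &server_id, sizeof server_id);
  std::memcpy(&gtrid[kOffset], &query_id, sizeof query_id);
  return gtrid;
}

// Hypothetical helpers: read the embedded fields back out.
std::uint32_t embedded_server_id(const std::string& gtrid) {
  std::uint32_t id = 0;
  std::memcpy(&id, gtrid.data() + kPrefixLen, sizeof id);
  return id;
}

std::uint64_t embedded_query_id(const std::string& gtrid) {
  std::uint64_t id = 0;
  std::memcpy(&id, gtrid.data() + kOffset, sizeof id);
  return id;
}
```

Because the server_id sits inside the XID bytes, any check that compares it against the current server_id ties the XID to the server configuration at the time it was written.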

During startup, ha_recover() is called. It loops through the prepared XIDs and uses
xid_t::get_my_xid() to verify that they were created by internal MySQL processing.
get_my_xid() checks both the prefix and the server_id to decide whether an XID belongs
to this server.

So if a snapshot is taken and the server_id is changed before the restart, the server
does not recognize itself as the owner of the XID and leaves it in the PREPARED state.
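
A minimal C++ model of this failure, assuming a 4-byte server_id and reducing an XID to the fields get_my_xid() inspects. An XID for which the check returns 0 is not treated as internal, so recovery leaves it PREPARED. The flag check_server_id switches between the pre-patch rule (prefix and server_id) and the prefix-only rule suggested here; PreparedXid and the functions are hypothetical stand-ins, not the server's types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// ASSUMPTION: 4-byte server_id, so an internal gtrid is 8 + 4 + 8 bytes.
constexpr std::size_t kPrefixLen = 8;
constexpr std::size_t kOffset = kPrefixLen + sizeof(std::uint32_t);
constexpr std::size_t kGtridLen = kOffset + sizeof(std::uint64_t);

struct PreparedXid {
  std::string gtrid;             // gtrid.size() plays the role of gtrid_length
  std::size_t bqual_length = 0;
};

PreparedXid make_internal(std::uint32_t server_id, std::uint64_t query_id) {
  PreparedXid x;
  x.gtrid.assign("MySQLXid", kPrefixLen);
  x.gtrid.append(reinterpret_cast<const char*>(&server_id), sizeof server_id);
  x.gtrid.append(reinterpret_cast<const char*>(&query_id), sizeof query_id);
  return x;
}

// Returns the embedded query_id if the XID is recognized as internal, else 0.
// check_server_id = true models the pre-patch rule; false the prefix-only rule.
std::uint64_t get_my_xid(const PreparedXid& x, std::uint32_t current_server_id,
                         bool check_server_id) {
  if (x.gtrid.size() != kGtridLen || x.bqual_length != 0) return 0;
  if (x.gtrid.compare(0, kPrefixLen, "MySQLXid") != 0) return 0;
  if (check_server_id &&
      std::memcmp(x.gtrid.data() + kPrefixLen, &current_server_id,
                  sizeof current_server_id) != 0)
    return 0;
  std::uint64_t id = 0;
  std::memcpy(&id, x.gtrid.data() + kOffset, sizeof id);
  return id;
}

// Counts XIDs that recovery would not recognize and would therefore leave PREPARED.
std::size_t count_left_prepared(const std::vector<PreparedXid>& xids,
                                std::uint32_t current_server_id,
                                bool check_server_id) {
  std::size_t n = 0;
  for (const PreparedXid& x : xids)
    if (get_my_xid(x, current_server_id, check_server_id) == 0) ++n;
  return n;
}
```

In this model, XIDs written under server_id 1 are all left prepared after restarting as server_id 2 when the server_id is checked, while the prefix-only rule recognizes every one of them.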

I think it should be enough to check only the special prefix and not the server_id. I
believe the only drawback is that it would prevent people from using "MySQLXid" as a
prefix in manually created XID values, which could be documented.

A diff to do this is:

=== modified file 'sql/handler.h'
--- sql/handler.h	2008-11-25 06:22:02 +0000
+++ sql/handler.h	2009-08-26 19:13:14 +0000
@@ -275,7 +275,6 @@
   my_xid get_my_xid()
   {
     return gtrid_length == MYSQL_XID_GTRID_LEN && bqual_length == 0 &&
-           !memcmp(data+MYSQL_XID_PREFIX_LEN, &server_id, sizeof(server_id)) &&
            !memcmp(data, MYSQL_XID_PREFIX, MYSQL_XID_PREFIX_LEN) ?
            quick_get_my_xid() : 0;
   }
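
The drawback mentioned in the suggested fix can be sketched as follows. Under the patched check, a manually created XID is mistaken for an internal one only if its gtrid has exactly the internal length, its bqual is empty, and it starts with "MySQLXid". The C++ below assumes a 4-byte server_id (making that length 20 bytes) and uses a hypothetical function name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// ASSUMPTION: 4-byte server_id, so an internal gtrid is 8 + 4 + 8 = 20 bytes.
constexpr std::size_t kPrefixLen = 8;
constexpr std::size_t kGtridLen =
    kPrefixLen + sizeof(std::uint32_t) + sizeof(std::uint64_t);

// Hypothetical model of the patched get_my_xid() test: only the gtrid length,
// an empty bqual, and the "MySQLXid" prefix are still checked.
bool treated_as_internal(const std::string& gtrid, const std::string& bqual) {
  return gtrid.size() == kGtridLen && bqual.empty() &&
         gtrid.compare(0, kPrefixLen, "MySQLXid") == 0;
}
```

Under these assumptions, a transaction started with XA START 'MySQLXid0123456789ab' would be treated as internal by crash recovery, whereas a non-empty bqual or a different prefix avoids the collision.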
[12 Oct 14:47] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/86571

3119 Kristofer Pettersson	2009-10-12
      Bug#46944 Internal prepared XA transaction XIDs are not
                removed if server_id changes
      
      When MySQL crashes (or a snapshot is taken which simulates
      a crash), then it is possible that internal XA
      transactions (used to sync the binary log and InnoDB)
      can be left in a PREPARED state, whereas they should be
      rolled back. This happens when the server_id changes
      before the restart.

      This patch relaxes the restriction that the server_id
      must be consistent for the XID to be considered
      valid. The rollback phase should then be able to
      clean up all pending XA transactions.
[4 Nov 10:25] Bugs System
Pushed into 5.1.41 (revid:joro@sun.com-20091104092152-qz96bzlf2o1japwc) (version source
revid:kristofer.pettersson@sun.com-20091019090224-sxcpk82z9akeppxh) (merge vers: 5.1.41)
(pib:13)
[11 Nov 7:50] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20091110093407-rw5g8dys2baqkt67) (version
source revid:alik@sun.com-20091109080109-7dxapd5y5pxlu08w) (merge vers: 6.0.14-alpha)
(pib:13)
[11 Nov 7:58] Bugs System
Pushed into 5.5.0-beta (revid:alik@sun.com-20091109115615-nuohp02h8mdrz8m2) (version
source revid:alik@sun.com-20091105090203-cls5j6k3ohu04xpt) (merge vers: 5.5.0-beta)
(pib:13)
[17 Nov 17:41] Paul DuBois
Noted in 5.1.41, 5.5.0, 6.0.14 changelogs.

When MySQL crashed (or a snapshot was taken that simulates a crash),
it was possible for internal XA transactions (used to synchronize
the binary log and InnoDB) to be left in a PREPARED state when they
should have been rolled back. This occurred when the server_id value
changed before the restart, because that value was used to construct
XID values.

The requirement that the server_id value be consistent for XID values
to be considered valid has now been relaxed. The rollback phase should
then be able to clean up all pending XA transactions.