Bug #36246 Incorrect handling of TC-TakeOver in cascading master failure
Submitted: 22 Apr 2008 9:10
Modified: 31 May 2008 10:47
Reporter: Jonas Oreland
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version:
OS: Any
Assigned to: Jonas Oreland
CPU Architecture: Any

[22 Apr 2008 9:10] Jonas Oreland
When a node X dies,
the master Y starts a take-over of that node's transactions (TC-TakeOver).

If the master dies during this take-over,
the new master Z currently does *not* complete the take-over of X.

This can lead to stale operations on the data nodes,
with pain and misery as a consequence.
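The cascading failure can be illustrated with a toy model. This is a minimal sketch, not the NDB kernel's implementation: the `ToyCluster` class, its methods, the `resume_pending` switch, and the rule that the lowest-numbered live node acts as master are all hypothetical simplifications. It shows why the new master Z has to inherit the take-overs its predecessor left unfinished, instead of only taking over the master Y that just died.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Hypothetical model of TC take-over; not an NDB API."""
    alive: set[int]
    open_tx: dict[int, set[str]]   # coordinator node id -> open transaction ids
    resume_pending: bool = True    # False models the behaviour reported in this bug
    pending: list[int] = field(default_factory=list)  # failed nodes awaiting take-over

    def master(self) -> int:
        # Simplifying assumption: the lowest-numbered live node is master.
        return min(self.alive)

    def node_failed(self, node: int) -> None:
        was_master = node == self.master()
        self.alive.discard(node)
        if was_master and not self.resume_pending:
            # Reported bug: the new master forgets take-overs its predecessor
            # had started but not finished.
            self.pending = []
        self.pending.append(node)

    def complete_takeover(self) -> None:
        # The current master resolves (commits or aborts) every open transaction
        # of each pending failed node; the toy model just marks them resolved.
        for node in self.pending:
            self.open_tx[node] = set()
        self.pending.clear()

    def stale_transactions(self) -> set[str]:
        # Transactions still open on behalf of coordinators that are no longer alive.
        return {tx for node, txs in self.open_tx.items()
                if node not in self.alive for tx in txs}


def scenario(resume_pending: bool) -> set[str]:
    c = ToyCluster(alive={1, 2, 3, 4},
                   open_tx={1: {"t1"}, 2: set(), 3: {"t3a", "t3b"}, 4: set()},
                   resume_pending=resume_pending)
    c.node_failed(3)       # node X=3 dies; master Y=1 starts TC take-over of node 3
    c.node_failed(1)       # Y dies before finishing; node Z=2 becomes master
    c.complete_takeover()  # Z completes every take-over it knows about
    return c.stale_transactions()
```

With `resume_pending=False`, `scenario` returns `{"t3a", "t3b"}`: node 3's transactions are left open, which corresponds to the stale operations described above. With `resume_pending=True`, Z finishes the take-over of both node 3 and node 1 and no stale transactions remain.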

How to repeat:
Will write a new test program.

Suggested fix:
[25 Apr 2008 7:54] Jonas Oreland
Pushed to 51-ndb, telco* and drop6
(50-ndb was locked for an unknown reason).
[20 May 2008 9:34] Jon Stephens
Documented in the 5.1.24-ndb-6.3.14 changelog as follows:

        Under certain rare circumstances, the failure of the new master node
        while attempting a node takeover would cause takeover errors to repeat
        without being resolved.

Left Patch Queued status pending further merges.
[31 May 2008 10:47] Jon Stephens
Closed per yesterday's discussion with Jonas.
[12 Dec 2008 23:29] Bugs System
Pushed into 6.0.6-alpha (revid:sp1r-jonas@perch.ndb.mysql.com-20080423140838-48946) (version source revid:jonas@mysql.com-20080808094047-4e1yiarqa2t3opg3) (pib:5)