Bug #75919 Failure in system restart when performing copy fragments as master node
Submitted: 16 Feb 2015 13:27 Modified: 13 Mar 2015 12:39
Reporter: Mikael Ronström
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine Severity: S2 (Serious)
Version: 7.4.4 OS: Any
Assigned to: CPU Architecture: Any

[16 Feb 2015 13:27] Mikael Ronström
We fail in an ndbrequire that checks that we only have one takeover
record when we start a takeover. This logic assumes that the master
node can never perform the copy fragment phase. That assumption turns
out to be wrong: in a system restart the master node can have old data
and need the copy fragment phase to get an up-to-date version of it.

How to repeat:
testSystemRestart -n Bug41915 D2

Suggested fix:
Ensure that we allow for 2 takeover records if we are the master node.
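
As a rough illustration only, the difference between the failing check and the suggested one might be sketched as below. This is not the NDB source: TakeoverState, strict_check and relaxed_check are hypothetical names made up for this sketch, and the real code uses ndbrequire rather than a function returning bool.

```cpp
#include <cstddef>

// Hypothetical stand-in for the state consulted when a takeover starts.
// These names are illustrative and are not taken from the NDB source.
struct TakeoverState {
    bool is_master;              // true if this node is the master node
    std::size_t active_records;  // takeover records in use, including the new one
};

// The assumption described in the report: exactly one takeover record.
bool strict_check(const TakeoverState& s) {
    return s.active_records == 1;
}

// The suggested fix: allow up to 2 takeover records on the master node,
// and keep the limit of 1 on every other node.
bool relaxed_check(const TakeoverState& s) {
    const std::size_t limit = s.is_master ? 2 : 1;
    return s.active_records >= 1 && s.active_records <= limit;
}
```

With this sketch, a master node holding two records fails strict_check but passes relaxed_check, while a non-master node holding two records is still rejected.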

[13 Mar 2015 12:39] Jon Stephens
Documented fix in the NDB 7.4.5 changelog, as follows:

    NDB node takeover code made the assumption that there would be 
    only one takeover record when starting a takeover, based on the 
    further assumption that the master node could never perform copying 
    of fragments. However, this is not the case in a system restart,         
    where a master node can have stale data and so need to perform
    such copying to bring itself up to date.