MySQL Bugs: #75919: Failure in system restart when performing copy fragments as master node

Bug #75919	Failure in system restart when performing copy fragments as master node
Submitted:	16 Feb 2015 13:27	Modified:	13 Mar 2015 12:39
Reporter:	Mikael Ronström
Status:	Closed
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S2 (Serious)
Version:	7.4.4	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
An ndbrequire check fails when a takeover starts: the check asserts
that only one takeover record exists. This logic assumes that the
master node never performs the copy fragment phase. That assumption is
wrong: in a system restart the master node can hold old data and need
the copy fragment phase to bring its data up to date.

How to repeat:
testSystemRestart -n Bug41915 D2

Suggested fix:
Allow two takeover records when the node performing the check is the
master node.
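
The relaxed check could look roughly like the sketch below. This is not
the actual NDB code: the function names and the use of a plain counter
are assumptions made for illustration, and the real check uses
ndbrequire rather than assert.

```cpp
#include <cassert>

// Hypothetical sketch; none of these names come from the NDB source.

// Upper bound on takeover records when a takeover starts.
// Per the suggested fix: two on the master node, one on any other node.
constexpr unsigned maxTakeOverRecords(bool isMasterNode) {
    return isMasterNode ? 2u : 1u;
}

// True if the number of active takeover records is acceptable
// for a node with the given role.
constexpr bool takeOverCountValid(unsigned activeRecords, bool isMasterNode) {
    return activeRecords >= 1 &&
           activeRecords <= maxTakeOverRecords(isMasterNode);
}

// Stand-in for the failing assertion (assert replaces ndbrequire here).
inline void checkTakeOverStart(unsigned activeRecords, bool isMasterNode) {
    assert(takeOverCountValid(activeRecords, isMasterNode));
}
```

With this bound, a master node that holds two takeover records during a
system restart passes the check, while a non-master node with two
records still trips it.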

Documented fix in the NDB 7.4.5 changelog, as follows:

    NDB node takeover code made the assumption that there would be 
    only one takeover record when starting a takeover, based on the 
    further assumption that the master node could never perform copying 
    of fragments. However, this is not the case in a system restart,         
    where a master node can have stale data and so need to perform
    such copying to bring itself up to date.
      
Closed.