Bug #44844 Optimized NodeRecovery might be incorrectly disabled if node is down too long
Submitted: 13 May 2009 11:38 Modified: 20 May 2009 9:45
Reporter: Jonas Oreland
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine Severity: S3 (Non-critical)
Version: * OS: Any
Assigned to: Jonas Oreland CPU Architecture: Any

[13 May 2009 11:38] Jonas Oreland
Description:
If a node is down "too" long, it might not run optimized
node recovery when it is later started.

Too long = the starting node's LCP depends on a GCI that
  is not restorable in the running cluster.

But the starting node does in fact have sufficient REDO,
so the fix is merely to check its own REDO instead of
the REDO from the running cluster.
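
The difference between the two checks can be sketched with a toy model. This is not the NDB code or the committed patch; the class, field and function names are all hypothetical, and "sufficient REDO" is assumed here to mean that the node's own REDO log reaches back at least to the GCI its LCP depends on.

```python
from dataclasses import dataclass


@dataclass
class StartingNode:
    # All fields are illustrative assumptions, not real NDB structures.
    lcp_gci: int              # GCI that the node's most recent LCP depends on
    own_redo_start_gci: int   # oldest GCI still covered by the node's own REDO


def optimized_nr_before_fix(node: StartingNode,
                            cluster_oldest_restorable_gci: int) -> bool:
    # Behaviour described in the report: the decision depends on what is
    # still restorable in the running cluster.
    return node.lcp_gci >= cluster_oldest_restorable_gci


def optimized_nr_after_fix(node: StartingNode) -> bool:
    # Described fix: only the starting node's own REDO is consulted.
    return node.own_redo_start_gci <= node.lcp_gci


# A node that was down long enough for the cluster to move past its LCP's GCI,
# but whose own REDO still covers that GCI.
node = StartingNode(lcp_gci=100, own_redo_start_gci=90)
print(optimized_nr_before_fix(node, cluster_oldest_restorable_gci=150))
print(optimized_nr_after_fix(node))
```

In this model the first check rejects the node because the cluster has advanced, while the second accepts it because the node's own REDO is enough.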

This bug is very similar to bug#26913 and has approximately the same symptom,
but is due to a different cause.

How to repeat:
.

Suggested fix:
.
[13 May 2009 11:59] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/73933

2950 Jonas Oreland	2009-05-13
      ndb - bug#44844
        Don't remove crashed replicas too early,
        which can disable optimized NR
[13 May 2009 12:41] Bugs System
Pushed into 5.1.34-ndb-7.0.6 (revid:jonas@mysql.com-20090513123752-1v4cldk31xww6e1s) (version source revid:jonas@mysql.com-20090513122702-trk3j80liv8bwehy) (merge vers: 5.1.34-ndb-7.0.6) (pib:6)
[13 May 2009 12:42] Bugs System
Pushed into 5.1.34-ndb-6.3.25 (revid:jonas@mysql.com-20090513115722-p2hj3dkob2jaelpt) (version source revid:jonas@mysql.com-20090513115722-p2hj3dkob2jaelpt) (merge vers: 5.1.34-ndb-6.3.25) (pib:6)
[18 May 2009 9:48] Jonas Oreland
note: won't fix in 6.2
[20 May 2009 9:45] Jon Stephens
Documented bugfix in the NDB-6.3.25 and 7.0.6 changelogs as follows:

        When a data node was down so long that its most recent local
        checkpoint depended on a global checkpoint that was no longer
        restorable, it was possible for it to be unable to use optimized
        node recovery when being restarted later.