Bug #52217 Crash during start after long series with mix of system/node restarts
Submitted: 19 Mar 2010 13:23
Modified: 29 Mar 2010 7:47
Reporter: Jonas Oreland
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: mysql-5.1-telco-6.3
OS: Any
Assigned to: Jonas Oreland
CPU Architecture: Any

[19 Mar 2010 13:23] Jonas Oreland
Description:
After running a mixed series of node and system restarts,
a system restart can fail with an ndbrequire in DBLQH.

The problem is that cnewestCompleteGci can be set too low
for a node performing a node restart, which leads to
the node reporting incorrect GCI intervals for its first
LCP.

How to repeat:
testSystemRestart -n SR_DD_3*
reproduces the problem sporadically.

Suggested fix:
1) Make sure that the starting node participates in one GCP before
   its first LCP.

2) For a take-over system restart, add an if-statement raising
   cnewestCompleteGci to at least the value of keepGci.
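A minimal sketch of fix (2), assuming GCIs are plain unsigned 32-bit counters. The helper raiseNewestCompleteGci, the free-function form, and the Uint32 typedef are hypothetical stand-ins, not the actual DBLQH code; the assert only mimics an ndbrequire-style check, and treating [keepGci, cnewestCompleteGci] as the GCI interval reported for the first LCP is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for NDB's Uint32 GCI type.
typedef std::uint32_t Uint32;

// Hypothetical helper: during a take-over system restart, make sure the
// newest completed GCI is never lower than keepGci (suggested fix 2).
Uint32 raiseNewestCompleteGci(Uint32 cnewestCompleteGci, Uint32 keepGci)
{
  if (cnewestCompleteGci < keepGci)
  {
    cnewestCompleteGci = keepGci;
  }
  // Assumed invariant, checked ndbrequire-style: the interval
  // [keepGci, cnewestCompleteGci] is never inverted.
  assert(keepGci <= cnewestCompleteGci);
  return cnewestCompleteGci;
}
```

For example, raiseNewestCompleteGci(5, 10) returns 10, while a value already at or above keepGci is left unchanged.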
[19 Mar 2010 13:23] Jonas Oreland
NOTE: This could also cause a system restart to hang.
[19 Mar 2010 13:26] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/103821

3158 Jonas Oreland	2010-03-19
      ndb - bug#52217 - make sure that cnewestCompleteGci is set properly during node restart (or take-over during sr)
[19 Mar 2010 14:24] Bugs System
Pushed into 5.1.44-ndb-6.3.33 (revid:jonas@mysql.com-20100319141318-tm95wrhkwhi10048) (version source revid:jonas@mysql.com-20100319132416-fy2klgc0phg3jgtb) (merge vers: 5.1.44-ndb-6.3.33) (pib:16)
[19 Mar 2010 14:24] Bugs System
Pushed into 5.1.44-ndb-7.0.14 (revid:jonas@mysql.com-20100319141557-7n4n8yjjlza9ua1t) (version source revid:jonas@mysql.com-20100319141557-7n4n8yjjlza9ua1t) (merge vers: 5.1.44-ndb-7.0.14) (pib:16)
[19 Mar 2010 14:25] Jonas Oreland
Pushed to 6.3.33, 7.0.14, and 7.1.3.
[22 Mar 2010 12:39] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/103979

3163 Jonas Oreland	2010-03-22
      ndb - addendum to bug#52217 - never send a keepGci > newestRestorableGCI from DIH to LQH
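The addendum's rule can be pictured as a clamp on the value DIH sends to LQH. This is a hypothetical sketch assuming unsigned 32-bit GCIs; clampKeepGci and the Uint32 typedef are illustrative names, not the real DIH signal-handling code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for NDB's Uint32 GCI type.
typedef std::uint32_t Uint32;

// Hypothetical helper modelling the addendum: the keepGci that DIH sends
// to LQH is never greater than newestRestorableGCI.
Uint32 clampKeepGci(Uint32 keepGci, Uint32 newestRestorableGCI)
{
  const Uint32 sent = std::min(keepGci, newestRestorableGCI);
  // ndbrequire-style sanity check on the value about to be sent.
  assert(sent <= newestRestorableGCI);
  return sent;
}
```

A keepGci of 20 with newestRestorableGCI at 15 would be sent as 15; a keepGci of 10 passes through unchanged.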
[22 Mar 2010 12:39] Bugs System
Pushed into 5.1.44-ndb-6.3.33 (revid:jonas@mysql.com-20100322123623-du2q1655e3umtrb4) (version source revid:jonas@mysql.com-20100322123623-du2q1655e3umtrb4) (merge vers: 5.1.44-ndb-6.3.33) (pib:16)
[22 Mar 2010 12:43] Bugs System
Pushed into 5.1.44-ndb-7.0.14 (revid:jonas@mysql.com-20100322123839-w1lmo3s0u6c0d8r4) (version source revid:jonas@mysql.com-20100322123839-w1lmo3s0u6c0d8r4) (merge vers: 5.1.44-ndb-7.0.14) (pib:16)
[29 Mar 2010 7:47] Jon Stephens
Documented bugfix in the NDB-6.3.33, 7.0.14, and 7.1.3 changelogs, as follows:

        After running a mixed series of node and system restarts, a
        system restart could hang or fail altogether. This was caused by
        setting the value of the newest completed global checkpoint too
        low for a data node performing a node restart, which led to the
        node reporting incorrect GCI intervals for its first local
        checkpoint.

Closed.