MySQL Bugs: #52217: Crash during start after long series with mix of system/node restarts

Bug #52217	Crash during start after long series with mix of system/node restarts
Submitted:	19 Mar 2010 13:23	Modified:	29 Mar 2010 7:47
Reporter:	Jonas Oreland
Status:	Closed
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S3 (Non-critical)
Version:	mysql-5.1-telco-6.3	OS:	Any
Assigned to:	Jonas Oreland	CPU Architecture:	Any

Description:
After running a mixed series of node and system restarts,
a system restart can fail with an ndbrequire failure in DBLQH.

The problem is that cnewestCompleteGci can be set too low
for a node performing a node restart, which leads to
it reporting incorrect GCI intervals for its first
LCP.

How to repeat:
testSystemRestart -n SR_DD_3* 
reproduces the problem sporadically.

Suggested fix:
1) Make sure that the starting node participates in 1 GCP prior
   to its first LCP

2) For take-over system restart, add an if-statement increasing cnewestCompleteGci
   to at least the value of keepGci
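
A minimal sketch of the if-statement described in item 2, not the committed patch. The struct and function names are hypothetical stand-ins for illustration; only cnewestCompleteGci and keepGci come from this report, and the real DBLQH state is laid out differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the relevant restart state; not the real DBLQH layout.
struct LqhRestartState {
    std::uint32_t newestCompleteGci;  // mirrors cnewestCompleteGci from the report
};

// Raise newestCompleteGci to at least keepGci. The value is never lowered.
inline void raiseToKeepGci(LqhRestartState& state, std::uint32_t keepGci) {
    if (state.newestCompleteGci < keepGci) {
        state.newestCompleteGci = keepGci;
    }
    // Postcondition, playing the role an ndbrequire-style check would play.
    assert(state.newestCompleteGci >= keepGci);
}
```

With this shape, a node whose value was left too low is raised to keepGci, while a node that is already ahead of keepGci is left untouched.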

NOTE: This bug could also lead to a system restart hanging

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/103821

3158 Jonas Oreland	2010-03-19
      ndb - bug#52217 - make sure that cnewestCompleteGci is set properly during node restart (or take-over during sr)

Pushed into 5.1.44-ndb-6.3.33 (revid:jonas@mysql.com-20100319141318-tm95wrhkwhi10048) (version source revid:jonas@mysql.com-20100319132416-fy2klgc0phg3jgtb) (merge vers: 5.1.44-ndb-6.3.33) (pib:16)

Pushed into 5.1.44-ndb-7.0.14 (revid:jonas@mysql.com-20100319141557-7n4n8yjjlza9ua1t) (version source revid:jonas@mysql.com-20100319141557-7n4n8yjjlza9ua1t) (merge vers: 5.1.44-ndb-7.0.14) (pib:16)

Pushed to 6.3.33, 7.0.14 and 7.1.3

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/103979

3163 Jonas Oreland	2010-03-22
      ndb - addendum to bug#52217 - never send a keepGci > newestRestorableGCI from DIH to LQH
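
The rule in this addendum amounts to clamping keepGci before it is sent. The following is a sketch of that idea only, assuming plain 32-bit GCI values; the function name is hypothetical and the actual DIH code is not reproduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: the keepGci value DIH would pass on to LQH,
// capped so that it never exceeds newestRestorableGci.
inline std::uint32_t keepGciToSend(std::uint32_t keepGci,
                                   std::uint32_t newestRestorableGci) {
    const std::uint32_t sent = std::min(keepGci, newestRestorableGci);
    assert(sent <= newestRestorableGci);  // the invariant named in the addendum
    return sent;
}
```

A keepGci that is already at or below newestRestorableGCI passes through unchanged; a larger one is reduced to newestRestorableGCI.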

Pushed into 5.1.44-ndb-6.3.33 (revid:jonas@mysql.com-20100322123623-du2q1655e3umtrb4) (version source revid:jonas@mysql.com-20100322123623-du2q1655e3umtrb4) (merge vers: 5.1.44-ndb-6.3.33) (pib:16)

Pushed into 5.1.44-ndb-7.0.14 (revid:jonas@mysql.com-20100322123839-w1lmo3s0u6c0d8r4) (version source revid:jonas@mysql.com-20100322123839-w1lmo3s0u6c0d8r4) (merge vers: 5.1.44-ndb-7.0.14) (pib:16)

Documented bugfix in the NDB-6.3.33, 7.0.14, and 7.1.3 changelogs, as follows:

        After running a mixed series of node and system restarts, a
        system restart could hang or fail altogether. This was caused by
        setting the value of the newest completed global checkpoint too
        low for a data node performing a node restart, which led to the
        node reporting incorrect GCI intervals for its first local
        checkpoint.

Closed.