Bug #25984 Eight node restart failures in a row can kill a cluster
Submitted: 31 Jan 2007 15:30
Modified: 8 Feb 2007 7:40
Reporter: Hartmut Holzgraefe
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: 4.1, 5.0, 5.1
OS: Any (*)
Assigned to: Jonas Oreland

[31 Jan 2007 15:30] Hartmut Holzgraefe
Description:
A node that fails during restart eight times in a row causes the other nodes to fail with:

Status: Ndbd file system error, restart node initial
Message: Too many crashed replicas (8 consecutive node restart failures) (Ndbd file system limit exceeded)
Error: 6300
Error data: DbdihMain.cpp
Error object: DBDIH (Line: 12062) 0x0000000a
Program: /usr/mysql/libexec/ndbd

which brings down the cluster.
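
The failure mode described above can be pictured with a toy model. This is only a sketch, not the DBDIH code: the names `MAX_CRASHED_REPLICAS`, `ClusterDown`, and `ToyCluster` are hypothetical, and the only detail taken from this report is the limit of 8 consecutive failed node restarts. That a successful restart resets the count is an assumption read from the word "consecutive".

```python
MAX_CRASHED_REPLICAS = 8  # limit quoted in the error message; the name is an assumption


class ClusterDown(Exception):
    """Raised when the toy model hits the limit and the live nodes fail."""


class ToyCluster:
    def __init__(self):
        # Consecutive failed restarts, keyed by node id.
        self.failed_restarts = {}

    def restart_failed(self, node_id):
        """Record a failed restart; mimic the pre-fix behaviour at the limit."""
        count = self.failed_restarts.get(node_id, 0) + 1
        self.failed_restarts[node_id] = count
        if count >= MAX_CRASHED_REPLICAS:
            # Pre-fix behaviour as reported: the surviving nodes fail too.
            raise ClusterDown(
                f"Too many crashed replicas ({count} consecutive node restart failures)"
            )
        return count

    def restart_succeeded(self, node_id):
        """Assumption: a successful restart ends the run of consecutive failures."""
        self.failed_restarts[node_id] = 0
```

In this model, seven failed restarts of the same node are survivable and the eighth raises `ClusterDown`, which stands in for the whole cluster going down.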

How to repeat:
In a working cluster, restart a node and kill it during the restart. Repeat this eight times in a row.

Suggested fix:
short term: raise the limit from 8 to a higher value

long term: completely remove the limitation
[2 Feb 2007 16:53] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/19256

ChangeSet@1.2419, 2007-02-02 17:07:15+01:00, jonas@eel.(none) +3 -0
  ndb - bug#25984 - more than 7 failed node restarts can cause cluster failure
  new behaviour is as follows:
  1) the node is refused permission to start, and should fail with a message in the error log saying that it must be restarted with --initial
  2) if the cluster fails in this situation, the node must also be restarted with --initial;
     if not, SR will fail with this message
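
The new behaviour in point 1 can be sketched as an admission check that runs before a node is allowed to restart. This is an illustration under stated assumptions, not the patch itself: `LIMIT`, `Decision`, and `admit_node_restart` are hypothetical names, and the exact threshold the patch uses is not given in this thread, so the sketch refuses the start once one more failure would reach the limit of 8.

```python
from enum import Enum

LIMIT = 8  # consecutive failed restarts named in the error message


class Decision(Enum):
    ALLOW = "allow"
    REFUSE = "refuse: restart this node with --initial"


def admit_node_restart(consecutive_failures: int, initial: bool, limit: int = LIMIT) -> Decision:
    """Decide whether a restarting node may join (toy model of the fix)."""
    if initial:
        # An initial start erases the node's recovery files, so the
        # history of failed restarts no longer applies.
        return Decision.ALLOW
    if consecutive_failures + 1 >= limit:
        # Assumed threshold: refuse before another failure could hit the limit.
        return Decision.REFUSE
    return Decision.ALLOW
```

The design point is that the restarting node is turned away, instead of the live nodes failing, so the rest of the cluster keeps running while the operator restarts the affected node with `--initial`.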
[7 Feb 2007 17:14] Tomas Ulin
pushed to 5.1.16
[8 Feb 2007 7:40] Jon Stephens
Thank you for your bug report. A fix for this issue has been committed to our source repository and will be incorporated into the next release.

If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at

    http://dev.mysql.com/doc/en/installing-source.html

Documented in 5.1.16 changelog.
[5 Mar 2007 15:16] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/21156

ChangeSet@1.2106, 2007-03-05 16:16:01+01:00, jonas@perch.ndb.mysql.com +3 -0
  ndb - wl2325-5.0
    Bug #25984 8 failed node restarts kill a live cluster
[28 May 2009 6:03] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/75101

2978 Jonas Oreland	2009-05-28
      ndb - fix bug#25984 - that broke
[28 May 2009 9:16] Bugs System
Pushed into 5.1.34-ndb-7.0.7 (revid:jonas@mysql.com-20090528091244-75lkl1qtv3xj6fou) (version source revid:jonas@mysql.com-20090528070844-e66w6dq51tfrdzvc) (merge vers: 5.1.34-ndb-7.0.7) (pib:6)
[28 May 2009 9:17] Bugs System
Pushed into 5.1.34-ndb-6.3.26 (revid:jonas@mysql.com-20090528090921-r0h8b75anphuf8w7) (version source revid:jonas@mysql.com-20090528060313-moq2kyeuk8qje9bs) (merge vers: 5.1.34-ndb-6.3.26) (pib:6)
[28 Aug 2009 8:53] Jon Stephens
Added the following to the NDB-6.3.26/7.0.7 versions of the changelog entry, after discussion with Jonas:

        NOTE: This issue, originally resolved in MySQL 5.1.16, re-occurred 
        due to a later (unrelated) change. The fix has been re-applied.