Bug #40370 master node can die during node-restart
Submitted: 28 Oct 2008 10:50 Modified: 29 Oct 2008 6:36
Reporter: Jonas Oreland
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine Severity: S3 (Non-critical)
Version: 6.3.18 OS: Any
Assigned to: Jonas Oreland CPU Architecture: Any

[28 Oct 2008 10:50] Jonas Oreland
Description:
During a node restart, the master node could die on an ndbrequire in DBDIH.
This happened if an LCP or GCP was started or triggered exactly during START_ME.

This was introduced in 6.3.18 :-( by me.

How to repeat:
Run testNodeRestart -l 10 -n pnr_lcp T1 on a cluster with more than 2 nodes.

Suggested fix:
Fix the incorrect ndbrequire.
[28 Oct 2008 10:53] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/57186

2722 Jonas Oreland	2008-10-28
      ndb - bug#40370 - master node can die during node-restart
        - fix incorrect assertion
        - improve testNodeRestart -n pnr to restart half cluster
          instead of just 2 nodes at a time
[28 Oct 2008 10:54] Bugs System
Pushed into 5.1.28-ndb-6.3.19  (revid:jonas@mysql.com-20081028105758-grbcgsv502qapltu) (version source revid:jonas@mysql.com-20081028105758-grbcgsv502qapltu) (pib:5)
[28 Oct 2008 10:58] Bugs System
Pushed into 5.1.28-ndb-6.4.0  (revid:jonas@mysql.com-20081028105758-grbcgsv502qapltu) (version source revid:jonas@mysql.com-20081028110017-9fukws1kacd7vn6b) (pib:5)
[29 Oct 2008 6:36] Jon Stephens
Documented bugfix in the ndb-6.3.19 changelog as follows:

        A restarting data node could fail with an error in the
        DBDIH kernel block if a local or global checkpoint
        was started or triggered just as the node made a request for data from
        another data node.