MySQL Bugs: #40370: master node can die during node-restart

Bug #40370	master node can die during node-restart
Submitted:	28 Oct 2008 10:50	Modified:	29 Oct 2008 6:36
Reporter:	Jonas Oreland
Status:	Closed
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S3 (Non-critical)
Version:	6.3.18	OS:	Any
Assigned to:	Jonas Oreland	CPU Architecture:	Any

Description:
During a node restart, the master node could die on a failed ndbrequire in DBDIH.
This happened if a local checkpoint (LCP) or global checkpoint (GCP) was started or triggered exactly during START_ME.

this was introduced in 6.3.18 :-( by me

How to repeat:
Run testNodeRestart -l 10 -n pnr_lcp T1 on a cluster with more than 2 nodes.

Suggested fix:
Fix the incorrect ndbrequire in DBDIH.
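
The report does not show the failing condition, so the following is only an illustrative sketch of this class of bug, not the DBDIH code or the committed patch. It models a fatal check that assumes a checkpoint can never overlap START_ME handling, next to a relaxed handler that treats the overlap as a legal state. All names here (MasterState, checkpoint_in_progress, the two handlers) are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


class RequireFailure(Exception):
    """Stands in for the node shutdown caused by a failed ndbrequire."""


def require(condition: bool, what: str) -> None:
    # Like ndbrequire, a failed check is treated as fatal, not recoverable.
    if not condition:
        raise RequireFailure(what)


@dataclass
class MasterState:
    # Hypothetical flag; the real DBDIH state is not shown in the report.
    checkpoint_in_progress: bool = False


def handle_start_me_strict(state: MasterState) -> str:
    # Over-strict: assumes no checkpoint can ever overlap START_ME.
    require(not state.checkpoint_in_progress, "checkpoint during START_ME")
    return "accepted"


def handle_start_me_relaxed(state: MasterState) -> str:
    # Relaxed: the overlap is a legal state and is handled explicitly.
    if state.checkpoint_in_progress:
        return "accepted alongside checkpoint"
    return "accepted"
```

With the strict handler, a checkpoint that happens to start at the wrong moment takes the master down. The relaxed handler accepts the same input, which is the general shape of a fix that loosens an assertion instead of changing the protocol.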

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/57186

2722 Jonas Oreland	2008-10-28
      ndb - bug#40370 - master node can die during node-restart
        - fix incorrect assertion
        - improve testNodeRestart -n pnr to restart half cluster
          instead of just 2 nodes at a time

Pushed into 5.1.28-ndb-6.3.19  (revid:jonas@mysql.com-20081028105758-grbcgsv502qapltu) (version source revid:jonas@mysql.com-20081028105758-grbcgsv502qapltu) (pib:5)

Pushed into 5.1.28-ndb-6.4.0  (revid:jonas@mysql.com-20081028105758-grbcgsv502qapltu) (version source revid:jonas@mysql.com-20081028110017-9fukws1kacd7vn6b) (pib:5)

Documented bugfix in the ndb-6.3.19 changelog as follows:

        A restarting data node could fail with an error in the
        DBDIH kernel block if a local or global checkpoint
        was started or triggered just as the node made a request for data from
        another data node.