MySQL Bugs: #43156: Incorrectly handled node-recovery during system restart, can lead to failure

Bug #43156	Incorrectly handled node-recovery during system restart, can lead to failure
Submitted:	24 Feb 2009 16:45	Modified:	16 Apr 2009 17:01
Reporter:	Andrew Hutchings
Status:	Closed
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S3 (Non-critical)
Version:	*	OS:	Any
Assigned to:	Jonas Oreland	CPU Architecture:	Any

Description:
The failure appears to be triggered by this assertion in Dbacc:

void
Dbacc::accIsLockedLab(Signal* signal, OperationrecPtr lockOwnerPtr)
{
  ndbrequire(csystemRestart == ZFALSE);

How to repeat:
Run traffic whilst a node is starting (I have not been able to repeat this).

Suggested fix:

Error was actually due to another problem, the message was just part of the fallout.  This bug report is bogus.

Hi,

We had a DN crash twice with this same error report. We need to catch this crash and produce a more meaningful error message that can help customers/support resolve whatever is causing this.

/Jeb

I can (with some difficulty) reproduce this.
The situation occurs when a node-restart (NR) happens during a system-restart (SR).
I.e., during the system restart, one node does not have sufficient REDO
    and is therefore started using the NR code.
In that case this ACC variable is not updated correctly.
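
The following is a minimal sketch of the failure mode described above, not the actual NDB kernel code. AccModel, StartType, beginStart, fallBackToNodeRecovery, onLockConflict and crashesOnLockConflict are hypothetical names invented for illustration. It assumes that the flag is set when a system restart begins, that falling back to node recovery leaves it untouched, and that lock conflicts only arise once the node sees live traffic.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical, simplified model of the stale-flag problem.
enum class StartType { SystemRestart, NodeRestart };

struct AccModel {
  // Plays the role of csystemRestart from the report.
  bool systemRestartFlag = false;

  void beginStart(StartType requested) {
    systemRestartFlag = (requested == StartType::SystemRestart);
  }

  // Assumption: the node lacks sufficient REDO and switches to the
  // node-recovery code path, but nothing resets the flag here.
  void fallBackToNodeRecovery() {
    // systemRestartFlag intentionally left stale
  }

  // Stand-in for ndbrequire(csystemRestart == ZFALSE) in accIsLockedLab.
  void onLockConflict() const {
    if (systemRestartFlag) {
      throw std::logic_error("lock conflict while flag says system restart");
    }
  }
};

// Returns true if the modeled check fails when a lock conflict occurs.
bool crashesOnLockConflict(const AccModel& acc) {
  try {
    acc.onLockConflict();
    return false;
  } catch (const std::logic_error&) {
    return true;
  }
}

// Walks through both scenarios: SR with fallback to NR, and a plain NR.
void demoStaleFlag() {
  AccModel viaSystemRestart;
  viaSystemRestart.beginStart(StartType::SystemRestart);
  viaSystemRestart.fallBackToNodeRecovery();
  assert(crashesOnLockConflict(viaSystemRestart));

  AccModel plainNodeRestart;
  plainNodeRestart.beginStart(StartType::NodeRestart);
  assert(!crashesOnLockConflict(plainNodeRestart));
}
```

In this model, only the node that entered through the system-restart path and then fell back to node recovery trips the check, which would be consistent with the failure being seen under traffic while a node is starting.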

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/72284

2910 Jonas Oreland	2009-04-16
      ndb - bug#43156
        Remove variable in ACC that was not properly maintained
        (in case of node-recovery during SR)
        The test program "testSystemRestart -n to" is changed (only in 6.3)
        to verify this bug/fix

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/72289

2941 Jonas Oreland	2009-04-16
      ndb - bug#43156 - modify testSystemRestart -n to, to test bug/fix

Pushed into 5.1.32-ndb-7.0.6 (revid:jonas@mysql.com-20090416130342-ztzdb072wg43p9ne) (version source revid:jonas@mysql.com-20090416130342-ztzdb072wg43p9ne) (merge vers: 5.1.32-ndb-7.0.6) (pib:6)

Pushed into 5.1.32-ndb-6.2.18 (revid:jonas@mysql.com-20090416123231-jrgi5tefen616px3) (version source revid:jonas@mysql.com-20090416123231-jrgi5tefen616px3) (merge vers: 5.1.32-ndb-6.2.18) (pib:6)

Pushed into 5.1.32-ndb-6.3.25 (revid:jonas@mysql.com-20090416125715-1d8d101i32os2a0a) (version source revid:jonas@mysql.com-20090416125715-1d8d101i32os2a0a) (merge vers: 5.1.32-ndb-6.3.25) (pib:6)

Documented bugfix in the NDB-6.2.18, 6.3.25, and 7.0.6 changelogs as follows:

        In some cases, data node restarts during a system restart could 
        fail due to insufficient redo log space.