Bug #43156 Incorrectly handled node-recovery during system restart, can lead to failure
Submitted: 24 Feb 2009 16:45
Modified: 16 Apr 2009 17:01
Reporter: Andrew Hutchings
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: *
OS: Any
Assigned to: Jonas Oreland
CPU Architecture: Any

[24 Feb 2009 16:45] Andrew Hutchings
Description:
The failure appears to be triggered by this check:

void
Dbacc::accIsLockedLab(Signal* signal, OperationrecPtr lockOwnerPtr)
{
  ndbrequire(csystemRestart == ZFALSE);

How to repeat:
Run traffic while a node is starting (I have not been able to repeat it).

Suggested fix:
[25 Feb 2009 20:20] Andrew Hutchings
Error was actually due to another problem, the message was just part of the fallout.  This bug report is bogus.
[4 Mar 2009 21:02] Jonathan Miller
Hi,

We had a DN crash twice with this same error report. We need to catch this crash and produce a more meaningful error message that can help customers/support resolve whatever is causing this.

/Jeb
[9 Mar 2009 14:05] Jonas Oreland
I can (with some difficulty) reproduce this.
The situation occurs when a node restart happens during a system restart.
I.e., during the system restart, one node does not have sufficient REDO
    and is started using the NR code.
Then this ACC variable is not updated correctly.
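
The failure mode described above can be sketched in isolation. This is a toy illustration only, not the NDB source: the class names, the StartKind enum, and every method below are hypothetical, and the only detail taken from this report is the general shape of a check like ndbrequire(csystemRestart == ZFALSE). It assumes the block caches "in system restart" once at start and that nothing refreshes the cached value when the node falls back to node-recovery.

```cpp
#include <cassert>  // used by callers that assert on the results

// Hypothetical start kinds; illustrative names, not the NDB enum.
enum class StartKind { SystemRestart, NodeRecovery };

// Buggy shape: the block caches the start kind once and never refreshes it.
class CachedFlagBlock {
public:
    void beginStart(StartKind kind) {
        systemRestart_ = (kind == StartKind::SystemRestart);
    }

    // The node falls back to node-recovery during the system restart,
    // but the cached flag is deliberately left untouched (the stale state).
    void switchToNodeRecovery() {}

    // Stand-in for a check shaped like ndbrequire(csystemRestart == ZFALSE):
    // returns whether that check would pass when a locked row is encountered.
    bool lockConflictCheckPasses() const { return !systemRestart_; }

private:
    bool systemRestart_ = false;
};

// One possible fixed shape: keep no cached copy, so nothing can go stale.
class NoFlagBlock {
public:
    void beginStart(StartKind) {}
    void switchToNodeRecovery() {}
    bool lockConflictCheckPasses() const { return true; }
};

// Runs the sequence from the comment above against either block type:
// start as a system restart, fall back to node-recovery, then hit a lock.
template <typename Block>
bool checkPassesAfterFallback() {
    Block block;
    block.beginStart(StartKind::SystemRestart);
    block.switchToNodeRecovery();
    return block.lockConflictCheckPasses();
}
```

In this sketch, checkPassesAfterFallback<CachedFlagBlock>() returns false, because the stale flag makes the check fail once traffic reaches the recovering node, while checkPassesAfterFallback<NoFlagBlock>() returns true. Dropping the cached variable, rather than trying to keep it in sync, is one way to remove this kind of state.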
[16 Apr 2009 12:34] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/72284

2910 Jonas Oreland	2009-04-16
      ndb - bug#43156
        Remove variable in ACC that was not properly maintained
        (in case of node-recovery during SR)
        The test program "testSystemRestart -n to" is changed (only in 6.3)
        to verify this bug/fix
[16 Apr 2009 12:57] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/72289

2941 Jonas Oreland	2009-04-16
      ndb - bug#43156 - modify testSystemRestart -n to, to test bug/fix
[16 Apr 2009 13:07] Bugs System
Pushed into 5.1.32-ndb-7.0.6 (revid:jonas@mysql.com-20090416130342-ztzdb072wg43p9ne) (version source revid:jonas@mysql.com-20090416130342-ztzdb072wg43p9ne) (merge vers: 5.1.32-ndb-7.0.6) (pib:6)
[16 Apr 2009 13:07] Bugs System
Pushed into 5.1.32-ndb-6.2.18 (revid:jonas@mysql.com-20090416123231-jrgi5tefen616px3) (version source revid:jonas@mysql.com-20090416123231-jrgi5tefen616px3) (merge vers: 5.1.32-ndb-6.2.18) (pib:6)
[16 Apr 2009 13:08] Bugs System
Pushed into 5.1.32-ndb-6.3.25 (revid:jonas@mysql.com-20090416125715-1d8d101i32os2a0a) (version source revid:jonas@mysql.com-20090416125715-1d8d101i32os2a0a) (merge vers: 5.1.32-ndb-6.3.25) (pib:6)
[16 Apr 2009 17:01] Jon Stephens
Documented bugfix in the NDB-6.2.18, 6.3.25, and 7.0.6 changelogs as follows:

        In some cases, data node restarts during a system restart could 
        fail due to insufficient redo log space.