Bug #34702 Node failure(sp2) during initial node restart, can lead to subsequent failures
Submitted: 20 Feb 2008 20:32 Modified: 31 May 2008 12:35
Reporter: Jonas Oreland
Status: Closed
Category: Server: Cluster Severity: S3 (Non-critical)
Version: * OS: Any
Assigned to: Jonas Oreland Target Version:
Triage: D2 (Serious) / R2 (Low) / E3 (Medium)

[20 Feb 2008 20:32] Jonas Oreland
Description:
A node failure during an initial node restart (sp 2),
followed by a subsequent start,
can lead to a crash of the master node,
  as it incorrectly gave the starting node (the second time)
  permission using START_PERMCONF even though the invalidate node LCP was still running
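
The missing check described above can be illustrated with a minimal sketch. This is not the NDB source or the actual patch: the class, the enum, and all method names are hypothetical, and the real master tracks far more state. It only models the rule that start permission must not be granted while the invalidate node LCP for that node is still running.

```cpp
#include <cstdint>
#include <set>

// Hypothetical, simplified model of the master's bookkeeping (not NDB code).
enum class StartPermission { Granted, Refused };

class MasterNode {
public:
    // A starting node failed: its LCP information must be invalidated
    // before the same node may be admitted again.
    void beginInvalidateNodeLcp(std::uint32_t nodeId) {
        invalidating_.insert(nodeId);
    }

    // Invalidation has finished for this node.
    void completeInvalidateNodeLcp(std::uint32_t nodeId) {
        invalidating_.erase(nodeId);
    }

    // The guard: refuse the request while invalidation is still running,
    // instead of answering with the equivalent of START_PERMCONF.
    StartPermission requestStartPermission(std::uint32_t nodeId) const {
        if (invalidating_.count(nodeId) != 0) {
            return StartPermission::Refused;
        }
        return StartPermission::Granted;
    }

private:
    std::set<std::uint32_t> invalidating_;  // nodes with invalidation in progress
};
```

In this model, a second start attempt that arrives before completeInvalidateNodeLcp() is refused rather than admitted, which is the condition the description says was not being honoured.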

How to repeat:
.

Suggested fix:
.
[20 Feb 2008 20:33] Jonas Oreland
The likelihood increases if the number of tables is large, as the invalidate node LCP
  will then take longer
[21 Feb 2008 10:22] Jonas Oreland
pushed to 6.2.13 (and tagged a release)
pending merge to 6.3

won't fix in 4.1, 5.0, 5.1
[22 Feb 2008 11:47] Jon Stephens
Documented in the 5.1.23-ndb-6.2.13 changelog as follows:

        A node failure during an initial node restart followed by
        another node start could cause the master data node to fail,
        because it incorrectly gave the node permission to start even if
        the invalidated node's LCP was still running.

Left in PQ status pending merge to ndb-6.3.
[31 May 2008 12:35] Jon Stephens
Also documented for 5.1.24-ndb-6.3.13 (actually pushed to ndb-6.3.11, but that release was
pulled and its changelog entries re-tagged as 6.3.13). Closed per yesterday's discussion with
Jonas.
[27 Jun 2008 10:38] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/48607

2632 jonas@mysql.com	2008-06-27
      ndb -
        increase timeout for testNodeRestart -n Bug34702 T1