MySQL Bugs: #34702: Node failure(sp2) during initial node restart, can lead to subsequent failures

Bug #34702	Node failure(sp2) during initial node restart, can lead to subsequent failures
Submitted:	20 Feb 2008 19:32	Modified:	31 May 2008 10:35
Reporter:	Jonas Oreland
Status:	Closed
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S3 (Non-critical)
Version:	*	OS:	Any
Assigned to:	Jonas Oreland	CPU Architecture:	Any

Description:
A node failure during an initial node restart (sp 2),
followed by a subsequent start of the same node,
can crash the master node:
  the master incorrectly gave the starting node permission (the second time)
  using START_PERMCONF even though the invalidate node LCP was still running.
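
The race described above can be modelled with a small sketch. This is not NDB code: the class, its method names, and the "DEFER" reply are hypothetical, and the report does not say whether the actual fix refuses or delays the start request. It only illustrates the guard that was missing, namely that START_PERMCONF should not be sent while the invalidate node LCP for that node is still in progress.

```python
from dataclasses import dataclass, field


@dataclass
class MasterSketch:
    """Toy model of the master's start-permission decision (hypothetical, not NDB code)."""

    # Node ids whose "invalidate node LCP" step has started but not finished.
    invalidating: set = field(default_factory=set)

    def node_failed_during_initial_restart(self, node_id: int) -> None:
        # Handling the failure kicks off invalidation of the node's LCP data.
        self.invalidating.add(node_id)

    def invalidate_lcp_done(self, node_id: int) -> None:
        # Invalidation finished; the node may be given start permission again.
        self.invalidating.discard(node_id)

    def handle_start_request(self, node_id: int) -> str:
        # Guard: no START_PERMCONF while invalidation for this node is running.
        # "DEFER" is a placeholder reply, an assumption made for this sketch.
        if node_id in self.invalidating:
            return "DEFER"
        return "START_PERMCONF"
```

In this model, a start request that arrives between node_failed_during_initial_restart() and invalidate_lcp_done() is held back; without the check in handle_start_request(), the master would answer START_PERMCONF too early, which is the behaviour the report describes.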

How to repeat:
.

Suggested fix:
.

The likelihood increases when the number of tables is large, as the invalidate node LCP
  will then take longer.

Pushed to 6.2.13 (and tagged a release).
Pending merge to 6.3.

Won't fix in 4.1, 5.0, 5.1.

Documented in the 5.1.23-ndb-6.2.13 changelog as follows:

        A node failure during an initial node restart followed by
        another node start could cause the master data node to fail,
        because it incorrectly gave the node permission to start even if
        the invalidated node's LCP was still running.

Left in PQ status pending merge to ndb-6.3.

Also documented for 5.1.24-ndb-6.3.13 (actually pushed to ndb-6.1.11 but release was pulled and changelog entries re-tagged as 6.3.13). Closed per yesterday's discussion with Jonas.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/48607

2632 jonas@mysql.com	2008-06-27
      ndb -
        increase timeout for testNodeRestart -n Bug34702 T1