MySQL Bugs: #50062: Some node restarts not quite in parallel

Bug #50062	Some node restarts not quite in parallel
Submitted:	4 Jan 2010 21:40	Modified:	6 Jul 2010 11:24
Reporter:	Andrew Hutchings
Status:	Closed
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S3 (Non-critical)
Version:	6.3	OS:	Any
Assigned to:	Jonas Oreland	CPU Architecture:	Any

Description:
When a cluster is running and multiple data nodes are taken down and then started again with --initial, only one node at a time seems able to proceed through start phases 2-4.

How to repeat:
1. Start a 4 data node cluster
2. Stop 2 nodes (from different node groups)
3. Start both nodes again with --initial simultaneously

One of the nodes will wait until the other has completed phase 4 before it will complete phase 2.
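The reproduction steps above can be scripted. The following is a minimal sketch, not part of the original report. It assumes that node IDs 2 and 3 are data nodes in different node groups, that the management server is reachable at the hypothetical connect string `mgmhost:1186`, and that `ndb_mgm` and `ndbd` are on PATH. The sketch only builds the command lines. In a real test cluster, each `ndbd --initial` must run on the host configured for that node ID, and both must be launched at about the same time (for example with `subprocess.Popen`).

```python
"""Hypothetical driver for the bug #50062 reproduction steps.

Assumptions (not from the report): node IDs 2 and 3 are data nodes in
different node groups; MGM_HOST is a placeholder connect string.
Only argv lists are built here; nothing is executed.
"""
import shlex
from typing import List

MGM_HOST = "mgmhost:1186"  # hypothetical connect string


def stop_cmd(node_id: int) -> List[str]:
    # Step 2: stop one data node through the management client.
    return ["ndb_mgm", "-c", MGM_HOST, "-e", f"{node_id} STOP"]


def initial_start_cmd(node_id: int) -> List[str]:
    # Step 3: start the data node again with --initial.
    return ["ndbd", "--initial", f"--ndb-connectstring={MGM_HOST}",
            f"--ndb-nodeid={node_id}"]


def status_cmd() -> List[str]:
    # Poll while the nodes restart to see which start phase each is in.
    return ["ndb_mgm", "-c", MGM_HOST, "-e", "ALL STATUS"]


def repro_plan(node_ids: List[int]) -> List[List[str]]:
    # Stop every node first, then start them all with --initial.
    plan = [stop_cmd(n) for n in node_ids]
    plan += [initial_start_cmd(n) for n in node_ids]
    plan.append(status_cmd())
    return plan


if __name__ == "__main__":
    for argv in repro_plan([2, 3]):
        print(shlex.join(argv))
```

Before the fix, repeatedly running the status command during step 3 should show one node held back in an early start phase while the other advances.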

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/110597

3209 Jonas Oreland	2010-06-09
      ndb - bug#50062 - Move initialization of REDO-log during initial start/node-restart to startphase 2, so that it can be performed by several nodes in parallel

Pushed into 5.1.44-ndb-7.0.16 (revid:jonas@mysql.com-20100609105615-3lgp4z2gsd3ef4o8) (version source revid:jonas@mysql.com-20100609105615-3lgp4z2gsd3ef4o8) (merge vers: 5.1.44-ndb-7.0.16) (pib:16)

DOCS: Some parts of a node restart run only one node at a time,
regardless of how many nodes are restarted in parallel.
Prior to this patch, REDO log initialization during an initial node
restart ran in this one-node-at-a-time part.
This patch moves that step so that several nodes can perform it in parallel.

Note: during a non-initial node restart, the REDO log is (of course) not reinitialized.
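The effect of moving REDO log initialization out of the serialized part of the restart can be shown with a toy timeline model. This is an illustration, not how NDB schedules restarts internally. It assumes a single cluster-wide "gate" that admits one restarting node at a time into the serialized work, while all other work runs concurrently. All durations (`REDO_INIT`, `OTHER_PARALLEL`, `OTHER_SERIALIZED`) are hypothetical.

```python
"""Toy timeline model of parallel initial node restarts.

Hypothetical assumptions: every node finishes its parallel work at the
same time, then the nodes pass one by one through a serialized section.
"""
from typing import Dict, List


def restart_finish_times(n_nodes: int, parallel_work: float,
                         serialized_work: float) -> List[float]:
    # All nodes do their parallel work concurrently, starting at t=0.
    ready = parallel_work
    gate_free_at = ready
    finish: List[float] = []
    for _ in range(n_nodes):
        start = max(ready, gate_free_at)  # wait until the gate is free
        end = start + serialized_work     # hold the gate for this node
        gate_free_at = end
        finish.append(end)
    return finish


# Hypothetical per-node costs in seconds.
REDO_INIT = 60.0
OTHER_PARALLEL = 10.0
OTHER_SERIALIZED = 20.0


def compare(n_nodes: int) -> Dict[str, float]:
    # Before the fix: REDO init sits inside the serialized section.
    before = restart_finish_times(n_nodes, OTHER_PARALLEL,
                                  OTHER_SERIALIZED + REDO_INIT)
    # After the fix: REDO init runs in the parallel section (start phase 2).
    after = restart_finish_times(n_nodes, OTHER_PARALLEL + REDO_INIT,
                                 OTHER_SERIALIZED)
    return {"before": max(before), "after": max(after)}


if __name__ == "__main__":
    print(compare(2))  # {'before': 170.0, 'after': 110.0}
```

With these made-up numbers, a single node restarts in 90 s either way. With two nodes, the second node no longer waits for the first node's REDO initialization, so the total drops from 170 s to 110 s. The saving grows with the number of nodes restarted together.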

Pushed to 6.3.35, 7.0.16 and 7.1.5

Documented in the NDB-6.3.35, 7.0.16, and 7.1.5 changelogs, as follows:

      During initial node restarts, initialization of the REDO log was 
      always performed 1 node at a time, during start phase 4. Now this is 
      done during start phase 2, so that the initialization can be performed 
      in parallel, thus decreasing the time required for initial restarts of
      multiple nodes.

Closed.