Bug #50062 Some node restarts not quite in parallel
Submitted: 4 Jan 2010 21:40
Modified: 6 Jul 2010 11:24
Reporter: Andrew Hutchings
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: 6.3
OS: Any
Assigned to: Jonas Oreland
CPU Architecture: Any

[4 Jan 2010 21:40] Andrew Hutchings
Description:
When multiple data nodes of a running cluster are taken down and then restarted with --initial, only one node at a time seems to be able to enter start phases 2-4.

How to repeat:
1. Start a 4 data node cluster
2. Stop 2 nodes (from different node groups)
3. Start both nodes again with --initial simultaneously

One of the nodes will wait until the other has completed phase 4 before it will complete phase 2.
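
The steps above could be scripted roughly as follows. This is only a sketch: it builds the command lines and prints them rather than running them, because ndbd has to be started on each data node's own host. The connect string "mgmhost:1186" and the node IDs 2 and 4 are hypothetical placeholders and are assumed to belong to different node groups; the helper name repro_commands is likewise made up for illustration.

```python
import shlex


def repro_commands(connectstring, node_ids):
    """Build (but do not run) the commands for the repro steps.

    connectstring and node_ids are placeholders supplied by the caller;
    the node IDs are assumed to be in different node groups.
    """
    mgm = ["ndb_mgm", f"--ndb-connectstring={connectstring}", "-e"]

    # Step 2: stop each chosen data node via the management client.
    stop = [mgm + [f"{nid} STOP"] for nid in node_ids]

    # Step 3: start each node again with --initial (run on that node's host,
    # all at roughly the same time).
    start = [
        ["ndbd", "--initial",
         f"--ndb-connectstring={connectstring}",
         f"--ndb-nodeid={nid}"]
        for nid in node_ids
    ]

    # Poll this while the nodes start to see which start phase each one is in.
    status = mgm + ["ALL STATUS"]
    return stop, start, status


if __name__ == "__main__":
    stop, start, status = repro_commands("mgmhost:1186", [2, 4])
    for argv in stop + start + [status]:
        print(shlex.join(argv))
```

Running the ALL STATUS command repeatedly during the restart should show one node staying in an early start phase until the other has passed phase 4, if the behaviour described here is present.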
[9 Jun 2010 10:36] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/110597

3209 Jonas Oreland	2010-06-09
      ndb - bug#50062 - Move initialization of REDO-log during initial start/node-restart to startphase 2, so that it can be performed by several nodes in parallel
[9 Jun 2010 11:06] Bugs System
Pushed into 5.1.44-ndb-7.0.16 (revid:jonas@mysql.com-20100609105615-3lgp4z2gsd3ef4o8) (version source revid:jonas@mysql.com-20100609105615-3lgp4z2gsd3ef4o8) (merge vers: 5.1.44-ndb-7.0.16) (pib:16)
[9 Jun 2010 11:15] Jonas Oreland
DOCS: Some parts of a node restart are run only one node at a time,
even when several nodes are restarted in parallel.
Prior to this patch, initialization of the REDO log was one of these
one-node-at-a-time steps during an initial node restart.
This patch changes this so that the step can be run by several nodes in parallel.

Note: during non-initial node-restart, REDO log is (of course) not reinitialized.

Pushed to 6.3.35, 7.0.16 and 7.1.5
[6 Jul 2010 11:24] Jon Stephens
Documented in the NDB-6.3.35, 7.0.16, and 7.1.5 changelogs, as follows:

      During initial node restarts, initialization of the REDO log was 
      always performed 1 node at a time, during start phase 4. Now this is 
      done during start phase 2, so that the initialization can be performed 
      in parallel, thus decreasing the time required for initial restarts of
      multiple nodes.
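
A rough way to see why this shortens multi-node initial restarts is the toy model below. It is a simplification, not a measurement: it assumes every node spends the same time initializing its REDO log and that all other restart work overlaps fully between nodes. The function name and the durations in the example are hypothetical.

```python
def initial_restart_time(nodes, redo_init, other, redo_parallel):
    """Estimate wall-clock seconds until all `nodes` finish an initial restart.

    Toy model (assumption): each node needs `redo_init` seconds to initialize
    its REDO log and `other` seconds for everything else, and the `other`
    work overlaps completely between nodes.
    """
    if redo_parallel:
        # REDO initialization in start phase 2: all nodes do it at once.
        return other + redo_init
    # REDO initialization in start phase 4: nodes take turns.
    return other + nodes * redo_init


if __name__ == "__main__":
    # Hypothetical figures: 2 nodes, 600 s REDO init, 120 s other work.
    before = initial_restart_time(2, 600, 120, redo_parallel=False)
    after = initial_restart_time(2, 600, 120, redo_parallel=True)
    print(before, after)
```

Under these assumptions the serialized case grows linearly with the number of restarting nodes, while the parallel case stays constant.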

Closed.