MySQL Bugs: #43053: Parallel node-recovery can be serialized by LCP

Bug #43053	Parallel node-recovery can be serialized by LCP
Submitted:	20 Feb 2009 10:13	Modified:	20 Feb 2009 14:55
Reporter:	Jonas Oreland
Status:	Closed
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S3 (Non-critical)
Version:	>= 6.3	OS:	Any
Assigned to:	Jonas Oreland	CPU Architecture:	Any

Description:
Parallel node recovery, as implemented in 6.3, is:
- serial when copying/syncing dictionary/checkpoint information
- parallel when copying/syncing data

In addition, the syncing of dictionary/checkpoint information cannot run in
parallel with a local checkpoint (LCP).

So, if several nodes start in parallel while LCPs are running or starting
continuously, each node may have to wait for an LCP before it can sync, which
serializes their recovery.

This can increase restart times.

How to repeat:
See above.

Suggested fix:
This fix introduces a new parameter, MaxLCPStartDelay (in seconds).
If there are nodes waiting, the start of an LCP is delayed by up to this value,
so that several starting nodes can sync their dictionary/checkpoint information first.
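
The effect of the delay can be illustrated with a toy timing model. This is only a sketch under simplifying assumptions that are not taken from the NDB source: an LCP is already running at time 0, a new LCP is always ready to start, metadata syncs run one node at a time, and with no delay exactly one node manages to sync between two LCPs. The function name and all numbers are hypothetical.

```python
def total_sync_time(nodes, sync_s, lcp_s, max_lcp_start_delay_s=0):
    """Seconds until all `nodes` have synced metadata in this toy model."""
    t = 0
    waiting = nodes
    while waiting > 0:
        # Assumption: metadata sync cannot overlap a running LCP, so wait it out.
        t += lcp_s
        # The next LCP may be held back until this point while nodes are waiting.
        window_end = t + max_lcp_start_delay_s
        while waiting > 0:
            # Syncs are serial: one node at a time.
            t += sync_s
            waiting -= 1
            if t >= window_end:
                break  # delay used up: the next LCP starts now
    return t


# Four starting nodes, 5 s metadata sync each, 60 s per LCP (made-up numbers).
print(total_sync_time(4, 5, 60))      # no delay
print(total_sync_time(4, 5, 60, 25))  # LCP start may be delayed up to 25 s
```

In this model, without a delay every node pays for a full LCP before it can sync, whereas a delay long enough to cover all pending syncs lets them finish after a single LCP wait.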

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/67009

2870 Jonas Oreland	2009-02-20
      ndb - bug#43053 - introduce parameter that can delay LCP to allow for "more" parallel node recovery

Pushed into 5.1.32-ndb-6.3.23 (revid:jonas@mysql.com-20090220102059-h4gkj6mio06bhtwn) (version source revid:jonas@mysql.com-20090220102059-h4gkj6mio06bhtwn) (merge vers: 5.1.32-ndb-6.3.23) (pib:6)

Pushed into 5.1.32-ndb-6.4.3 (revid:jonas@mysql.com-20090220103615-q13lhmhbzdri4u4t) (version source revid:jonas@mysql.com-20090220103615-q13lhmhbzdri4u4t) (merge vers: 5.1.32-ndb-6.4.3) (pib:6)

Documented in the NDB-6.3.23 and 6.4.3 changelogs as follows:

        A new data node configuration parameter MaxLCPStartDelay has
        been introduced to facilitate parallel node recovery by causing
        a local checkpoint to be delayed while recovering nodes are
        synchronizing data dictionaries and other meta-information. For
        more information about this parameter, see "Defining MySQL Cluster
        Data Nodes"
        (http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-ndbd-definition.html).
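
As a rough illustration, a data node parameter such as this one would normally be set in the cluster configuration file (config.ini) on the management server. The value below is only an example, not a recommendation; see the manual page linked above for the default and permitted range.

```ini
[ndbd default]
# Example value only (seconds): allow an LCP start to be delayed by up to
# this long while starting nodes sync dictionary/checkpoint information.
MaxLCPStartDelay=25
```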