Bug #57322 TimeBetweenGlobalCheckpoints can be set lower than 3 * HeartbeatIntervalDbDb
Submitted: 7 Oct 2010 17:07
Modified: 20 Oct 2010 12:57
Reporter: Daniel Smythe
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: 6.3.26 +
OS: Any
Assigned to: Jonas Oreland
CPU Architecture: Any
Tags: GCP stop, Network Partitioning, TimeBetweenGlobalCheckpoints

[7 Oct 2010 17:07] Daniel Smythe
Description:
It is possible to set TimeBetweenGlobalCheckpoints lower than 3 * HeartbeatIntervalDbDb. This can result in GCP Stop during a network partition before nodes are voted out. 

How to repeat:
HeartbeatIntervalDbDb = 400
TimeBetweenGlobalCheckpoints = 1000

Sever the network connection between nodes. A GCP stop should occur before the node is voted out.

Suggested fix:
Perhaps TimeBetweenGlobalCheckpoints should be set internally to the greater of 3 * HeartbeatIntervalDbDb and the configured TimeBetweenGlobalCheckpoints. Perhaps even 4 * HeartbeatIntervalDbDb. A message should be logged to report the internal change.
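
A minimal sketch of the suggested clamp, for illustration only. It is not the committed patch; the function name, the logger name, and the `factor` argument are hypothetical, and all values are in milliseconds.

```python
import logging

# Hypothetical logger name; a real data node would use its own logging facility.
logger = logging.getLogger("ndb_config_sketch")


def effective_gcp_interval_ms(configured_ms: int, heartbeat_ms: int, factor: int = 3) -> int:
    """Return the interval to use internally: never below factor * heartbeat_ms."""
    floor_ms = factor * heartbeat_ms
    if configured_ms < floor_ms:
        # Report the internal change, as the suggested fix asks.
        logger.warning(
            "TimeBetweenGlobalCheckpoints raised from %d ms to %d ms "
            "(%d * HeartbeatIntervalDbDb)",
            configured_ms, floor_ms, factor,
        )
        return floor_ms
    return configured_ms
```

With the values from "How to repeat" (HeartbeatIntervalDbDb = 400, TimeBetweenGlobalCheckpoints = 1000), the floor is 3 * 400 = 1200 ms, so the configured 1000 ms would be raised to 1200 ms. With `factor=4` it would be raised to 1600 ms.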
[20 Oct 2010 9:47] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/121273

3311 Jonas Oreland	2010-10-20
      ndb - bug#57322 - compute correct "failure times" when setting max-lag values for gcp and micro-gcp
[20 Oct 2010 9:56] Bugs System
Pushed into mysql-5.1-telco-6.3 5.1.51-ndb-6.3.39 (revid:jonas@mysql.com-20101020094501-9c07g1dk6ltsmn2o) (version source revid:jonas@mysql.com-20101020094501-9c07g1dk6ltsmn2o) (merge vers: 5.1.51-ndb-6.3.39) (pib:21)
[20 Oct 2010 9:56] Bugs System
Pushed into mysql-5.1-telco-7.0 5.1.51-ndb-7.0.20 (revid:jonas@mysql.com-20101020094955-fr3nxe2j2h106p12) (version source revid:jonas@mysql.com-20101020094955-fr3nxe2j2h106p12) (merge vers: 5.1.51-ndb-7.0.20) (pib:21)
[20 Oct 2010 9:59] Jonas Oreland
pushed to 6.3.39, 7.0.20 and 7.1.9
[20 Oct 2010 10:06] Jonas Oreland
Explanation:
1) A GCP stop is detected using 2 "max-lag" variables (one for GCPs and one for epochs), which set the maximum time that a GCP/epoch can remain unchanged.
2) If e.g. TimeBetweenEpochsTimeout=100 but HeartbeatDBDB=1500,
  a node failure can be fired after 4 missed heartbeats (e.g. 6000 ms).
  That means that the TimeBetweenEpochsTimeout would be exceeded,
    and a GCP stop would "incorrectly" be detected.
3) Therefore the TimeBetweenEpochsTimeout is automatically adjusted based on the
  values of HeartbeatDBDB and ArbitTimeout.
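
The arithmetic in points 2) and 3) can be sketched as follows. This is an illustration under assumptions, not the NDB implementation: the comment does not state the exact adjustment rule, so the sketch assumes the timeout is simply raised to at least the node-failure detection time. The function names are hypothetical, and the ArbitTimeout value of 1000 ms is made up for the example.

```python
def node_failure_detection_ms(heartbeat_ms: int, arbit_timeout_ms: int,
                              missed_heartbeats: int = 4) -> int:
    """Time until a node failure can be fired: missed heartbeats plus arbitration."""
    return missed_heartbeats * heartbeat_ms + arbit_timeout_ms


def adjusted_epoch_timeout_ms(configured_ms: int, heartbeat_ms: int,
                              arbit_timeout_ms: int) -> int:
    """Assumed rule: never let the epoch timeout fall below the detection time."""
    return max(configured_ms, node_failure_detection_ms(heartbeat_ms, arbit_timeout_ms))


# Values from point 2): TimeBetweenEpochsTimeout=100, HeartbeatDBDB=1500.
heartbeats_only = node_failure_detection_ms(1500, 0)      # 4 * 1500 = 6000 ms
adjusted = adjusted_epoch_timeout_ms(100, 1500, 1000)     # hypothetical ArbitTimeout
```

Here the unadjusted 100 ms timeout would expire long before the 6000 ms needed to declare the node failed, which is the "incorrect" GCP stop described in point 2).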

However: The automatic adjustment didn't correctly take into consideration
  that during cascading node-failures, there can be several "iterations" of
  (4 * HeartbeatDBDB + ArbitTimeout) timeouts going on until all node-failures
  have internally been resolved. Therefore an "incorrect" GCP stop detection could
  happen with cascading node failures.

So the patch makes the adjustment take that into account as well.
(btw: given this I think that the synopsis is a bit misleading...)
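
A rough sketch of the cascading case described in this comment, under stated assumptions. How the patch derives the number of "iterations" is not given here, so `rounds` is a hypothetical input; the function names and the 1000 ms ArbitTimeout are likewise made up for illustration.

```python
def cascading_failure_window_ms(heartbeat_ms: int, arbit_timeout_ms: int,
                                rounds: int, missed_heartbeats: int = 4) -> int:
    """Worst-case time for `rounds` back-to-back node-failure resolutions."""
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    one_round_ms = missed_heartbeats * heartbeat_ms + arbit_timeout_ms
    return rounds * one_round_ms


def adjusted_max_lag_ms(configured_ms: int, heartbeat_ms: int,
                        arbit_timeout_ms: int, rounds: int) -> int:
    """Assumed rule: max-lag must cover the whole cascading window."""
    window_ms = cascading_failure_window_ms(heartbeat_ms, arbit_timeout_ms, rounds)
    return max(configured_ms, window_ms)


# HeartbeatDBDB=1500 and a hypothetical ArbitTimeout=1000 give 7000 ms per round.
single = adjusted_max_lag_ms(100, 1500, 1000, rounds=1)
cascade = adjusted_max_lag_ms(100, 1500, 1000, rounds=3)
```

A max-lag sized for a single round (7000 ms in this example) is too short when three failures cascade, since 3 * 7000 = 21000 ms may pass before all failures are resolved.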
[20 Oct 2010 12:57] Jon Stephens
Documented as follows in the NDB-6.3.39, 7.0.20, and 7.1.9 changelogs:

        A GCP stop is detected using 2 parameters which determine the
        maximum time that a global checkpoint or epoch can go unchanged;
        one of these controls this timeout for GCPs and one controls the
        timeout for epochs. Suppose the cluster is configured such that
        TimeBetweenEpochsTimeout is 100 ms but HeartbeatDBDB is 1500 ms.
        A node failure can be signalled after 4 missed
        heartbeats (in this case, 6000 ms). However, this would
        exceed TimeBetweenEpochsTimeout, causing false detection of a
        GCP stop. To prevent this from happening, the configured value for
        TimeBetweenEpochsTimeout is automatically adjusted, based on the
        values of HeartbeatDBDB and ArbitrationTimeout.

        The current issue arose when the automatic adjustment routine
        did not correctly take into consideration the fact that, during
        cascading node-failures, several intervals of length (4 *
        HeartbeatDBDB + ArbitrationTimeout) may elapse before all node
        failures have internally been resolved. This could cause false
        detection of a GCP stop in the event of a cascading node failure.

Closed.