Bug #43567 TimeBetweenLocalCheckpoints is not time-between but rather time inbetween
Submitted: 11 Mar 2009 16:10
Modified: 12 Mar 2009 18:23
Reporter: Jonas Oreland
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: *
OS: Any
Assigned to: Jonas Oreland
CPU Architecture: Any

[11 Mar 2009 16:10] Jonas Oreland
Description:
Prior to this fix, the following was true:

start LCP
stop LCP
start "counting" TimeBetweenLocalCheckpoints

This meant that all traffic occurring while the LCP was running
was not accounted for when deciding when to start a new LCP.

This behavior could lead to LCPs not being started often enough,
potentially causing error 410 (out of REDO).

How to repeat:
read code

Suggested fix:
Start "counting" TimeBetweenLocalCheckpoints at the time the LCP starts.
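
The difference between the two behaviors can be sketched with a toy model. This is an illustration only, not NDB code: the function simulate, its parameters, the rate of one unit of traffic per tick, and the numbers used are all assumptions made for the example.

```python
def simulate(total_ticks, lcp_duration, threshold, count_during_lcp):
    """Return the ticks at which an LCP starts in a toy model.

    One unit of write traffic arrives per tick. A new LCP starts once no
    LCP is running and the counter has reached `threshold`.
    count_during_lcp=False models the old behavior (counting resumes only
    after the LCP stops); True models the suggested fix (counting runs
    from the moment the LCP starts). All names here are hypothetical.
    """
    starts = []
    counter = 0        # traffic counted toward the next LCP
    lcp_remaining = 0  # ticks left in the running LCP (0 means idle)
    for tick in range(total_ticks):
        if lcp_remaining == 0 and counter >= threshold:
            starts.append(tick)
            lcp_remaining = lcp_duration
            counter = 0  # reset at LCP start under both behaviors
        lcp_running = lcp_remaining > 0
        # Old behavior ignores traffic that arrives while an LCP runs.
        if count_during_lcp or not lcp_running:
            counter += 1
        if lcp_running:
            lcp_remaining -= 1
    return starts


old_starts = simulate(60, lcp_duration=5, threshold=10, count_during_lcp=False)
new_starts = simulate(60, lcp_duration=5, threshold=10, count_during_lcp=True)
print(old_starts)  # [10, 25, 40, 55]
print(new_starts)  # [10, 20, 30, 40, 50]
```

In this model the old behavior spaces LCP starts by threshold plus LCP duration (15 ticks), while the suggested fix spaces them by the threshold alone (10 ticks), so the longer an LCP takes, the more the old behavior delays the next one.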
[11 Mar 2009 16:11] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/68923

2903 Jonas Oreland	2009-03-11
      ndb - bug#43567 - Fix TimeBetweenLocalCheckpoints
[11 Mar 2009 16:32] Bugs System
Pushed into 5.1.32-ndb-6.3.24 (revid:jonas@mysql.com-20090311161947-s731pmsq9slkel16) (version source revid:jonas@mysql.com-20090311161040-h17mpz070j6sn3hx) (merge vers: 5.1.32-ndb-6.3.24) (pib:6)
[11 Mar 2009 16:33] Bugs System
Pushed into 5.1.32-ndb-7.0.4 (revid:jonas@mysql.com-20090311162759-uf49sdcim2trbjef) (version source revid:jonas@mysql.com-20090311162658-5s65avbnpmvet52o) (merge vers: 5.1.32-ndb-7.0.4) (pib:6)
[11 Mar 2009 16:38] Jonas Oreland
won't fix in 6.2
[12 Mar 2009 18:23] Jon Stephens
Documented bugfix in the NDB-6.3.24 and 7.0.4 changelogs as follows:

        TimeBetweenLocalCheckpoints was measured from the end of one
        local checkpoint to the beginning of the next, rather than from
        the beginning of one LCP to the beginning of the next. This
        meant that the time spent performing the LCP was not taken into
        account when determining the TimeBetweenLocalCheckpoints
        interval, so that LCPs were not started often enough, possibly
        causing data nodes to run out of redo log space prematurely.