MySQL Bugs: #43567: TimeBetweenLocalCheckpoints is not time-between but rather time inbetween

Bug #43567	TimeBetweenLocalCheckpoints is not time-between but rather time inbetween
Submitted:	11 Mar 2009 16:10	Modified:	12 Mar 2009 18:23
Reporter:	Jonas Oreland
Status:	Closed
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S3 (Non-critical)
Version:	*	OS:	Any
Assigned to:	Jonas Oreland	CPU Architecture:	Any

Description:
Prior to the fix for this bug, the following sequence applied:

start LCP
stop LCP
start "counting" TimeBetweenLocalCheckpoints

this meant that all traffic occurring while the LCP was running
was not accounted for when deciding when to start a new LCP.

this behavior could lead to LCPs not being started often enough,
potentially causing error 410 (out of REDO)

How to repeat:
read code

Suggested fix:
start "counting" TimeBetweenLocalCheckpoints at the time the LCP starts
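
The difference between the two behaviors can be sketched with a toy model. This is not the NDB code: the function name, the constant write rate, the abstract "units" of traffic, and the rule that a new LCP cannot begin before the previous one has finished are all assumptions made for illustration.

```python
def lcp_start_times(write_rate, threshold, lcp_duration, horizon, count_from_start):
    """Toy model: return the times at which LCPs start, up to `horizon`.

    write_rate        -- traffic units written per time unit (assumed constant)
    threshold         -- traffic units that must be counted before the next LCP
    lcp_duration      -- how long one LCP takes to run
    count_from_start  -- True: count from LCP start (fixed behavior)
                         False: count from LCP stop (old behavior)
    """
    starts = []
    t = 0.0  # assume the first LCP starts at time 0
    time_to_threshold = threshold / write_rate
    while t <= horizon:
        starts.append(t)
        if count_from_start:
            # Counting runs in parallel with the LCP; assume the next LCP
            # still has to wait for the current one to finish.
            t += max(lcp_duration, time_to_threshold)
        else:
            # Counting begins only after the LCP stops, so traffic written
            # during the LCP is ignored and the interval grows by lcp_duration.
            t += lcp_duration + time_to_threshold
    return starts


old = lcp_start_times(10, 100, 5, 40, count_from_start=False)
new = lcp_start_times(10, 100, 5, 40, count_from_start=True)
print(old)  # [0.0, 15.0, 30.0]
print(new)  # [0.0, 10.0, 20.0, 30.0, 40.0]
```

In this model the old behavior lets 150 units of traffic accumulate between LCP starts although the threshold is 100, which is the kind of extra REDO usage described above.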

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/68923

2903 Jonas Oreland	2009-03-11
      ndb - bug#43567 - Fix TimeBetweenLocalCheckpoints

Pushed into 5.1.32-ndb-6.3.24 (revid:jonas@mysql.com-20090311161947-s731pmsq9slkel16) (version source revid:jonas@mysql.com-20090311161040-h17mpz070j6sn3hx) (merge vers: 5.1.32-ndb-6.3.24) (pib:6)

Pushed into 5.1.32-ndb-7.0.4 (revid:jonas@mysql.com-20090311162759-uf49sdcim2trbjef) (version source revid:jonas@mysql.com-20090311162658-5s65avbnpmvet52o) (merge vers: 5.1.32-ndb-7.0.4) (pib:6)

Won't fix in 6.2.

Documented bugfix in the NDB-6.3.24 and 7.0.4 changelogs as follows:

        TimeBetweenLocalCheckpoints was measured from the end of one
        local checkpoint to the beginning of the next, rather than from
        the beginning of one LCP to the beginning of the next. This
        meant that the time spent performing the LCP was not taken into
        account when determining the TimeBetweenLocalCheckpoints
        interval, so that LCPs were not started often enough, possibly
        causing data nodes to run out of redo log space prematurely.