Bug #51723 TAIL_PROBLEM is not set appropriately during node-restart
Submitted: 4 Mar 2010 14:33
Modified: 5 Mar 2010 13:49
Reporter: Jonas Oreland
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: mysql-5.1-telco-6.3
OS: Any
Assigned to: Jonas Oreland
CPU Architecture: Any

[4 Mar 2010 14:33] Jonas Oreland
Description:
The redo log protects itself from becoming full by checking how much space
is free each time it passes a megabyte boundary.

If too little redo log is available at that point, it sets the state
TAIL_PROBLEM, which results in transactions being aborted with error code 410
(out of redo log).

However, this state is not set after a node restart,
which means that if a node has little or no free redo log
directly after a node restart, it can crash shortly afterwards with
error "Fatal error due to end of REDO log"

How to repeat:
Run testSystemRestart -n to T1 for a sufficiently long time
with a small enough redo log.

Suggested fix:
Check free redo log space during node restart.
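
The behaviour described above, together with the suggested fix, can be sketched roughly as follows. This is an illustration only and not the actual NDB kernel code: the type RedoLog, its members, the minFreeMb threshold and the hook onNodeRestartComplete are all hypothetical names chosen for the example. Only the state name TAIL_PROBLEM and error code 410 come from this report.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the redo log tail check; names are illustrative,
// not taken from the NDB source.
enum class LogState { Normal, TailProblem };

struct RedoLog {
    std::uint64_t totalMb;    // configured redo log size, in megabytes
    std::uint64_t usedMb;     // megabytes currently occupied (head minus tail)
    std::uint64_t minFreeMb;  // assumed threshold below which writes are refused
    LogState state;

    RedoLog(std::uint64_t total, std::uint64_t used, std::uint64_t minFree)
        : totalMb(total), usedMb(used), minFreeMb(minFree),
          state(LogState::Normal) {
        assert(usedMb <= totalMb);
    }

    std::uint64_t freeMb() const { return totalMb - usedMb; }

    // Set or clear the TAIL_PROBLEM state based on the remaining free space.
    void checkFreeSpace() {
        state = (freeMb() < minFreeMb) ? LogState::TailProblem
                                       : LogState::Normal;
    }

    // Normal operation: the check runs each time a megabyte boundary is passed.
    void onMegabyteWritten() {
        if (usedMb < totalMb) {
            ++usedMb;
        }
        checkFreeSpace();
    }

    // Suggested fix: run the same check once the node restart has finished,
    // instead of waiting for the next megabyte boundary.
    void onNodeRestartComplete(std::uint64_t usedAfterRestart) {
        assert(usedAfterRestart <= totalMb);
        usedMb = usedAfterRestart;
        checkFreeSpace();
    }

    // Returns 0 if a transaction may proceed, or 410 (out of redo log)
    // while the TAIL_PROBLEM state is set.
    int beginTransaction() const {
        return state == LogState::TailProblem ? 410 : 0;
    }
};
```

In this model, a node that comes back from a restart with nearly all of its redo log in use starts rejecting transactions with 410 immediately, rather than accepting writes until the log runs out.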
[5 Mar 2010 6:34] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/102387

3129 Jonas Oreland	2010-03-05
      ndb - bug#51723 - make sure tail-problem is sent after NR
[5 Mar 2010 6:39] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/102389

3130 Jonas Oreland	2010-03-05
      ndb - bug#51723 - missing jam
[5 Mar 2010 7:31] Bugs System
Pushed into 5.1.41-ndb-6.3.33 (revid:jonas@mysql.com-20100305063206-cn48uat5vhh47spu) (version source revid:jonas@mysql.com-20100305063206-cn48uat5vhh47spu) (merge vers: 5.1.41-ndb-6.3.33) (pib:16)
[5 Mar 2010 7:31] Bugs System
Pushed into 5.1.41-ndb-7.0.14 (revid:jonas@mysql.com-20100305063436-nnfc4130ju8pr33k) (version source revid:jonas@mysql.com-20100305063436-nnfc4130ju8pr33k) (merge vers: 5.1.41-ndb-7.0.14) (pib:16)
[5 Mar 2010 7:41] Jonas Oreland
pushed to 6.3.33, 7.0.14 and 7.1.3
[5 Mar 2010 13:49] Jon Stephens
Documented as follows in the NDB-6.3.33, 7.0.14, and 7.1.3 changelogs:

        The redo log protects itself from being filled up by
        periodically checking how much space remains free. If
        insufficient redo log space is available, it sets the state
        TAIL_PROBLEM which results in transactions being aborted with
        error code 410 (out of redo log). However, this state was not
        set following a node restart, which meant that if a data node
        had insufficient redo log space following a node restart, it
        could crash a short time later with "Fatal error due to end of
        REDO log". Now, this space is checked during a node restart.

Closed.