Bug #47171 MaxNoOfOpenFiles exceeded during REDO invalidation
Submitted: 7 Sep 2009 11:40 Modified: 16 Sep 2009 13:48
Reporter: Jonas Oreland
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine Severity: S3 (Non-critical)
Version: OS: Any
Assigned to: Jonas Oreland CPU Architecture: Any

[7 Sep 2009 11:40] Jonas Oreland
Description:
When a node or cluster restarts, it will
1) first "run" the REDO log until the last restorable GCI
2) then scan the REDO log forward, searching for entries that should be
   invalidated (so they don't turn up in a subsequent restart)

   (i.e. if restoring GCI 25, there might be entries belonging to GCI 26
    in the log)

Currently, during 2), files are not closed while scanning forward in the log.
This can (in pathological cases) cause MaxNoOfOpenFiles to be exceeded,
causing the node (or cluster) to crash.

How to repeat:
Add assertion that no more than 4 files should be open per part.
Set TimeBetweenGlobalCheckpoints = 30000
    TimeBetweenLocalCheckpoints = 31
    NoOfFragmentLogfiles = 16
    FragmentLogFileSize = 16M

Run updates until out-of-redo is reported,
then try to restart.
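
For a rough sense of scale, the settings above can be turned into a file count. This is only a sketch: it assumes the redo log is split into 4 log parts and that each part gets NoOfFragmentLogfiles files (the layout the NDB documentation describes as "sets of 4 files"). The variable names are illustrative, not NDB identifiers.

```python
# Back-of-the-envelope sizing of the redo log for the settings above.
# ASSUMPTION: 4 log parts, each holding NoOfFragmentLogfiles files.
MIB = 1024 * 1024

no_of_fragment_log_files = 16      # NoOfFragmentLogfiles from the report
fragment_log_file_size = 16 * MIB  # FragmentLogFileSize = 16M
LOG_PARTS = 4                      # assumed number of log parts

files_per_part = no_of_fragment_log_files
total_files = files_per_part * LOG_PARTS
total_bytes = total_files * fragment_log_file_size

print("files per part:", files_per_part)
print("total redo files:", total_files)
print("total redo size (MiB):", total_bytes // MIB)
```

Under these assumptions each part has 16 files, so a forward scan that never closes a file would exceed the 4-files-per-part assertion well before wrapping around the log.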

Suggested fix:
Open/close files according to the "normal" rules while invalidating.
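
The difference between the two behaviours can be sketched with a small simulation. This is not the ndbd code: the function name, the "close the oldest file first" policy and the limit of 4 are assumptions chosen to mirror the assertion in "How to repeat".

```python
from collections import deque

def scan_forward(start_file, files_to_scan, total_files, max_open=None):
    """Simulate a forward scan over one log part's ring of redo files.

    Returns the peak number of files open at once.
    max_open=None models the reported behaviour (nothing is closed);
    an integer models closing the oldest open file before opening a new
    one once the limit is reached (a hypothetical policy).
    """
    open_files = deque()
    peak = 0
    for step in range(files_to_scan):
        file_no = (start_file + step) % total_files  # files form a ring
        if max_open is not None and len(open_files) >= max_open:
            open_files.popleft()  # "close" the oldest file
        open_files.append(file_no)  # "open" the next file
        peak = max(peak, len(open_files))
    return peak

TOTAL_FILES = 16  # files per log part in the repro configuration

peak_leaky = scan_forward(0, TOTAL_FILES, TOTAL_FILES)
peak_bounded = scan_forward(0, TOTAL_FILES, TOTAL_FILES, max_open=4)
print("peak open files without closing:", peak_leaky)
print("peak open files with closing:", peak_bounded)
```

Without closing, the peak grows with the number of files scanned; with a bounded window it stays at the limit regardless of how far the scan runs.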
[8 Sep 2009 10:09] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/82660

3035 Jonas Oreland	2009-09-08
      ndb - bug#47171 - tentative patch for autotest
[8 Sep 2009 10:23] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/82664

3036 Jonas Oreland	2009-09-08
      ndb - bug#47171 - tentative patch for autotest
[11 Sep 2009 7:38] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/83003

3044 Jonas Oreland	2009-09-11
      ndb - bug#47171
        ndbd could crash on MaxNoOfOpenFiles during REDO invalidation
          due to careless opening of files
[11 Sep 2009 7:57] Bugs System
Pushed into 5.1.35-ndb-7.1.0 (revid:jonas@mysql.com-20090911075708-h4ihzy233qknt4vr) (version source revid:jonas@mysql.com-20090911075708-h4ihzy233qknt4vr) (merge vers: 5.1.35-ndb-7.1.0) (pib:11)
[15 Sep 2009 12:23] Jonas Oreland
pushed to 6.3.27, 7.0.8
[16 Sep 2009 13:48] Jon Stephens
Documented bugfix in the NDB-6.3.27 and 7.0.8 changelogs as follows:

        When a data node restarts, it first runs the redo log until it
        reaches the latest restorable GCI; after this it scans the
        remainder of the redo log file, searching for entries that
        should be invalidated so they are not used in any subsequent
        restarts. (It is possible, for example, if restoring GCI number
        25, that there might be entries belonging to GCI 26 in the redo
        log.) However, under certain rare conditions, during the
        invalidation process, the redo log files themselves were not
        always closed while scanning ahead in the redo log. In rare
        cases, this could lead to MaxNoOfOpenFiles being exceeded,
        causing the data node to crash.

Closed.