| Bug #47171 | MaxNoOfOpenFiles exceeded during REDO invalidation | ||
|---|---|---|---|
| Submitted: | 7 Sep 2009 11:40 | Modified: | 16 Sep 2009 13:48 |
| Reporter: | Jonas Oreland | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S3 (Non-critical) |
| Version: | | OS: | Any |
| Assigned to: | Jonas Oreland | CPU Architecture: | Any |
[8 Sep 2009 10:09]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/82660 3035 Jonas Oreland 2009-09-08 ndb - bug#47171 - tentative patch for autotest
[8 Sep 2009 10:23]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/82664 3036 Jonas Oreland 2009-09-08 ndb - bug#47171 - tentative patch for autotest
[11 Sep 2009 7:38]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/83003 3044 Jonas Oreland 2009-09-11 ndb - bug#47171 ndbd could crash on MaxNoOfOpenFiles during REDO invalidation due to careless opening of files
[11 Sep 2009 7:57]
Bugs System
Pushed into 5.1.35-ndb-7.1.0 (revid:jonas@mysql.com-20090911075708-h4ihzy233qknt4vr) (version source revid:jonas@mysql.com-20090911075708-h4ihzy233qknt4vr) (merge vers: 5.1.35-ndb-7.1.0) (pib:11)
[15 Sep 2009 12:23]
Jonas Oreland
pushed to 6.3.27, 7.0.8
[16 Sep 2009 13:48]
Jon Stephens
Documented bugfix in the NDB-6.3.27 and 7.0.8 changelogs as follows:
When a data node restarts, it first runs the redo log until it
reaches the latest restorable GCI; after this it scans the
remainder of the redo log file, searching for entries that
should be invalidated so they are not used in any subsequent
restarts. (It is possible, for example, if restoring GCI number
25, that there might be entries belonging to GCI 26 in the redo
log.) However, under certain rare conditions, during the
invalidation process, the redo log files themselves were not
always closed while scanning ahead in the redo log. In rare
cases, this could lead to MaxNoOfOpenFiles being exceeded,
causing the data node to crash.
Closed.

Description:
When a node or cluster restarts, it
1) first "runs" the redo log until the last restorable GCI, and
2) then scans the REDO log forward, searching for entries that should be invalidated (so they don't turn up in a subsequent restart). For example, if restoring GCI 25, there might be entries belonging to GCI 26 in the log.

Currently, during step 2), files are not closed as the scan moves forward in the log. In pathological cases this can cause MaxNoOfOpenFiles to be exceeded, crashing the node (or the cluster).

How to repeat:
Add an assertion that no more than 4 files should be open per part. Set
TimeBetweenGlobalCheckpoints = 30000
TimeBetweenLocalCheckpoints = 31
NoOfFragmentLogfiles = 16
FragmentLogFileSize = 16M
Run updates until out-of-redo is reported, then try to restart.

Suggested fix:
Open and close files according to the "normal" rules while invalidating.
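To illustrate why closing files behind the scan matters, here is a small Python model of the forward scan over one log part. It is only a sketch and not NDB code: `OpenFileTracker`, `scan_forward`, and the `close_behind` flag are hypothetical names made up for this example, and the model assumes the worst (pathological) case where the scan touches every one of the 16 files in the part. The limit of 4 open files per part is the assertion proposed in "How to repeat".

```python
class OpenFileTracker:
    """Counts simulated open files and remembers the highest count seen."""

    def __init__(self):
        self.open_files = set()
        self.peak = 0

    def open(self, file_no):
        self.open_files.add(file_no)
        self.peak = max(self.peak, len(self.open_files))

    def close(self, file_no):
        self.open_files.discard(file_no)


def scan_forward(num_files, start_file, close_behind, tracker):
    """Visit each file of one log part once, wrapping around at the end.

    With close_behind=False the scan never closes a file (the reported
    behaviour). With close_behind=True the previous file is closed as soon
    as the next one has been opened (the suggested behaviour).
    Returns the peak number of simultaneously open files.
    """
    previous = None
    for step in range(num_files):
        current = (start_file + step) % num_files
        tracker.open(current)
        if close_behind and previous is not None:
            tracker.close(previous)
        previous = current
    return tracker.peak


MAX_OPEN_PER_PART = 4   # limit taken from the assertion in "How to repeat"
NUM_FILES = 16          # NoOfFragmentLogfiles = 16 in the repro settings

# Hypothetical start position: the scan begins at file 5 of the part.
leaky_peak = scan_forward(NUM_FILES, 5, False, OpenFileTracker())
bounded_peak = scan_forward(NUM_FILES, 5, True, OpenFileTracker())

print("peak open files without closing:", leaky_peak)
print("peak open files with close-behind:", bounded_peak)
print("within limit:", bounded_peak <= MAX_OPEN_PER_PART)
```

In this model the leaky scan ends with every file of the part open at once, while the close-behind scan never holds more than the current file and its predecessor, which stays under the per-part limit regardless of how many files the part has.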