| Bug #56961 | Incorrect REDO invalidation can lead to subsequent "error reading redo log" | ||
|---|---|---|---|
| Submitted: | 23 Sep 2010 6:13 | Modified: | 23 Sep 2010 10:18 |
| Reporter: | Jonas Oreland | | |
| Status: | Closed | | |
| Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S3 (Non-critical) |
| Version: | mysql-5.1-telco-6.3 | OS: | Any |
| Assigned to: | Jonas Oreland | CPU Architecture: | Any |
[23 Sep 2010 6:20]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:
http://lists.mysql.com/commits/118873
3301 Jonas Oreland 2010-09-23 [merge]
ndb - bug#56961 - fix redo invalidation handling end of file not written
[23 Sep 2010 6:26]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:
http://lists.mysql.com/commits/118876
3781 Jonas Oreland 2010-09-23 [merge]
ndb - merge bug#56961 into 70
[23 Sep 2010 6:29]
Bugs System
Pushed into mysql-5.1-telco-7.0 5.1.47-ndb-7.0.19 (revid:jonas@mysql.com-20100923062337-532mlci8mn53188k) (version source revid:jonas@mysql.com-20100923062337-532mlci8mn53188k) (merge vers: 5.1.47-ndb-7.0.19) (pib:21)
[23 Sep 2010 6:30]
Bugs System
Pushed into mysql-5.1-telco-6.3 5.1.47-ndb-6.3.38 (revid:jonas@mysql.com-20100923061724-9lapdqjviz1kmwyx) (version source revid:jonas@mysql.com-20100923061724-9lapdqjviz1kmwyx) (merge vers: 5.1.47-ndb-6.3.38) (pib:21)
[23 Sep 2010 6:34]
Jonas Oreland
pushed to 6.3.38, 7.0.19 and 7.1.8
[23 Sep 2010 10:18]
Jon Stephens
Documented bugfix in the NDB-6.3.38, 7.0.19, and 7.1.8 changelogs, as follows:
As part of normal operations, a data node can be shut down after
completing and synchronizing a given GCI x, while having already
written a great many log records belonging to the next GCI x+1.
However, if the node is then started, completes and synchronizes
GCI x+1, and is stopped again, the log records from the original
start must not be read. To
make sure that this does not happen, the REDO log reader finds
the last GCI to restore, scans forward from that point, and
erases any log records that were not (and should never be) used.
The current issue occurred because this scan stopped as soon as
it encountered an empty page. This was problematic because the
REDO log is divided into several files; thus, there could be log
records at the beginning of the next file, even if the end of the
previous file was empty. These
log records were never invalidated; following a start or
restart, they could be reused, leading to a corrupt REDO log.
Closed.
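To make the failure mode concrete, here is a minimal C++ sketch of the forward scan as the changelog describes it. The model is an assumption made purely for illustration: `Page`, `PagePos`, `RedoLog`, and `pagesToInvalidateOld` are hypothetical names, not NDB data structures, and the REDO log is treated as a linear sequence of files with no wrap-around. The scan starts at the first page after the last GCI to restore and stops at the first empty page it meets.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified model (not the NDB data structures): the REDO log
// is a sequence of files, each file a sequence of pages, and each page is
// either empty (never written) or holds log records.
struct Page {
  bool empty;
};
using LogFile = std::vector<Page>;
using RedoLog = std::vector<LogFile>;

// Position of a page: file index plus page index within that file.
struct PagePos {
  std::size_t file;
  std::size_t page;
};

// Old behaviour as described in the changelog: walk forward from `start`
// (the first page after the last GCI to restore) and stop at the first
// empty page. Returns the written pages whose records get invalidated.
std::vector<PagePos> pagesToInvalidateOld(const RedoLog& redo, PagePos start) {
  assert(start.file < redo.size());
  std::vector<PagePos> result;
  for (std::size_t f = start.file; f < redo.size(); ++f) {
    for (std::size_t p = (f == start.file) ? start.page : 0;
         p < redo[f].size(); ++p) {
      if (redo[f][p].empty) {
        // Stops here, so records at the start of the next file are missed.
        return result;
      }
      result.push_back({f, p});
    }
  }
  return result;
}
```

With two four-page files where file 0 ends in an empty page and file 1 begins with two written (stale) pages, starting at page 2 of file 0 returns only that single page; the stale pages in file 1 are never reached, which is the situation the changelog describes.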
[29 Sep 2010 10:55]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:
http://lists.mysql.com/commits/119379
3288 Martin Skold 2010-09-29 [merge] Merge
removed: cluster_change_hist.txt
modified: mysql-test/collections/default.experimental mysql-test/suite/ndb/r/ndb_database.result mysql-test/suite/ndb/t/ndb_database.test sql/ha_ndbcluster.cc sql/ha_ndbcluster.h sql/ha_ndbcluster_binlog.cc sql/handler.cc sql/handler.h sql/sql_show.cc sql/sql_table.cc storage/ndb/include/kernel/GlobalSignalNumbers.h storage/ndb/include/kernel/signaldata/FsReadWriteReq.hpp storage/ndb/include/mgmapi/mgmapi.h storage/ndb/include/ndbapi/NdbDictionary.hpp storage/ndb/src/kernel/blocks/ERROR_codes.txt storage/ndb/src/kernel/blocks/dbdict/Dbdict.cpp storage/ndb/src/kernel/blocks/dbdih/DbdihMain.cpp storage/ndb/src/kernel/blocks/dblqh/Dblqh.hpp storage/ndb/src/kernel/blocks/dblqh/DblqhMain.cpp storage/ndb/src/kernel/blocks/dbtup/Dbtup.hpp storage/ndb/src/kernel/blocks/dbtup/DbtupIndex.cpp storage/ndb/src/kernel/blocks/dbtup/DbtupMeta.cpp storage/ndb/src/kernel/blocks/dbtux/Dbtux.hpp storage/ndb/src/kernel/blocks/dbtux/DbtuxBuild.cpp storage/ndb/src/kernel/blocks/dbtux/DbtuxMaint.cpp storage/ndb/src/kernel/blocks/dbtux/DbtuxNode.cpp storage/ndb/src/kernel/blocks/dbtux/DbtuxTree.cpp storage/ndb/src/kernel/blocks/ndbfs/AsyncFile.cpp storage/ndb/src/kernel/blocks/ndbfs/AsyncFile.hpp storage/ndb/src/kernel/blocks/ndbfs/Ndbfs.cpp storage/ndb/src/kernel/blocks/ndbfs/Ndbfs.hpp storage/ndb/src/kernel/blocks/ndbfs/VoidFs.cpp storage/ndb/src/kernel/blocks/suma/Suma.cpp storage/ndb/src/kernel/blocks/suma/Suma.hpp storage/ndb/src/kernel/main.cpp storage/ndb/src/ndbapi/DictCache.cpp storage/ndb/src/ndbapi/DictCache.hpp storage/ndb/src/ndbapi/NdbDictionary.cpp storage/ndb/src/ndbapi/NdbDictionaryImpl.cpp storage/ndb/src/ndbapi/NdbDictionaryImpl.hpp storage/ndb/test/include/NdbRestarter.hpp storage/ndb/test/ndbapi/testIndex.cpp storage/ndb/test/ndbapi/testRestartGci.cpp storage/ndb/test/ndbapi/testSystemRestart.cpp storage/ndb/test/run-test/daily-basic-tests.txt storage/ndb/test/src/NdbRestarter.cpp

Description:
When a node is shut down, it may have completed and synced GCI X but already written lots of log records belonging to X+1. This is not a problem in itself. The problem arises if you then start, complete and sync X+1, and stop again: the log records from the original start must not be read.
To make sure that this does not happen, once the REDO reader has found the last GCI to restore, it scans forward and erases log records that were not used (and should never be used).
The bug is that this scan walked pages forward from the end of the log, stopping as soon as it found an "empty" page. However, since the REDO log is divided into several files, there could be log records at the beginning of the next file even if the end of the previous file was empty. These records (at the beginning of the next file) were then never invalidated, and after a start/stop/start they could be reused, leading to a corrupt REDO log.
How to repeat:
New testcase.
Suggested fix:
The code is rewritten to:
1) scan the first page of each file until an empty page is found;
2) backtrack to the last file, and scan it linearly to find the last page to invalidate.
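The two-step fix suggested above can be sketched in a small, self-contained C++ model. This is an illustration under stated assumptions, not the actual NDB implementation: `Page`, `PagePos`, `RedoLog`, and `pagesToInvalidateNew` are hypothetical names, and the log is treated as a linear, non-wrapping sequence of files. Step 1 probes only the first page of each following file; step 2 backtracks to the last file that still began with a written page and scans it linearly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified model (not the NDB data structures): the REDO log
// is a sequence of files, each file a sequence of pages, and each page is
// either empty (never written) or holds log records.
struct Page {
  bool empty;
};
using LogFile = std::vector<Page>;
using RedoLog = std::vector<LogFile>;

struct PagePos {
  std::size_t file;
  std::size_t page;
};

// Returns every written page from `start` (the first page after the last GCI
// to restore) up to the last page that must be invalidated.
std::vector<PagePos> pagesToInvalidateNew(const RedoLog& redo, PagePos start) {
  assert(start.file < redo.size());

  // Step 1: probe the first page of each following file until one is empty.
  std::size_t lastFile = start.file;
  for (std::size_t f = start.file + 1; f < redo.size(); ++f) {
    if (redo[f].empty() || redo[f][0].empty) {
      break;
    }
    lastFile = f;
  }

  std::vector<PagePos> result;
  // Files before lastFile: an empty page (e.g. an unwritten file tail) does
  // not end the scan; collect every written page through the end of the file.
  for (std::size_t f = start.file; f < lastFile; ++f) {
    for (std::size_t p = (f == start.file) ? start.page : 0;
         p < redo[f].size(); ++p) {
      if (!redo[f][p].empty) {
        result.push_back({f, p});
      }
    }
  }

  // Step 2: backtrack to lastFile and scan it linearly up to the first
  // empty page, which marks the end of what needs invalidating.
  for (std::size_t p = (lastFile == start.file) ? start.page : 0;
       p < redo[lastFile].size() && !redo[lastFile][p].empty; ++p) {
    result.push_back({lastFile, p});
  }
  return result;
}
```

On a layout where file 0 ends in an empty page and file 1 begins with two written pages, starting at page 2 of file 0 now yields page 2 of file 0 plus pages 0 and 1 of file 1, so the stale records at the start of the next file are no longer skipped. Presumably the reason for probing only first pages in step 1 is to keep that phase to one page read per file rather than a full scan of every file.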