Bug #39678 | Assertion (bdb) fails in IndexRootPage::indexMerge during recovery | ||
---|---|---|---|
Submitted: | 26 Sep 2008 18:11 | Modified: | 15 May 2009 12:45 |
Reporter: | John Embretsen | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Falcon storage engine | Severity: | S1 (Critical) |
Version: | 6.0-falcon-team | OS: | Any |
Assigned to: | Christopher Powers | CPU Architecture: | Any |
Tags: | F_ONLINE ALTER, pb2, pushbuild, test failure |
[26 Sep 2008 18:11]
John Embretsen
[21 Jan 2009 21:04]
Christopher Powers
Did some debugging on this... The assertion occurs in IndexRootPage::indexMerge(), line 951, during the processing of an SRLUpdateIndex record for indexId == 1288: Error::debugBreak() Error::error(char const*, ...) Error::assertionFailed(char const*, char const*, int) IndexRootPage::indexMerge(Dbb*, int, SRLUpdateIndex*, unsigned int) <<<< SRLUpdateIndex::execute() SRLUpdateIndex::redo() SerialLog::recover() Database::openDatabase(char const*) Connection::getDatabase(char const*, char const*, Threads*) Connection::openDatabase(char const*, char const*, char const*, char const*, char const*, Threads*) StorageDatabase::getOpenConnection() StorageHandler::initialize() StorageInterface::falcon_init(void*) ha_initialize_handlerton(st_plugin_int*) void IndexRootPage::indexMerge(Dbb *dbb, int indexId, SRLUpdateIndex *logRecord, TransId transId) { IndexKey key; int recordNumber = logRecord->nextKey(&key); [...] for (; recordNumber != -1;) { // Position to insert first key (clone of addIndexEntry) IndexKey searchKey(key); searchKey.appendRecordNumber(recordNumber); [..] Bdb *bdb = findInsertionLeaf(dbb, indexId, &searchKey, ...); ASSERT(bdb); <<<< The sequence of events recorded in the serial log indicates the following: 1. Index 1288 was created 2. 10 records were inserted into the index 3. Index 1288 was deleted 4. 1 more record, id = 756, was inserted into index 1288 Recovery fails because by the time the SRLUpdateIndex entry for record 756 is processed, the state of the recovery object associated with index 1288 is 'active', and SRLUpdateIndex::execute() attempts to insert the record into a non-existent index. The recovery completes successfully if the serial log entry for record 756 is skipped. The falcon_online_alter stress test issues thousands of multiple concurrent online add/drop index commands--obviously an extreme scenario--so it is likely that an attempted index update would coincide or trail the deletion of the same index. The MySQL online alter API has a known weakness such that concurrent operations can sneak in between the second and third phase of an online DDL operation. This may be a contributing factor, however, Falcon should still guard against such occurrences. Clearly, the serial log should not contain an SRLUpdateIndex event for a deleted index--that's the core issue here--but should recovery have skipped the SRLUpdateIndex record anyway? Would that make recovery more robust?
[21 Jan 2009 21:17]
Christopher Powers
I can provide the corrupted database files upon request (128mb tarball). This is can be easily debugged in Windows or Linux: Simply copy the files into ./mysql-test/var/master-data and fire up Falcon. Useful breakpoints (set condition 'indexId == 1288') IndexRootPage.cpp: 952 (no condition) IndexRootPage.cpp: 929 ('indexId == 1288') indexMerge() SRLCreateIndex.cpp: 62 ('indexId == 1288') indexMerge() SRLCreateIndex.cpp: 45 ('indexId == 1288') read() SRLDeleteIndex.cpp: 69 ('indexId == 1288') append() SRLUpdateIndex.cpp: 149 ('indexId == 1288') read() SRLUpdateIndex.cpp: 194 ('indexId == 1288') read() SRLDeleteIndex.cpp: 79 ('indexId == 1288') execute()
[21 Jan 2009 22:07]
Christopher Powers
For Falcon recovery enthusiasts, I've uploaded the corrupted database (118mb): bug_39678.tar
[21 Jan 2009 22:13]
Ann Harrison
In theory, the first phase of recovery should have marked that incarnation of the index as having been deleted. Operations on dead incarnations of objects (tablespaces, tables, indexes) should be skipped.
[22 Jan 2009 7:35]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/63790 2969 Christopher Powers 2009-01-22 Bug #39678 Assertion (bdb) fails in IndexRootPage::indexMerge during recovery Improved synchronization between concurrent DROP INDEX and INSERT/UPDATE operations on the same table.
[22 Jan 2009 18:09]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/63827 2970 Christopher Powers 2009-01-22 [merge] Bug #39678 Assertion (bdb) fails in IndexRootPage::indexMerge during recovery Fixed bug in SRLUpdateIndex.
[22 Jan 2009 18:43]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/63834 2970 Christopher Powers 2009-01-22 Bug #39678 Assertion (bdb) fails in IndexRootPage::indexMerge during recovery Improved synchronization between concurrent DROP INDEX and INSERT/UPDATE operations on the same table.
[13 Feb 2009 7:24]
Bugs System
Pushed into 6.0.10-alpha (revid:alik@sun.com-20090211182317-uagkyj01fk30p1f8) (version source revid:christopher.powers@sun.com-20090122184002-co7njgsqxlu3zqd4) (merge vers: 6.0.10-alpha) (pib:6)
[15 May 2009 12:45]
MC Brown
An entry has been added to the 6.0.10 changelog: When performing concurrent DROP INDEX and INSERT or UPDATE operations on a Falcon table, an assertion could occur when recovering from a crashed instance.