Bug #44096 | Exception: Recovery failed: corrupted serial log | ||
---|---|---|---|
Submitted: | 5 Apr 2009 13:10 | Modified: | 15 May 2009 14:49 |
Reporter: | Philip Stoev | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Falcon storage engine | Severity: | S3 (Non-critical) |
Version: | 6.0-falcon-team | OS: | Any |
Assigned to: | Ann Harrison | CPU Architecture: | Any |
Tags: | F_RECOVERY |
[5 Apr 2009 13:10]
Philip Stoev
[5 Apr 2009 13:19]
Philip Stoev
Tablespace before recovery: http://mysql-systemqa.s3.amazonaws.com/bug44096.zip
[5 Apr 2009 15:31]
Vladislav Vaintroub
changed severity to non-critical based on "all non-default" parameter values and extremely hard to reproduce scenario
[8 Apr 2009 15:53]
Ann Harrison
Underscoring Vlad's comment. The page cache size is set to 1K, which is 1/16th of the page size. Falcon will round this to a whole page, but a page cache that holds only one page is really small.
[8 Apr 2009 16:05]
Philip Stoev
Falcon adjusts the page cache size to 2Mb: 090408 19:04:14 [Warning] option 'falcon-page-cache-size': unsigned value 1024 adjusted to 2097152 so the page cache size is actually fairly reasonable.
[9 Apr 2009 20:10]
Ann Harrison
Had nothing to do with the cache size - the crash happened after an old log file had been truncated. Recovery was entirely from the new log file, but still checked that the two files were contiguous, which they weren't because the old one had been truncated.
[10 Apr 2009 9:04]
Philip Stoev
I am afraid that before we close this bug I would need that non-record serial log operations are also tagged with Vlad's crash hooks (http://lists.mysql.com/commits/68748). This way a crash can be produced after the operation in question. Such operations would be anything that operates on the log, including: * writing any headers, footers, EOF records, checksums, etc. for the serial log; * writing any headers, footers, EOF records, checksums, etc. for the individual serial log windows; * truncating the log * rotating from one log to the other (if this involves a disk write)
[13 Apr 2009 16:21]
Kevin Lewis
=== modified file 'storage/falcon/SerialLog.cpp' --- storage/falcon/SerialLog.cpp 2009-04-09 20:04:02 +0000 +++ storage/falcon/SerialLog.cpp 2009-04-10 15:52:23 +0000 @@ -274,6 +274,10 @@ } // Pick a window to start the search for the last block + // If there are two files, we want to find the last block + // in the second file. Given the way the windows were + // allocated, the fl2 window is linked in before the file1 + // window. If they're in the other order swap them around. SerialLogWindow *recoveryWindow = NULL; SerialLogWindow *otherWindow = NULL; @@ -310,38 +314,63 @@ return; } - // Look through windows looking for the very last block + // recoveryWindow is the first window in the second file. + // Look through windows looking for the very last block, + // starting by remember the first block in the recoverWindow + // as recoveryBlock and it's number as recoveryBlockNumber. + // These are used only to determine whether recovery is all + // from one file, or whether it needes to start in the earlier + // file. SerialLogBlock *recoveryBlock = recoveryWindow->firstBlock(); recoveryBlockNumber = recoveryBlock->blockNumber; SerialLogBlock *lastBlock = findLastBlock(recoveryWindow); - Log::log("\nFirst recovery block is " I64FORMAT "\n", + + // The last block's "readBlock" number is the first block where we + // actually perform recovery - it the first block that has something + // to do with a transaction that's not write complete. + + // Debugging hint. If there are unidentified objects in recovery - + // unwritten pages or something, try starting recovery at the first + // block in the first file - if and only if the two are contigous. + + uint64 readBlockNumber = lastBlock->readBlockNumber; + + if (readBlockNumber == 0) + readBlockNumber = lastBlock->blockNumber; + + Log::log("\nFirst block in the serial log is " I64FORMAT "\n", (otherWindow) ? otherWindow->firstBlock()->blockNumber : recoveryBlockNumber); - Log::log("Last recovery block is " I64FORMAT "\n", lastBlock->blockNumber); + Log::log("Last block in the serial log is " I64FORMAT "\n", lastBlock->blockNumber); + Log::log("First block used in recovery is " I64FORMAT "\n", readBlockNumber); + + // the nextBlockNumber is the first block number to use after + // recovery starts + + nextBlockNumber = lastBlock->blockNumber + 1; + + // If we're using both files, they'd better be contiguous. Check and + // when done, release any window allocated in the process. A side + // effect of the check is that it sets up the first window's last + // block number - that's used later. if (otherWindow) { SerialLogBlock *block = findLastBlock(otherWindow); - // If we have to use both files, be sure they're contiguous - - if (block && lastBlock->readBlockNumber < recoveryBlock->blockNumber) - if (block->blockNumber != (recoveryBlockNumber - 1)) + if (block && block->blockNumber != (recoveryBlockNumber - 1)) + if (readBlockNumber < recoveryBlock->blockNumber) throw SQLError(LOG_ERROR, "corrupted serial log"); SerialLogWindow *window = findWindowGivenBlock(block->blockNumber); window->deactivateWindow(); } - // Find read block - - uint64 readBlockNumber = lastBlock->readBlockNumber; - - if (readBlockNumber == 0) - readBlockNumber = lastBlock->blockNumber; - - Log::log("Recovery read block is " I64FORMAT "\n", readBlockNumber); - nextBlockNumber = lastBlock->blockNumber + 1; + // OK. Everything looks good. Release the window we used + // probing the log files, open a window on the first block + // we're going to use for recovery. That should work, but + // if it doesn't, complain. + lastWindow->deactivateWindow(); SerialLogWindow *window = findWindowGivenBlock(readBlockNumber); @@ -356,12 +385,19 @@ writeBlock = NULL; control.setWindow(window, block, 0); SerialLogRecord *record; + + // OK, we're good to go. Start recovering. + pass1 = true; recoveryPages = new RecoveryObjects(this); recoverySections = new RecoveryObjects(this); recoveryIndexes = new RecoveryObjects(this); - recoveryPhase = 1; // Take Inventory (serialLogTransactions, recoveryObject states, last checkpoint) + recoveryPhase = 1; + // Phase 1 - read from the start to end of the part of the + // log that's involved in recovery. Take Inventory of serialLogTransactions, + // recoveryObject states, last checkpoint) + Log::log("Recovery phase 1...\n"); unsigned long int recordCount = 0;
[15 Apr 2009 16:59]
Bugs System
Pushed into 6.0.11-alpha (revid:hky@sun.com-20090415164923-9arx29ye5pzxd4zf) (version source revid:hky@sun.com-20090414213212-ibcjs12ic1v2h0e5) (merge vers: 6.0.11-alpha) (pib:6)
[15 Apr 2009 22:12]
Kevin Lewis
The key change here, besides the very helpful comments, is + if (readBlockNumber < recoveryBlock->blockNumber) This prevents the SQLError from being thrown when the old serial log file has been truncated.
[15 May 2009 14:49]
MC Brown
Internal/test fix. No changelog entry required.