Bug #33608 | Falcon crash and corruption on blob DML and restart | |
---|---|---|---
Submitted: | 1 Jan 2008 17:53 | Modified: | 5 May 2008 18:27 |
Reporter: | Philip Stoev | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Falcon storage engine | Severity: | S1 (Critical) |
Version: | 6.0.4-p2 | OS: | Any |
Assigned to: | Kevin Lewis | CPU Architecture: | Any |
[1 Jan 2008 17:53]
Philip Stoev
[1 Jan 2008 17:54]
Philip Stoev
File for reproducing bug #33608; please edit the top of the file to match your setup.
Attachment: bug33608.pl (application/octet-stream, text), 2.16 KiB.
[1 Jan 2008 17:57]
Philip Stoev
ZIP of bug #33608 tablespace causing crash on future queries.
Attachment: bug33608.zip (application/x-zip-compressed, text), 32.12 KiB.
[1 Jan 2008 18:06]
Philip Stoev
In fact, any query issued against the corrupted tablespace will cause a crash. It is not known which query caused the corruption itself. Note that what is specific to this test is that the server is restarted repeatedly throughout the run, and that the transactions involved contain blob updates which change the size of the blob. Only one connection thread is involved. The problem does not occur without the periodic restarts. Please let me know if you want me to isolate the issue further.
[2 Jan 2008 14:01]
Philip Stoev
The bug33608-2.zip file was produced using the same script; however, instead of a crash, the server starts to report that the blobtest.blobtest table does not exist, even though SHOW TABLES shows that it is there.
[2 Jan 2008 14:02]
Philip Stoev
Corrupted tablespace produced by the Perl script from bug #33608
Attachment: bug33608-2.zip (application/x-zip-compressed, text), 29.62 KiB.
[7 Jan 2008 14:28]
Kevin Lewis
Jim Starkey wrote: I'm having next to no luck with 33608. The test is driven by a random number generator. Sometimes it inserts, updates, or deletes blobs, and sometimes it restarts the server. The files in question show that the server was restarted twice.

The symptom of the corruption is a record locator page with a dangling data page. Normally, we will recreate the record locator page unless we know for sure it has been written. In this case, however, the last known write was before the most recent restart. At the moment we don't carry that information in the serial log, though we have provision to add it.

I think this is a bona fide bug, but unless we expect our beta users to start and stop the server in a loop commingled with update operations, I think we should consider postponing a fix until after beta. I'm quite sure that if the server had stayed up long enough for an I/O thread to finish writing recovered pages we probably wouldn't have seen this (how's that for hedging...).
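The decision Jim describes can be condensed into a small sketch (names invented for illustration; Falcon's real code is C++ and considerably more involved). Recovery recreates a record locator page unless it can prove the page was already written, but at the time of this bug the serial log carried no write information from before the most recent restart:

```python
# Hypothetical sketch of the record-locator recreation decision.
# "writes_since_restart" stands in for the evidence available in the
# serial log, which (per Jim's comment) does not cover writes that
# happened before the most recent restart.

def provably_written(page_id, writes_since_restart):
    """Recovery's only evidence is the serial log, so only writes
    since the most recent restart can be proven."""
    return page_id in writes_since_restart

def recreate_locator_page(page_id, writes_since_restart):
    # Falcon's stated rule: recreate the record locator page unless
    # we know for sure it has been written.
    return not provably_written(page_id, writes_since_restart)

# Page 7 was last written *before* the most recent restart, so the log
# holds no record of it and recovery recreates the page anyway,
# leaving its existing data page dangling:
print(recreate_locator_page(7, writes_since_restart={3, 5}))  # True
```

Under this reading, carrying pre-restart write information in the serial log (the "provision to add it" Jim mentions) would let `provably_written` return the right answer.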
[10 Jan 2008 20:30]
Jeffrey Pugh
Re Kevin: Double recovery issue; high risk to fix in 6.0.4
[23 Jan 2008 21:39]
Kevin Lewis
Jim Starkey pushed a change to Falcon that requires it to wait for a page cache flush at the end of recovery. It used to do this, but when the pool of I/O threads was added, the recovery process just requested a flush and then returned. Now the flush waits for all the I/O threads to complete.

When recovery does not wait for the flush to complete, new blocks can be written to the serial log before all the pages from the previous recovery have been written to disk. These blocks indicate that recovery should start after the previous one. If a shutdown happens before those pages are fully written, the following recovery will not start early enough.

This very likely solves the double recovery problems exhibited by Bugs 32992, 33608 & 33665. I am putting these bugs into Patch Pending so that this fix can be verified, or not.
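The shape of the fix Kevin describes can be sketched as follows (a minimal illustration, not Falcon's actual C++ code): the final flush starts a pool of I/O workers and then blocks until every one of them has finished, rather than returning as soon as the flush has been requested.

```python
# Minimal sketch of "flush waits for all I/O threads" at the end of
# recovery.  Class and method names are hypothetical.
import threading

class PageCacheFlush:
    def __init__(self, dirty_pages, io_threads=4):
        self.pending = list(dirty_pages)   # dirty pages awaiting write
        self.written = []                  # pages safely "on disk"
        self.lock = threading.Lock()
        self.io_threads = io_threads

    def _io_worker(self):
        while True:
            with self.lock:
                if not self.pending:
                    return
                page = self.pending.pop()
            # ... the real engine writes `page` to disk here ...
            with self.lock:
                self.written.append(page)

    def flush_and_wait(self):
        """Start the I/O threads, then block until all have finished.
        The pre-fix behavior was to request the flush and return at
        once, so recovery could be declared complete while dirty pages
        were still only in memory."""
        workers = [threading.Thread(target=self._io_worker)
                   for _ in range(self.io_threads)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()   # the fix: wait for every I/O thread
```

With this, `flush_and_wait()` only returns once `pending` is empty and every page is in `written`, which is the property recovery needs before new serial-log entries may advance the recovery start point.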
[25 Jan 2008 23:07]
Kevin Lewis
Explanation of the double recovery problem by Ann Harrison:
==============================
We have been plagued recently with failures in recovery after a previous recovery had not succeeded. That turns out to have been the result of a change to the page cache checkpoint algorithm. The fix was simple, once the cause was understood. I'm going to try to explain the interaction of recovery and checkpointing.

When Falcon starts, it determines whether it was shut down cleanly. If not, it invokes recovery to apply the remaining active parts of the serial log. Recovery begins by finding the position in the log after which changes may need to be applied. That position is called the recovery start point. Every entry in the serial log includes the recovery start point that was current when the entry was written. If the shutdown was not clean, Falcon begins recovery by identifying the active log, reading the last entry, then backing up to its recovery start point. Recovery consists of applying changes in the serial log to page images in the page cache until they are all done, then flushing the cache by forcing a checkpoint.

The test systems often shut down the server before Falcon has moved all changes from the serial log to the main database. After some errors, the test system restarts the server again, causing Falcon to start recovery, then shuts down again. For the last several weeks, recovery has failed on the second restart. One odd characteristic Jim noticed was that if he restarted from an earlier stable point in the serial log, recovery succeeded.

Originally, a cache checkpoint consisted of a single I/O thread writing all the dirty pages from the cache, using buffered I/O without an fsync. That was pretty fast, but not durable after an ill-timed crash. Adding an fsync caused a checkpoint to take longer than the interval between checkpoints, which was a performance problem.
A checkpoint now causes dirty pages in the cache to be consolidated in a separate area so contiguous pages can be flushed in a single write; then a bevy of I/O threads push the page clusters to disk using direct I/O, or buffered I/O with a sync on each write (O_SYNC), depending on the capabilities of the system. The original page cache checkpoint got all pages into the system file cache before it continued. The new system is asynchronous.

The problem was that the recovery system declared the database recovered before the dirty pages in the page cache were actually on disk. Other processing began, including writes to the serial log with a new recovery start point. If the server stopped at that point, the next attempt to recover would fail because the first recovery had not been fully applied to the database. The solution is for recovery to wait until the I/O threads have completed.
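Ann's description of the recovery start point can be illustrated with a sketch (hypothetical structures, not Falcon's on-disk format): every serial log entry records the start point that was current when it was written, and recovery reads the last entry, backs up to that point, and replays from there.

```python
# Hypothetical sketch of the serial-log replay described above.

class LogEntry:
    def __init__(self, position, recovery_start, change):
        self.position = position
        self.recovery_start = recovery_start  # start point at write time
        self.change = change

def replay_range(serial_log):
    """Return the changes recovery would apply: everything at or after
    the recovery start point recorded in the *last* entry."""
    if not serial_log:
        return []
    start = serial_log[-1].recovery_start
    return [e.change for e in serial_log if e.position >= start]

# The failure mode: a first recovery "finishes" and later entries are
# written with an advanced start point, although the earlier changes
# were never flushed to the database.  A second recovery then begins
# too late and never re-applies them:
log = [LogEntry(0, 0, "create page"),
       LogEntry(1, 0, "update blob"),
       LogEntry(2, 2, "new work")]   # start point advanced prematurely
print(replay_range(log))             # only ['new work'] is replayed
```

This matches Jim's observation that restarting from an earlier stable point in the serial log made recovery succeed: forcing `start` back before the unflushed changes brings them back into the replay range.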
[11 Feb 2008 20:45]
Kevin Lewis
Patch is in mysql-6.0-falcon-team and mysql-6.0-release version 6.0.4
[5 May 2008 18:27]
Paul DuBois
Noted in 6.0.4 changelog. Incomplete Falcon recovery after server restarts eventually resulted in tablespace corruption.