Bug #45297 | Crash during recovery due to pread reading beyond EOF | ||
---|---|---|---|
Submitted: | 3 Jun 2009 10:57 | Modified: | 26 May 2010 17:47 |
Reporter: | Olav Sandstå | Email Updates: | |
Status: | Unsupported | Impact on me: | |
Category: | MySQL Server: Falcon storage engine | Severity: | S2 (Serious) |
Version: | 6.0.12-alpha | OS: | Linux |
Assigned to: | Olav Sandstå | CPU Architecture: | Any |
Tags: | F_RECOVERY |
[3 Jun 2009 10:57]
Olav Sandstå
[3 Jun 2009 11:03]
Olav Sandstå
Parts of the call stack is similar to the call stack in Bug#44744. Setting it to Analyzing while checking out if this is caused by the same problem as in Bug#44744.
[8 Jun 2009 13:40]
Olav Sandstå
Having re-run the recovery and looked at the core file I find that this crash is very similar to Bug#44744. What happens is that: 1. A call to DataPage::updateRecord() is done with a lineNumber that is equal to maxLine (same as in Bug#44744). 2. When checking the value of the LineIndex entry we find that this represent an overflow page (as in Bug#44744). 3. When reading out the page number for the overflow page this has a "wildly wrong value" (same as Bug#44744) representing a page that long way past the end of the file. Setting the status to verified. Most likely this is a duplicate of Bug44744 but keep it this open until Bug44744 is fixed.
[12 Jun 2009 13:34]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/76184 2727 Olav Sandstaa 2009-06-12 In Bug#44744 and Bug#45297 a crash occures during recovery. In both these cases the crash occurs when a record in a data page is attempted updated and the record is not already present in the page. This patch does not solve this problem but will make the recovery fail when the invalid update to the data page occurs instead of failing later when the invalid data is used. Common for both recovery crashes is that DataPage::updateRecord() is called with a lineNumber that is equal to the page's maxLine. This is not a correct use of this function as DataPage::updateRecord should only be called for already existing records (ie. lineNumber should be less than maxLine). Through repeatedly running recovery I see several cases where this happens but does not lead to a crash but instead likely leads to an inconsistency. To avoid this inconsistency and to make the recovery fail immediately instead of when the invalid data is used this patch adds an ASSERT that verify that the lineNumber is valid.
[12 Jun 2009 13:51]
Kevin Lewis
The assert is approved to help isolate this problem and protect from related problems.
[22 Jun 2009 12:28]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/76825 2739 Olav Sandstaa 2009-06-22 Fix for Bug#44744 "Recovery stage 3: segfault in RecoveryObjects::findRecoveryObject" and Bug#45297 "Crash during recovery due to pread reading beyond EOF" These crashes was both caused by an inconsistency between the Record Locator Page and the Data page. The record locator page claimed that the record was stored in a data page while the record was not present in the data page. This situation can happen during recovery if the Record Locator page has been written to disk while the data page has not been written to disk when a server crash occurs. The recovery crash happened when the code had locate the record by a lookup in the Record Locator page and then called DataPage::updateRecord() where the record was not present. In the crash situation the record expected to be located on the same line number as the maxLine of the data page. This lead to reading "garbage" from the data page and resulted in an invalid overflow page number. This patch fixes this problem by handling that DataPage::updateRecord and DataPage::deleteLine handles to be called also when the record is not present in the data page. In this situation both the functions will just ignore the call. DataPage::updateRecord() will return that there was not room for the record. This will result in the inconsistency between the data page and the Record Locator page being resolved since the "failing" call to DataPage::updateRecord() will trigger removal of the record's entry in the Record Locator page - and the record will be inserted somewhere else. @ storage/falcon/DataPage.cpp During recovery: extend DataPage::updateRecord() and DataPage::deleteLine() so that they handle to be called with a line number that is equal or larger than the current maxLine. If this is the case the call will just be ignored.
[22 Jun 2009 13:36]
Kevin Lewis
Ann Harrison wrote; The design of recovery is that changes made in phase 2 must be on their original pages because they are referenced elsewhere by page number. Changes made in phase 3 (including storing records) should indifferent to the page they go on, as long as the page was free - or allocated for the purpose for which it is reused. So, I think 2B is OK, and probably follows the original design. There's no requirement that record end up on their original page, just that they still have the same record number