Bug #45970 Corrupt RecordLocatorPage structure reported during recovery
Submitted: 6 Jul 2009 13:09 Modified: 26 May 2010 17:48
Reporter: Olav Sandstå Email Updates:
Status: Unsupported Impact on me:
None 
Category:MySQL Server: Falcon storage engine Severity:S2 (Serious)
Version:6.0.12-alpha OS:Linux
Assigned to: Olav Sandstå CPU Architecture:Any
Tags: F_RECOVERY
Triage: Triaged: D2 (Serious)

[6 Jul 2009 13:09] Olav Sandstå
Description:
During the recovery after running the Random Query Genertor test falcon_ddl Falcon crashes after having reported that the RecordLocatorPage is corrupted:

# 16:08:03 Recovery phase 3...
# 16:08:04 Corrupt RecordLocatorPage space management structure
# 16:08:04 0.  page 0, line 0, space -15
# 16:08:04 1.  page 0, line 0, space 0
# 16:08:04 2.  page 0, line 0, space 0
# 16:08:04 3.  page 0, line 0, space 0
# 16:08:04 4.  page 0, line 0, space 0
# 16:08:04 5.  page 0, line 0, space 0
# 16:08:04 6.  page 0, line 0, space 0
# 16:08:04 7.  page 0, line 0, space 0
# 16:08:04 8.  page 0, line 0, space 0
# 16:08:04 9.  page 0, line 0, space 0
# 16:08:04 10.  page 0, line 0, space 0
# 16:08:04 11.  page 0, line 0, space 0
# 16:08:04 12.  page 0, line 0, space 0
# 16:08:04 13.  page 0, line 0, space 0
# 16:08:04 14.  page 0, line 0, space 0
# 16:08:04 15.  page 0, line 0, space 3454
# 16:08:04 [Falcon] Error: assertion (false) failed at line 274 in file RecordLocatorPage.cpp

The call stack from the crash contains the following:

mysqld(Error::debugBreak()+0x12) [0x856a618]
mysqld(Error::error(char const*, ...)+0x7a) [0x856a694]
mysqld(Error::assertionFailed(char const*, char const*, int)+0x27) [0x856a739]
mysqld(RecordLocatorPage::corrupt()+0x39) [0x866a5d3]
Log message 'mysqld(RecordLocatorPage::corrupt()+0x39) [0x866a5d3]' indicates database corruption
mysqld(RecordLocatorPage::deleteLine(int, int)+0x72) [0x866ae70]
mysqld(Section::updateRecord(int, Stream*, unsigned int, bool)+0x4a0) [0x8626a2a]
mysqld(Dbb::updateRecord(int, int, Stream*, unsigned int, bool)+0x49) [0x85cabf1]
mysqld(SRLUpdateRecords::redo()+0x319) [0x861e405]
mysqld(SerialLog::recover()+0xd79) [0x862ee51]
mysqld(Database::openDatabase(char const*)+0x341) [0x85c0f6f]
mysqld(Connection::getDatabase(char const*, char const*, Threads*)+0x149) [0x85b3e6d]
mysqld(Connection::openDatabase(char const*, char const*, char const*, char const*, char const*, Threads*)+0x198) [0x85b538a]
mysqld(StorageDatabase::getOpenConnection()+0x86) [0x8578d02]
mysqld(StorageHandler::initialize()+0x9f) [0x857b3e9]
mysqld(StorageInterface::falcon_init(void*)+0x25f) [0x8561a1d]
mysqld(ha_initialize_handlerton(st_plugin_int*)+0xa3) [0x840ce3f]
mysqld [0x84c4afe]
mysqld(plugin_init(int*, char**, int)+0x695) [0x84c6065]
mysqld [0x82c5291]
mysqld(main+0x1de) [0x82c624a]

How to repeat:
Seen once when running RQG test falcon_ddl
[31 Jul 2009 12:03] Olav Sandstå
This bug is related to Bug#44744 and Bug#45297. Both of these have been fixed earlier. 

Also in this crash scenario redo is done on an UpdateRecords log record where the record exists in the RecordLocator page but not in the data page (most likely due to the RecordLocator page being written to disk before the crash but not the data page). The fix for Bug#44744 and Bug#45297 handled this situation by "pretending" that the new record version did not fit in the data page resulting in that that the record being inserted in a new position.

The difference in this case is that the UpdateRecords do not have any data (the stream parameter to Section::updateRecord is NULL - which I assume means that this is a result of a record delete?). So instead of calling DataPage::updateRecord() followed by DataPage::deleteLine(), only DataPage::deleteLine() is called in this scenario (see Section::updateRecord()).

The fix for Bug#44744/Bug#45297 handles this in DataPage::deleteLine() by just "ignoring the delete" (since the record does not exists on the page) - and returns "0" (meaning no more space on the page). This return value was not used in the scenario for Bug#44744/Bug#45297 but it is used in this scenario as a argument to RecordLocatorPage::deleteLine(..... spaceAvailable=0);

What causes this to corrupt the record locator page it the following code in RecordLocatorPage::deleteLine() which is run if spaceAvailable is 0:

	if (spaceAvailable == 0 && pageNumber)

		for (int n = 0; n < maxLine; ++n)
			//ASSERT (elements [n].page != pageNumber);
			if (elements [n].page == pageNumber)
				{
				elements [n].page = 0;
				elements [n].line = 0;
				}

So to ensure this is called we need to avoid that RecordLocatorPage::deleteLine() is not wrongly called with spaceAvailable=0. To fix this, the fix made for Bug#44744/Bug#45297 must be changed so that the correct amount of available space in the page is returned when calling DataPage::deleteLine():

	/* During recovery we might be asked to delete a record that does
	   not exists on the page. This situation can happen when the
	   Record Locator page was written to disk while this data
	   page was not written to disk before the server crashed. */

	if (line >= maxLine)
		{
		ASSERT(dbb->serialLog->recovering);
		return 0;
		}
[31 Jul 2009 12:12] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/79763

2763 Olav Sandstaa	2009-07-31
      Patch for Bug#45970 Corrupt RecordLocatorPage structure reported during recovery
      
      This crash happened when processing a UpdateRecords log record and a record to be deleted did not
      exist in the data page. This was handled correctly in DataPage::deleteLine() after the fix for
      Bug#44744/Bug#45297 but the return value from DataPage::deleteLine was wrong. 0 (page full) was returned
      instead of the actual amount of available space. This triggered all entries for this page in the record 
      locator page to be "cleaned up" leading to a corrupted record locator page.
      
      This patch fixes this problem by having DataPage::deleteLine() return the correct amount of available space
      in the data page.
     @ storage/falcon/DataPage.cpp
        Return the correct amount of free space in the data page also in the case where the record to be 
        deleted did not exits in the data page.