Bug #44292 falcon_bug_34351_C fails with assertion (false) in file RecordLocatorPage.cpp
Submitted: 15 Apr 2009 13:00 Modified: 26 May 2010 17:47
Reporter: Olav Sandstå Email Updates:
Status: Unsupported Impact on me:
None 
Category:MySQL Server: Falcon storage engine Severity:S2 (Serious)
Version:6.0.11-alpha OS:Any
Assigned to: Olav Sandstå
Tags: F_PAGE IO
Triage: Triaged: D1 (Critical)

[15 Apr 2009 13:00] Olav Sandstå
Description:
When running the falcon_bug_34351_C test using the latest version from mysql-6.0-falcon-team it failed with the following error:

[Falcon] Error: assertion (false) failed at line 258 in file RecordLocatorPage.cpp

Call stack:

 ---- called from signal handler with signal 6 (SIGABRT) ------
  [8] __lwp_kill(0xa, 0x6, 0xffffffffbce6f780, 0x0, 0xffffffffffffffff, 0x10000), at 0xfffffd7fff14318a 
  [9] _thr_kill(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff13e1b3 
  [10] raise(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff0ecd79 
  [11] Error::debugBreak(), line 94 in "Error.cpp"
  [12] Error::error(string = 0xfd196a "assertion (%s) failed at line %d in file %s\n", ...), line 71 in "Error.cpp"
  [13] Error::assertionFailed(text = 0xff7b9e "false", fileName = 0xff7ba4 "RecordLocatorPage.cpp", line = 258), line 78 in "Error.cpp"
  [14] RecordLocatorPage::corrupt(this = 0x203c000), line 258 in "RecordLocatorPage.cpp"
  [15] RecordLocatorPage::linkSpaceSlot(this = 0x203c000, from = 445, to = 446), line 207 in "RecordLocatorPage.cpp"
  [16] RecordLocatorPage::insertSpaceSlot(this = 0x203c000, slot = 445, availableSpace = 4036), line 383 in "RecordLocatorPage.cpp"
  [17] RecordLocatorPage::setIndexSlot(this = 0x203c000, slot = 445, pageNumber = 202, line = 0, availableSpace = 4036), line 365 in "RecordLocatorPage.cpp"
  [18] Section::storeRecord(this = 0x18ace18, recordLocatorPage = 0x203c000, indexPageNumber = 193, index = 0x203ce00, stream = 0xfffffd7ffdda5eb8, transId = 56886U, earlyWrite = false), line 729 in "Section.cpp"
  [19] Section::updateRecord(this = 0x18ace18, recordNumber = 14188, stream = 0xfffffd7ffdda5eb8, transId = 56886U, earlyWrite = false), line 591 in "Section.cpp"
  [20] Dbb::updateRecord(this = 0x17d68f0, sectionId = 0, recordId = 14188, stream = 0xfffffd7ffdda5eb8, transId = 56886U, earlyWrite = false), line 347 in "Dbb.cpp"
  [21] SRLUpdateRecords::commit(this = 0xfffffd7ffdda6af0), line 419 in "SRLUpdateRecords.cpp"
  [22] SerialLogTransaction::commit(this = 0x61ff018), line 92 in "SerialLogTransaction.cpp"
  [23] SerialLogTransaction::doAction(this = 0x61ff018), line 158 in "SerialLogTransaction.cpp"
  [24] Gopher::gopherThread(this = 0x184a590), line 71 in "Gopher.cpp"
  [25] Gopher::gopherThread(arg = 0x184a590), line 37 in "Gopher.cpp"
  [26] Thread::thread(this = 0x18544a8), line 167 in "Thread.cpp"
  [27] Thread::thread(parameter = 0x18544a8), line 146 in "Thread.cpp"
  [28] _thr_setup(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff13fd7b 
  [29] _lwp_start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff13ffb0 

How to repeat:
This crash happened after having run falcon_bug_34351_C 420 times.
[15 Apr 2009 14:02] Olav Sandstå
Setting to verified based on having reproduced the crash by running falcon_bug_34351_C 343 times.
[15 Apr 2009 14:31] Kevin Lewis
Olav,  It is always helpful to show the actual code around the place where the failure occurred in case line numbers creep over time.  In Falcon, line numbers change a lot.  This assert is here I think;

void RecordLocatorPage::corrupt(void)
{
	Log::debug("Corrupt RecordLocatorPage space management structure\n");
	printPage();
	ASSERT(false);
}

But the current falcon-team branch has this ASSERT on line 274, not 258.  How old is your codebase?  And were you able to save the results of the printPage?  Any indication of how the page was corrupt?
[15 Apr 2009 16:06] Olav Sandstå
Kevin, thanks for the suggestion about including some lines from the source of where the crash occurred. And, yes, you are right about that the crash occurred in this code:

void RecordLocatorPage::corrupt(void)
{
	Log::debug("Corrupt RecordLocatorPage space management structure\n");
	printPage();
	ASSERT(false);
}

The code version I use is fairly new. It was pulled from mysql-6.0-falcon-team yesterday morning with revision date: 2009-04-14 08:27:56 +0200 (and revision-id: hky@sun.com-20090414062756-38ehtd098ge97s23).

I have no indications about how the page was corrupted (but I still have two core files from the crash). The test was run without specifying any debug options so there is no output from the printPage. But I will re-run the test with the necessary debug flag enabled and see what I get from it.
[15 Apr 2009 16:43] Olav Sandstå
I have re-produced the crash with the falcon debug flag turned on using a updated source version pulled from mysql-6.0-falcon-team about 30 minutes ago. The content from printPage() looks like this:

Corrupt RecordLocatorPage space management structure
0.  page 0, line 0, space -2
1.  page 0, line 0, space 0
2.  page 213, line 0, space 4036
3.  page 0, line 0, space 0
4.  page 0, line 0, space 0
5.  page 0, line 0, space 0
6.  page 0, line 0, space 0
7.  page 0, line 0, space 0
8.  page 0, line 0, space 0
9.  page 0, line 0, space 0
10.  page 0, line 0, space 0
11.  page 0, line 0, space 0
12.  page 0, line 0, space 0
13.  page 0, line 0, space 0
14.  page 0, line 0, space 0
15.  page 0, line 0, space 0
16.  page 0, line 0, space 0
17.  page 0, line 0, space 0
18.  page 0, line 0, space 0
19.  page 0, line 0, space 0
20.  page 0, line 0, space 0
21.  page 0, line 0, space 0
22.  page 0, line 0, space 0
23.  page 0, line 0, space 0
24.  page 0, line 0, space 0
25.  page 0, line 0, space 0
26.  page 0, line 0, space 0
27.  page 0, line 0, space 0
28.  page 0, line 0, space 0
29.  page 0, line 0, space 0
30.  page 0, line 0, space 0
31.  page 0, line 0, space 0
32.  page 0, line 0, space 0
33.  page 0, line 0, space 0
34.  page 0, line 0, space 0
35.  page 0, line 0, space 0
36.  page 0, line 0, space 0
37.  page 0, line 0, space 0
38.  page 0, line 0, space 0
39.  page 0, line 0, space 0
40.  page 211, line 0, space 0
41.  page 0, line 0, space 0
42.  page 0, line 0, space 0
43.  page 0, line 0, space 0
44.  page 0, line 0, space 0
45.  page 0, line 0, space 0
46.  page 0, line 0, space 0
47.  page 0, line 1, space 0
[Falcon] Error: assertion (false) failed at line 274 in file RecordLocatorPage.cpp
[15 Apr 2009 20:57] Olav Sandstå
A possibly related bug with the same assert hitting is Bug #39456 "Falcon: assertion (false) failed at line 258 in file RecordLocatorPage.cpp". This has been fixed some time ago.
[15 Apr 2009 21:11] Kevin Lewis
In a recent fix for Bug#39890,  Jim added a call to   unlinkSpaceSlot(slot);  into   RecordLocatorPage::deleteDataPages.

Olav found this ASSERT while testing that fix.  
Here is the simplified call stack for this bug;

RecordLocatorPage::corrupt
RecordLocatorPage::linkSpaceSlot
RecordLocatorPage::insertSpaceSlot
RecordLocatorPage::setIndexSlot
Section::storeRecord
Section::updateRecord

Maybe there was a connection between the fix for 39890 and this.  Maybe a race condition.
[16 Apr 2009 9:47] Olav Sandstå
The code that determines that the page is "corrupt" is the following test:

void RecordLocatorPage::linkSpaceSlot(int from, int to)
{
	VALIDIF(to == 0 || elements[to].spaceAvailable > 0);

When this is evaluated the "to" parameter and the element has the following values:

(dbx) print to
to = 40
(dbx) print elements[40]
dbx: warning: subscript out of range
elements[40] = {
    page           = 211
    line           = 0
    spaceAvailable = 0
}

Seems like we are inserting into a page where the available space is (suddenly?) 0.
[18 May 2009 9:43] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/74353

2707 Olav Sandstaa	2009-05-18
      Fix for Bug#44292 falcon_bug_34351_C fails with assertion (false) in file RecordLocatorPage.cpp
      
      This crash was caused by an inconsistency in the RecordLocator
      page. In some situations the maxLine value was not updates when
      inserting new space management slots. This could cause later updates
      to the record locator page to introduce inconsistencies in the record
      index.
      
      This patch changes RecordLocatorPage::setIndexSlot() to handle
      situations where the slot to be inserted has a slot number larger than
      the maxLine value. Before this patch if a slot number that was larger
      than the current maxLine value was inserted, the maxLine value was not
      updated accordingly. The fix is to update the maxLine value based on
      the inserted slot number.
     @ storage/falcon/RecordLocatorPage.cpp
        Fix for inconsistency in RecordLocatorPage. RecordLocatorPage::setIndexSlot() is changed so that it handles situations where the slot to be inserted on the page has a slot number that is
        larger than the maxLine value.