Bug #44292 falcon_bug_34351_C fails with assertion (false) in file RecordLocatorPage.cpp
Submitted: 15 Apr 2009 13:00 Modified: 26 May 2010 17:47
Reporter: Olav Sandstaa Email Updates:
Status: Unsupported
Category:Server: Falcon Severity:S2 (Serious)
Version:6.0.11-alpha OS:Any
Assigned to: Olav Sandstaa Target Version:
Tags: F_PAGE IO
Triage: Triaged: D1 (Critical)

[15 Apr 2009 13:00] Olav Sandstaa
Description:
When running the falcon_bug_34351_C test using the latest version from
mysql-6.0-falcon-team it failed with the following error:

[Falcon] Error: assertion (false) failed at line 258 in file RecordLocatorPage.cpp

Call stack:

 ---- called from signal handler with signal 6 (SIGABRT) ------
  [8] __lwp_kill(0xa, 0x6, 0xffffffffbce6f780, 0x0, 0xffffffffffffffff, 0x10000), at
0xfffffd7fff14318a 
  [9] _thr_kill(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff13e1b3 
  [10] raise(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff0ecd79 
  [11] Error::debugBreak(), line 94 in "Error.cpp"
  [12] Error::error(string = 0xfd196a "assertion (%s) failed at line %d in file %s\n",
...), line 71 in "Error.cpp"
  [13] Error::assertionFailed(text = 0xff7b9e "false", fileName = 0xff7ba4
"RecordLocatorPage.cpp", line = 258), line 78 in "Error.cpp"
  [14] RecordLocatorPage::corrupt(this = 0x203c000), line 258 in "RecordLocatorPage.cpp"
  [15] RecordLocatorPage::linkSpaceSlot(this = 0x203c000, from = 445, to = 446), line 207
in "RecordLocatorPage.cpp"
  [16] RecordLocatorPage::insertSpaceSlot(this = 0x203c000, slot = 445, availableSpace =
4036), line 383 in "RecordLocatorPage.cpp"
  [17] RecordLocatorPage::setIndexSlot(this = 0x203c000, slot = 445, pageNumber = 202,
line = 0, availableSpace = 4036), line 365 in "RecordLocatorPage.cpp"
  [18] Section::storeRecord(this = 0x18ace18, recordLocatorPage = 0x203c000,
indexPageNumber = 193, index = 0x203ce00, stream = 0xfffffd7ffdda5eb8, transId = 56886U,
earlyWrite = false), line 729 in "Section.cpp"
  [19] Section::updateRecord(this = 0x18ace18, recordNumber = 14188, stream =
0xfffffd7ffdda5eb8, transId = 56886U, earlyWrite = false), line 591 in "Section.cpp"
  [20] Dbb::updateRecord(this = 0x17d68f0, sectionId = 0, recordId = 14188, stream =
0xfffffd7ffdda5eb8, transId = 56886U, earlyWrite = false), line 347 in "Dbb.cpp"
  [21] SRLUpdateRecords::commit(this = 0xfffffd7ffdda6af0), line 419 in
"SRLUpdateRecords.cpp"
  [22] SerialLogTransaction::commit(this = 0x61ff018), line 92 in
"SerialLogTransaction.cpp"
  [23] SerialLogTransaction::doAction(this = 0x61ff018), line 158 in
"SerialLogTransaction.cpp"
  [24] Gopher::gopherThread(this = 0x184a590), line 71 in "Gopher.cpp"
  [25] Gopher::gopherThread(arg = 0x184a590), line 37 in "Gopher.cpp"
  [26] Thread::thread(this = 0x18544a8), line 167 in "Thread.cpp"
  [27] Thread::thread(parameter = 0x18544a8), line 146 in "Thread.cpp"
  [28] _thr_setup(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff13fd7b 
  [29] _lwp_start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff13ffb0 

How to repeat:
This crash happened after having run falcon_bug_34351_C 420 times.
[15 Apr 2009 14:02] Olav Sandstaa
Setting to verified based on having reproduced the crash by running falcon_bug_34351_C 343
times.
[15 Apr 2009 14:31] Kevin Lewis
Olav,  It is always helpful to show the actual code around the place where the failure
occurred in case line numbers creep over time.  In Falcon, line numbers change a lot. 
This assert is here I think;

void RecordLocatorPage::corrupt(void)
{
	Log::debug("Corrupt RecordLocatorPage space management structure\n");
	printPage();
	ASSERT(false);
}

But the current falcon-team branch has this ASSERT on line 274, not 258.  How old is your
codebase?  And were you able to save the results of the printPage?  Any indication of how
the page was corrupt?
[15 Apr 2009 16:06] Olav Sandstaa
Kevin, thanks for the suggestion about including some lines from the source of where the
crash occurred. And, yes, you are right about that the crash occurred in this code:

void RecordLocatorPage::corrupt(void)
{
	Log::debug("Corrupt RecordLocatorPage space management structure\n");
	printPage();
	ASSERT(false);
}

The code version I use is fairly new. It was pulled from mysql-6.0-falcon-team yesterday
morning with revision date: 2009-04-14 08:27:56 +0200 (and revision-id:
hky@sun.com-20090414062756-38ehtd098ge97s23).

I have no indications about how the page was corrupted (but I still have two core files
from the crash). The test was run without specifying any debug options so there is no
output from the printPage. But I will re-run the test with the necessary debug flag
enabled and see what I get from it.
[15 Apr 2009 16:43] Olav Sandstaa
I have re-produced the crash with the falcon debug flag turned on using a updated source
version pulled from mysql-6.0-falcon-team about 30 minutes ago. The content from
printPage() looks like this:

Corrupt RecordLocatorPage space management structure
0.  page 0, line 0, space -2
1.  page 0, line 0, space 0
2.  page 213, line 0, space 4036
3.  page 0, line 0, space 0
4.  page 0, line 0, space 0
5.  page 0, line 0, space 0
6.  page 0, line 0, space 0
7.  page 0, line 0, space 0
8.  page 0, line 0, space 0
9.  page 0, line 0, space 0
10.  page 0, line 0, space 0
11.  page 0, line 0, space 0
12.  page 0, line 0, space 0
13.  page 0, line 0, space 0
14.  page 0, line 0, space 0
15.  page 0, line 0, space 0
16.  page 0, line 0, space 0
17.  page 0, line 0, space 0
18.  page 0, line 0, space 0
19.  page 0, line 0, space 0
20.  page 0, line 0, space 0
21.  page 0, line 0, space 0
22.  page 0, line 0, space 0
23.  page 0, line 0, space 0
24.  page 0, line 0, space 0
25.  page 0, line 0, space 0
26.  page 0, line 0, space 0
27.  page 0, line 0, space 0
28.  page 0, line 0, space 0
29.  page 0, line 0, space 0
30.  page 0, line 0, space 0
31.  page 0, line 0, space 0
32.  page 0, line 0, space 0
33.  page 0, line 0, space 0
34.  page 0, line 0, space 0
35.  page 0, line 0, space 0
36.  page 0, line 0, space 0
37.  page 0, line 0, space 0
38.  page 0, line 0, space 0
39.  page 0, line 0, space 0
40.  page 211, line 0, space 0
41.  page 0, line 0, space 0
42.  page 0, line 0, space 0
43.  page 0, line 0, space 0
44.  page 0, line 0, space 0
45.  page 0, line 0, space 0
46.  page 0, line 0, space 0
47.  page 0, line 1, space 0
[Falcon] Error: assertion (false) failed at line 274 in file RecordLocatorPage.cpp
[15 Apr 2009 20:57] Olav Sandstaa
A possibly related bug with the same assert hitting is Bug #39456 "Falcon: assertion
(false) failed at line 258 in file RecordLocatorPage.cpp". This has been fixed some time
ago.
[15 Apr 2009 21:11] Kevin Lewis
In a recent fix for Bug#39890,  Jim added a call to   unlinkSpaceSlot(slot);  into  
RecordLocatorPage::deleteDataPages.

Olav found this ASSERT while testing that fix.  
Here is the simplified call stack for this bug;

RecordLocatorPage::corrupt
RecordLocatorPage::linkSpaceSlot
RecordLocatorPage::insertSpaceSlot
RecordLocatorPage::setIndexSlot
Section::storeRecord
Section::updateRecord

Maybe there was a connection between the fix for 39890 and this.  Maybe a race condition.
[16 Apr 2009 9:47] Olav Sandstaa
The code that determines that the page is "corrupt" is the following test:

void RecordLocatorPage::linkSpaceSlot(int from, int to)
{
	VALIDIF(to == 0 || elements[to].spaceAvailable > 0);

When this is evaluated the "to" parameter and the element has the following values:

(dbx) print to
to = 40
(dbx) print elements[40]
dbx: warning: subscript out of range
elements[40] = {
    page           = 211
    line           = 0
    spaceAvailable = 0
}

Seems like we are inserting into a page where the available space is (suddenly?) 0.
[18 May 2009 9:43] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/74353

2707 Olav Sandstaa	2009-05-18
      Fix for Bug#44292 falcon_bug_34351_C fails with assertion (false) in file
RecordLocatorPage.cpp
      
      This crash was caused by an inconsistency in the RecordLocator
      page. In some situations the maxLine value was not updates when
      inserting new space management slots. This could cause later updates
      to the record locator page to introduce inconsistencies in the record
      index.
      
      This patch changes RecordLocatorPage::setIndexSlot() to handle
      situations where the slot to be inserted has a slot number larger than
      the maxLine value. Before this patch if a slot number that was larger
      than the current maxLine value was inserted, the maxLine value was not
      updated accordingly. The fix is to update the maxLine value based on
      the inserted slot number.
     @ storage/falcon/RecordLocatorPage.cpp
        Fix for inconsistency in RecordLocatorPage. RecordLocatorPage::setIndexSlot() is
changed so that it handles situations where the slot to be inserted on the page has a
slot number that is
        larger than the maxLine value.