MySQL Bugs: #46083: falcon_pagesize2K fails in IndexRootPage::addIndexEntry() during recovery

Bug #46083	falcon_pagesize2K fails in IndexRootPage::addIndexEntry() during recovery
Submitted:	9 Jul 2009 13:23	Modified:	22 Oct 2009 10:35
Reporter:	Olav Sandstå	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Falcon storage engine	Severity:	S2 (Serious)
Version:	6.0.12-alpha	OS:	Linux
Assigned to:	Lars-Erik BjÃ¸rk	CPU Architecture:	Any
Tags:	F_INDEX, recovery

Description:
When running the falcon_pagesize2K test Falcon crashes during recovery. The crash occures in IndexRootPage::addIndexEntry() with the following code:

	ASSERT (false);
	FATAL ("index split failed");

The call stack when failing is:

  [10] raise(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff0ecd79 
  [11] Error::debugBreak(), line 94 in "Error.cpp"
  [12] Error::error(string = 0xfe11ba "assertion (%s) failed at line %d in file %s\n", ...), line 71 in "Error.cpp"
  [13] Error::assertionFailed(text = 0xff11d8 "false", fileName = 0xff11de "IndexRootPage.cpp", line = 212), line 78 in "Error.cpp"
  [14] IndexRootPage::addIndexEntry(dbb = 0x2a5b518, indexId = 63, key = 0xfffffd7fffdfb000, recordNumber = 2555, transId = 0), line 212 in "IndexRootPage.cpp"
  [15] IndexRootPage::indexMerge(dbb = 0x2a5b518, indexId = 63, logRecord = 0xfffffd7fffdfe428, transId = 0), line 884 in "IndexRootPage.cpp"
  [16] SRLUpdateIndex::execute(this = 0xfffffd7fffdfe428), line 209 in "SRLUpdateIndex.cpp"
  [17] SRLUpdateIndex::redo(this = 0xfffffd7fffdfe428), line 186 in "SRLUpdateIndex.cpp"
  [18] SerialLog::recover(this = 0x1819338), line 476 in "SerialLog.cpp"
  [19] Database::openDatabase(this = 0x16281d0, filename = 0xfffffd7fffdfea3c "/home/os136802/mysql/develop/bug/bugXXXXX/a1/vardirs/master-data/falcon_master.fts"), line 760 in "Database.cpp"
  [20] Connection::getDatabase(this = 0x185d078, dbName = 0x185cc7c "FALCON_MASTER", dbFileName = 0xfffffd7fffdfea3c "/home/os136802/mysql/develop/bug/bugXXXXX/a1/vardirs/master-data/falcon_master.fts", threads = 0x185cce8), line 1657 in "Connection.cpp"
  [21] Connection::openDatabase(this = 0x185d078, dbName = 0x185cc7c "FALCON_MASTER", filename = 0x185ccb4 "falcon_master.fts", account = 0xfe3877 "mysql", password = 0xfe387d "mysql", address = (nil), parent = 0x185cce8), line 940 in "Connection.cpp"
  [22] StorageDatabase::getOpenConnection(this = 0x185cae8), line 137 in "StorageDatabase.cpp"
  [23] StorageHandler::initialize(this = 0x1627668), line 989 in "StorageHandler.cpp"
  [24] StorageInterface::falcon_init(p = 0x15a92d0), line 268 in "ha_falcon.cpp"
  [25] ha_initialize_handlerton(plugin = 0x1b5aec0), line 448 in "handler.cc"
  [26] plugin_initialize(plugin = 0x1b5aec0), line 1010 in "sql_plugin.cc"
  [27] plugin_init(argc = 0x1299c68, argv = 0x15a8368, flags = 0), line 1226 in "sql_plugin.cc"
  [28] init_server_components(), line 4105 in "mysqld.cc"
  [29] main(argc = 4, argv = 0xfffffd7fffdff528), line 4619 in "mysqld.cc"

How to repeat:
Seen once after running the Random Query Generator test falcon_pagesize2K.

There two other bug reports for this crash, see Bug#37056 and Bug#44542. In these two bug reports the problem was considered to be caused by multiple threads inserting into the same block. Bug#37056 tried to reduce the effect of this by increasing the number of attempts to do the insert from 10 to 1000.

The reason for filing this as a separate bug report is that in this case the it happened during recovery which should be done more or less by a single thread.

Further, redoing the recovery using the database and log files from the test makes this problem reproducable on every run.

This suggests that this might not be a concurrency issue with multiple threads competing for the same block. Instead it might either be a valid index block that leads to this scenario or we might have an invalid index block.

Lar-Erik is investigating a solution to this {for (int n = 0; n < 1000; ++n)} loop.

The logs and tablespace files should make this very easy to
reproduce and debug - single threaded, consistent.  If I were
Lars Eric, I'd start here...

I have looked at bug#46083 (falcon_pagesize2K fails in IndexRootPage::addIndexEntry() during recovery) today, and I have debugged the relevant parts using the log and tablespace files provided by Olav.

This is what seems to happen:

We try to add a node in IndexRootPage::addIndexEntry. Here is a piece of the code:

<snip>

/* If the node fits on page, we're done */

AddNodeResult result;

for (;;)
   {
   result = page->addNode(dbb, key, recordNumber);
<snip>

The node does not fit on the page, and we are told to split the page at the middle.

In IndexPage::splitIndexPageMiddle, we search for a node to split on. We want to chose a node where the prefix compression is poor.

<snip>
for (; node.node < pageEnd; node.getNext(bucketEnd))
   {
   //int l =
   node.expandKey(key);

   if (node.offset || node.length)
       {
       if (node.nextNode > midpoint)
           break;

       chain = node.node;
       }

   }
<snip>

Then we check to make sure that this is not the last node on the page, and if it is, we split on the previous node (prevNode) (I sent a mail about the calculation of the previous node some weeks ago, unrelated to this):

   // The split node should never be the last node on the page
     if ((UCHAR*) node.nextNode > (UCHAR*) this + length)
       node = prevNode;

What I see in the debugger is that (UCHAR*) node.nextNode == (UCHAR*) this + length (note *equal*). To my understanding, this *is* the last node on the page, and the condition should have been '>=' instead of just '>'. This means that we in this case split on the last node instead of the second to last node. On the next iteration we will navigate to this page once again, and try to insert into it. This page is still full (same length as the last time, we split on the END_BUCKET node I assume), and we will split it again. This goes on ... and on (I have looped through 14 of the iterations).

If I change this condition from '>' to '>=' and also actually update prevNode in the loop showed earlier, so that it does not refer to the first node on the page, the node will be inserted on the second iteration, after the first split. The recovery still segfaults, but on a different place that I have no knowledge of, further down the recovery process

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/80731

2766 lars-erik.bjork@sun.com	2009-08-13
      This is a fix for bug #46083 - falcon_pagesize2K fails in
      IndexRootPage::addIndexEntry() during recovery
      
      This is the reason for the bug:
      
      We try to add a node in IndexRootPage::addIndexEntry:
      
      <snip>
      
      /* If the node fits on page, we're done */
      
      AddNodeResult result;
      
      for (;;)
        {
        result = page->addNode(dbb, key, recordNumber);
      <snip>
      
      The node does not fit on the page, and we are told to split the page
      at the middle.
      
      In IndexPage::splitIndexPageMiddle, we search for a node to split
      on. We want to chose a node where the prefix compression is poor.
      
      <snip>
      for (; node.node < pageEnd; node.getNext(bucketEnd))
        {
        //int l =
        node.expandKey(key);
      
        if (node.offset || node.length)
            {
            if (node.nextNode > midpoint)
                break;
      
            chain = node.node;
            }
      
        }
      <snip>
      
      
      Then we check to make sure that this is not the last node on the page,
      and if it is, we split on the previous node (prevNode)
      
      <snip>
      // The split node should never be the last node on the page
        if ((UCHAR*) node.nextNode > (UCHAR*) this + length)
          node = prevNode;
      <snip>
      
      What I have seen in the debugger is that (UCHAR*) node.nextNode == (UCHAR*)
      this + length (note *equal*). To my understanding, this *is* the last
      node on the page, and the condition should have been '>=' instead of
      just '>'. This means that we in this case split on the last node
      instead of the second to last node. On the next iteration we will
      navigate to this page once again, and try to insert into it. This page
      is still full (same length as the last time, we split on the
      END_BUCKET node I assume), and we will split it again. This repeats.
      
      In this fix I change the condition from '>' to '>=' and also actually
      update prevNode in the loop showed earlier, so that it does not refer
      to the first node on the page.
      
      modified file 'storage/falcon/IndexPage.cpp' 
      * Changed the condition, and updated prevNode.  
      * Removed unused and confusing 'chain' from ::splitIndexPageMiddle and
      ::splitIndexPageEnd
      modified:
        storage/falcon/IndexPage.cpp

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/81477

2765 lars-erik.bjork@sun.com	2009-08-25
      This is a followup patch for bug #46083 - falcon_pagesize2K fails in
      IndexRootPage::addIndexEntry() during recovery
      
      Since the eternal loop should no longer occur after the previous
      patch, I have decreased the loop count from 1000 to 10. This should
      hopefully be enough to handle concurrency when splitting pages.
      Ideally, we should either implement a solution that does not require
      looping, or make the current code robust enough to have a for (;;) loop
      
      If the assert no longer happens, we should also close bug#44542 -
      Falcon needs a redesign of IndexRootPage::addIndexEntry
      
      modified file 'storage/falcon/IndexRootPage.cpp' 
      * Changed loop count  
      * Removed a comment
      modified:
        storage/falcon/IndexRootPage.cpp

A bug in a feature tree, no need for a documentation entry, closing.