MySQL Bugs: #37726: Falcon crash in WalkDeferred::getNext

Bug #37726	Falcon crash in WalkDeferred::getNext
Submitted:	29 Jun 2008 12:31	Modified:	1 Oct 2008 12:42
Reporter:	Philip Stoev	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Falcon storage engine	Severity:	S1 (Critical)
Version:	6.0-falcon	OS:	Any
Assigned to:	Ann Harrison	CPU Architecture:	Any

Description:
When executing a DML workload involving LIMIT, Falcon crashed as follows:

#0  0x00110416 in __kernel_vsyscall ()
#1  0x00581c78 in pthread_kill () from /lib/libpthread.so.0
#2  0x0844a1c9 in write_core (sig=11) at stacktrace.c:302
#3  0x082a464c in handle_segfault (sig=11) at mysqld.cc:2635
#4  <signal handler called>
#5  0x085abc70 in WalkDeferred::getNext (this=0xadb88e78, lockForUpdate=false) at WalkDeferred.cpp:43
#6  0x08531427 in IndexWalker::getNext (this=0xadcaf188, lockForUpdate=false) at IndexWalker.cpp:94
#7  0x084bb092 in StorageDatabase::nextIndexed (this=0xb71e9130, storageTable=0xb507c620, indexWalker=0xadcaf188, lockForUpdate=false)
    at StorageDatabase.cpp:481
#8  0x084c19f4 in StorageTable::nextIndexed (this=0xb507c620, recordNumber=461, lockForUpdate=false) at StorageTable.cpp:180
#9  0x084b4621 in StorageInterface::index_next (this=0x9617be0, buf=0x9617da8 "Ы") at ha_falcon.cpp:1589
#10 0x0831c047 in join_read_next (info=0x969e7c4) at sql_select.cc:14357
#11 0x0831fa0e in sub_select (join=0xad82aed8, join_tab=0x969e780, end_of_records=false) at sql_select.cc:13449
#12 0x0832c9e1 in do_select (join=0xad82aed8, fields=0x96475f4, table=0x0, procedure=0x0) at sql_select.cc:13190
#13 0x0833f957 in JOIN::exec (this=0xad82aed8) at sql_select.cc:2785
#14 0x0833abfd in mysql_select (thd=0x96461a8, rref_pointer_array=0x9647664, tables=0x969d428, wild_num=0, fields=@0x96475f4, conds=0x969de40, og_num=1,
    order=0x969dfd0, group=0x0, having=0x0, proc_param=0x0, select_options=2148289024, result=0x969e068, unit=0x96472d0, select_lex=0x9647560)
    at sql_select.cc:2975
#15 0x0833fc8a in handle_select (thd=0x96461a8, lex=0x9647274, result=0x969e068, setup_tables_done_option=0) at sql_select.cc:289
#16 0x082b3d6f in execute_sqlcom_select (thd=0x96461a8, all_tables=0x969d428) at sql_parse.cc:4835
#17 0x082b57ac in mysql_execute_command (thd=0x96461a8) at sql_parse.cc:2017
#18 0x082be6ab in mysql_parse (thd=0x96461a8,
    inBuf=0x969d200 "SELECT X . int_key FROM D AS X LEFT JOIN E AS Y ON ( X . int_key = Y . pk ) WHERE X . int_nokey < 71  ORDER BY int_key LIMIT 54",
    length=127, found_semicolon=0xad9f2260) at sql_parse.cc:5799
#19 0x082bf147 in dispatch_command (command=COM_QUERY, thd=0x96461a8, packet=0x9650221 "", packet_length=127) at sql_parse.cc:1058
#20 0x082c0408 in do_command (thd=0x96461a8) at sql_parse.cc:731
#21 0x082ada2e in handle_one_connection (arg=0x96461a8) at sql_connect.cc:1134
#22 0x0057d32f in start_thread () from /lib/libpthread.so.0
#23 0x0049a27e in clone () from /lib/libc.so.6

The problematic code is:

(gdb) list
38
39                              return NULL;
40                              }
41
42                      key = node->key;
43                      keyLength = node->keyLength;  <- HERE
44                      recordNumber = node->recordNumber; 
45
46                      if ( (currentRecord = getValidatedRecord(recordNumber, lockForUpdate)) )
47                              return currentRecord;

(gdb) print node
$1 = (DINode *) 0xeeeeeeee

(gdb) print node->keyLength
Cannot access memory at address 0xeeeeeef2

How to repeat:
If this happens again, a repeatable test case will be provided.

Grammar file for bug 37726

Attachment: bug37726.yy (text/plain), 1.05 KiB.

Please find attached the grammar file required to produce the crash (in at least 3/4 of all runs, the rest may crash with a different type of crash). To run, please clone the mysql-test-extra-6.0 tree and then issue:

$ cd mysql-test-extra-6.0/mysql-test/gentest
$ ./runall.pl --basedir=/path/to/6.0-falcon/ --engine=falcon \
  --grammar=/location/of/bug37726.yy

A crash should happen within 10 minutes. The offending query is a simple select in the form of:

SELECT X . pk FROM C AS X LEFT JOIN D AS Y ON ( X . pk = Y . int_key ) WHERE X . int_nokey < 80 GROUP BY pk  LIMIT 124

Quite possibly even a JOIN is not required. DDL statements are not present in this test case, just simple INSERT/UPDATE/DELETE/SELECT meaning that any workload containing Falcon and LIMIT will crash eventually.

This bug was found after applying Ann's patches to the LIMIT optimization:

<quote>
in IndexWalker::IndexWalker where you should add

balance = 0;

to the initialization statements.

The other is in WalkIndex::getNextNode(void) where
you need to add the "else" statement shown below:

if (first)
{
first = false;
recordNumber = node.getNumber();

if (recordNumber >= 0)
return recordNumber;
else if (recordNumber == END_LEVEL)
return -1;
}
</quote>

Please note that having a literal LIMIT in the query is not required, the optimizer may select to use this path for other queries, such as:

SELECT L . int_key AS X
FROM `C` AS K
LEFT JOIN `A` AS L ON ( K . pk = L . int_key )
WHERE K . int_key IN (
 SELECT L . pk AS X FROM `C` AS K
 LEFT JOIN `A` AS L ON ( K . pk = L . int_key )
 WHERE K . int_nokey IN (
  SELECT K . pk AS X FROM `A` AS K
  LEFT JOIN `C` AS L ON ( K . int_nokey = L . int_key )
  WHERE K . int_nokey < 51 GROUP BY K . pk  )
 GROUP BY K . int_key ORDER BY X )
ORDER BY X

It appears that a deferred index is being deleted while
an index walker is traversing it.  The question is "why?"

>Ann W. Harrison wrote;
>
>    It appears that Philip's test that turns up a node of 0xeeeeee
>is in fact showing a case where a deferred index is deleted while
>an IndexWalker (actually a WalkDeferred object) is walking it.
>Do you remember what mechanism protects a deferred index while it
>is in use?
>
>    The situation appears to be that some transaction A creates a
>deferred index segment and commits. Later some transaction B
>executes a query that requires navigational access to that index.
>While B is operating, A goes from being committed to being write
>complete, and the index is deleted.  B's WalkDeferred object
>still has a pointer to the deleted deferred index, but the
>object itself is released and overwritten.
>
>   Thoughts?

Ann,

It looks like Index::scanIndex protects itself from a disappearing deferredIndex by getting a shared lock on Index::deferredIndexes.syncObject. It will not be deleted during that time because Index::detachDeferredIndex() gets an exclusive lock on it when called by the gopher at writeComplete.

On the other hand, Index::positionIndex will get a shared lock on Index::deferredIndexes.syncObject only while it is searching for the DeferredIndex.  While holding the shared lock, it creates a few of these;

	WalkDeferred *walkDeferred = new WalkDeferred(deferredIndex, transaction, searchFlags, &walkIndex->lowerBound, &walkIndex->upperBound);
	indexWalker->addWalker(walkDeferred);

then returns the indexWalker as the lock is released!

Either we make the walkDeferred objects hold the shared lock on Index::deferredIndexes.syncObject  (bad idea)  or we need to add a refCount mechanism on the DeferredIndex so that even after it has been detached from the index, it will stay around while a walkDeferred object has a pointer to it.  The first approach is very bad because it holds a syncObject for an indeterminate amount of time, which could be very long, and the gophers will hang during that time.  And it prevents removal of any other DeferredIndex attached to an Index.

Kevin

Patch has been pushed to the team tree.  Test runs fine here
with the patch.

Code looks good. Patch Approved.  Revision ID: ann@mysql.com-20080725180724-rqcqddcj962pym15

Pushed to mysql-6.0-falcon-team

This fix is version 6.0.7

Documented in the 6.0.7 changelog as follows:

        Executing large numbers of SQL statements using LIMIT on Falcon tables
        eventually led to a crash of the server.