MySQL Bugs: #35072: Falcon crash in RecoveryObjects::findRecoveryObject, line 103

Bug #35072	Falcon crash in RecoveryObjects::findRecoveryObject, line 103
Submitted:	5 Mar 2008 10:01	Modified:	1 Oct 2008 12:02
Reporter:	Philip Stoev	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Falcon storage engine	Severity:	S1 (Critical)
Version:	6.0-falcon-team	OS:	Any
Assigned to:	Vladislav Vaintroub	CPU Architecture:	Any
Tags:	falcon, pushbuild, test failure, widespread

Description:
When Falcon executes a non-concurrent workload consisting of ALTER TABLE ADD PARTITION and ALTER TABLE DROP PARTITION, the server will crash if the thread is killed. The stack trace is as follows:

#0  0x00110402 in __kernel_vsyscall ()
#1  0x00bdc617 in pthread_kill () from /lib/libpthread.so.0
#2  0x0842ba5c in write_core (sig=11) at stacktrace.c:240
#3  0x082912a6 in handle_segfault (sig=11) at mysqld.cc:2313
#4  <signal handler called>
#5  0x0858a4f9 in RecoveryObjects::findRecoveryObject (this=0xb71828e0, objectNumber=2, tableSpaceId=0) at RecoveryObjects.cpp:103
#6  0x0858a54f in RecoveryObjects::isObjectActive (this=0xb71828e0, objectNumber=2, tableSpaceId=0) at RecoveryObjects.cpp:90
#7  0x085414eb in SerialLog::isIndexActive (this=0xb7496d38, indexId=2, tableSpaceId=0) at SerialLog.cpp:1021
#8  0x085607e8 in SRLUpdateIndex::execute (this=0xae462170) at SRLUpdateIndex.cpp:163
#9  0x08560947 in SRLUpdateIndex::commit (this=0xae462170) at SRLUpdateIndex.cpp:157
#10 0x085474cf in SerialLogTransaction::commit (this=0xb71c9bd0) at SerialLogTransaction.cpp:88
#11 0x085475c3 in SerialLogTransaction::doAction (this=0xb71c9bd0) at SerialLogTransaction.cpp:154
#12 0x0857f893 in Gopher::gopherThread (this=0xb7147230) at Gopher.cpp:66
#13 0x0857f9ad in Gopher::gopherThread (arg=0xb7147230) at Gopher.cpp:37
#14 0x084aefab in Thread::thread (this=0xb714e260) at Thread.cpp:161
#15 0x084af1a7 in Thread::thread (parameter=0xb714e260) at Thread.cpp:140
#16 0x00bd750b in start_thread () from /lib/libpthread.so.0
#17 0x00b18b2e in clone () from /lib/libc.so.6

How to repeat:
In one thread run: 

use test;
delimiter ;
drop table if exists t1;
create table t1_test(id int not null,name varchar(25),data
tinyblob,key(name),key(id)) engine=falcon partition by range(id) ( partition p00
values less than (5000));

                insert into t1_test values
(floor(10000*rand()),'aaaaaaa','bbbbbbb');
                insert into t1_test values
(floor(10000*rand()),'aaaaaaa','bbbbbbb');
                insert into t1_test values
(floor(10000*rand()),'aaaaaaa','bbbbbbb');
                insert into t1_test values
(floor(10000*rand()),'aaaaaaa','bbbbbbb');
                insert into t1_test values
(floor(10000*rand()),'aaaaaaa','bbbbbbb');
                insert into t1_test values
(floor(10000*rand()),'aaaaaaa','bbbbbbb');
                insert into t1_test values
(floor(10000*rand()),'aaaaaaa','bbbbbbb');
                insert into t1_test values
(floor(10000*rand()),'aaaaaaa','bbbbbbb');
                insert into t1_test values
(floor(10000*rand()),'aaaaaaa','bbbbbbb');
                insert into t1_test values
(floor(10000*rand()),'aaaaaaa','bbbbbbb');
                insert into t1_test values
(floor(10000*rand()),'aaaaaaa','bbbbbbb');
                insert into t1_test values
(floor(10000*rand()),'aaaaaaa','bbbbbbb');
                insert into t1_test values
(floor(10000*rand()),'aaaaaaa','bbbbbbb');
                insert into t1_test values
(floor(10000*rand()),'aaaaaaa','bbbbbbb');
                insert into t1_test values
(floor(10000*rand()),'aaaaaaa','bbbbbbb');
                insert into t1_test values
(floor(10000*rand()),'aaaaaaa','bbbbbbb');
                insert into t1_test values
(floor(10000*rand()),'aaaaaaa','bbbbbbb');
                insert into t1_test values
(floor(10000*rand()),'aaaaaaa','bbbbbbb');
                insert into t1_test values
(floor(10000*rand()),'aaaaaaa','bbbbbbb');

drop procedure if exists p1;
delimiter //
create procedure p1()
begin
        declare counter int default 0;
        declare continue handler for sqlexception begin end;
        repeat

                alter table t1_test add partition (partition `p08` values less
than (200000));
                alter table t1_test drop partition p08;
                set counter=counter+1;
        until counter > 1000000
        end repeat;

end //

delimiter ;

call p1();

This will crash within 20 seconds. This test case is adapted from Shane Bester's test case for another bug.

Suggested fix:
Note that is is a single-thread scenario, so there are no concurrency issue between concurrent DDL statements.

I start my servers with perl mysql-test-run.pl --start-and-exit --skip-ndb.

[philips@philips build]$ uname -a
Linux philips 2.6.23.1-42.fc8 #1 SMP Tue Oct 30 13:55:12 EDT 2007 i686 i686 i386 GNU/Linux

I use a debug build from the 6.0-falcon-team tree.

Philip, I ran your test for more than 6 hours. No crash.

I did a fresh clone of mysql-6.0-falcon-team, compiled with BUILD/compile-pentium-debug-max-no-ndb, started server with perl mysql-test-run.pl --start-and-exit --skip-ndb and then ran the second test case. It crashed in 10 seconds or so.

This stack trace still occurs with 6.0.5

Unrolled test case for bug 35072

Attachment: bug35072.test (application/octet-stream, text), 22.63 KiB.

I just uploaded a test case where the procedure loop has been unrolled. to run: 

$ mysql -uroot --socket=var/tmp/master.sock  -f < bug35072.test

crash is almost immediate on my virtual fedora core, 2 x 1.4 Ghz cpu. Timing appears to be important.

i got this crash with 1 thread workload. all thread debug info attached.

Attachment: bug35072_debug_info.txt (text/plain), 35.70 KiB.

The test falcon_bugs2 crashed at the exact same place in RecoveryObjects::findRecoveryObject() during Pushbuild on mysql-6.0-falcon branch, Windows 2003, Mon Jul 14 17:11:24 2008.

Test output:

falcon.falcon_bugs2            [ fail ]

mysqltest: At line 557: query 'DROP TABLE t1' failed: 2006: MySQL server has gone away

The result from queries just before the failure was:
< snip >
SET autocommit=0;
SET tx_isolation="READ-COMMITTED";
CREATE TABLE t1 (a int);
INSERT INTO t1 VALUES (1);
SET autocommit=0;
SET tx_isolation="READ-COMMITTED";
SELECT * FROM t1;
a
COMMIT;
SELECT * FROM t1;
a
1
SELECT * FROM t1;
a
1
DROP TABLE t1;
***************** falcon_bugs2_023 *****************
CREATE TABLE t1 (a int);
ALTER TABLE t1 ADD PRIMARY KEY (a);
DROP TABLE t1;

Stack trace from Master.err:

CURRENT_TEST: falcon.falcon_bugs
080714 23:36:15 - mysqld got exception 0xc0000005 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=1048576
read_buffer_size=131072
max_used_connections=5
max_threads=151
thread_count=1
connection_count=1
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 59980 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
007E7490    mysqld.exe!RecoveryObjects::findRecoveryObject()[recoveryobjects.cpp:103]
007E764F    mysqld.exe!RecoveryObjects::isObjectActive()[recoveryobjects.cpp:90]
007C7904    mysqld.exe!SRLUpdateRecords::commit()[srlupdaterecords.cpp:361]
FFFFFFFF    kernel32.dll!AlignRects()
77F4725C    ntdll.dll!RtlFreeHeap()
65535C2E    
65535C2E    
FFFFF428    
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
Writing a core file
CURRENT_TEST: falcon.falcon_bugs2
MiniDumpWriteDump() failed, last error 3489660941

Vlad, This apparently has happened several times.  John noticed the same crash on a recent pushbuild, now that we can get call stacks.  This is not a recovery, it is a gopher thread doing drop table.

Another similar stack trace from Windows Pushbuild:

CURRENT_TEST: falcon.falcon_bug_22516
080715 16:43:42 - mysqld got exception 0xc0000005 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=1048576
read_buffer_size=131072
max_used_connections=3
max_threads=151
thread_count=1
connection_count=1
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 60395 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
00000001404CF428    mysqld.exe!RecoveryObjects::isObjectActive()[recoveryobjects.cpp:90]
000000014049AFD2    mysqld.exe!SRLUpdateIndex::commit()[srlupdateindex.cpp:157]
00000001404D01A0    mysqld.exe!SerialLogTransaction::commit()[seriallogtransaction.cpp:88]
00000001404D03A2    mysqld.exe!SerialLogTransaction::doAction()[seriallogtransaction.cpp:159]
00000001404CEB45    mysqld.exe!Gopher::gopherThread()[gopher.cpp:73]
000000014042A8C8    mysqld.exe!Thread::thread()[thread.cpp:169]
000000014042AE3E    mysqld.exe!Thread::thread()[thread.cpp:147]
0000000077D6B69A    kernel32.dll!BaseThreadStart()

The crash is here:

98      RecoveryPage* RecoveryObjects::findRecoveryObject(int objectNumber, int tableSpaceId)
99      {
100             int slot = objectNumber % RPG_HASH_SIZE;
101
102             for (RecoveryPage *object = recoveryObjects[slot]; object; object = object->collision)
103                     if (object->objectNumber == objectNumber && object->tableSpaceId == tableSpaceId) <<<<<<<<<<<< HEREE
104                             return object;
105
106             return NULL;
107     }

(gdb) print object
$1 = (class RecoveryPage *) 0xeeeeeeee

Another crash in Windows Pushbuild with similar stack trace:

CURRENT_TEST: falcon.index_merge_falcon
080718  2:28:05 - mysqld got exception 0xc0000005 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=1048576
read_buffer_size=131072
max_used_connections=2
max_threads=151
thread_count=1
connection_count=1
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 59980 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
007723D0    mysqld.exe!RecoveryObjects::findRecoveryObject()[recoveryobjects.cpp:103]
0077258F    mysqld.exe!RecoveryObjects::isObjectActive()[recoveryobjects.cpp:90]
00752A83    mysqld.exe!SRLUpdateIndex::execute()[srlupdateindex.cpp:163]
007530CE    mysqld.exe!SRLUpdateIndex::commit()[srlupdateindex.cpp:158]
00772FE0    mysqld.exe!SerialLogTransaction::commit()[seriallogtransaction.cpp:92]
0077204B    mysqld.exe!Gopher::gopherThread()[gopher.cpp:73]
00704D9D    mysqld.exe!Thread::thread()[thread.cpp:169]
007050EC    mysqld.exe!Thread::thread()[thread.cpp:147]
77E64829    kernel32.dll!GetModuleHandleA()
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
Writing a core file
Minidump written to e:\var-falcon-101\master-data\mysqld.dmp

Not Windows specific, Philip indicates that his original test crashes on his Linux machine

Philip, could please you attach the full backtrace, including other threads?
I still cannot reproduce. Seems to be a very subtle timing issue. Interestingly, all machines on which the crash happened, are VMWare boxes.

Threads for bug 35072

Attachment: bug35072.threads.txt (text/plain), 10.52 KiB.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/50059

2750 Vladislav Vaintroub	2008-07-18
      Bug#35072: Crash Falcon crash in RecoveryObjects::findRecoveryObject
      Problem: The functiion crashed while traversing single linked list 
      that contained a dangling pointer (deleted memory, filled with 0xe bytes).
      Access and modification of the hash table was not synchronized.
      When one thread adds a new entry to the hashtable while another
      is delets from the same hash bucket,  hashtable can become corrupted 
      (i.e still have pointer to the deleted object)
      
      Solution is to synchronize hashtable access. Here, a reader/writer
      lock is used, one per hash bucket.

Vlad, before you push this, we need to measure the performance cost in a high concurrency test.

Hakan and/or Kelly.

Can one of you try out Vlad's patch here to see what affect it has on DBT2?  

The new RecoveryObjects::syncArray of SyncObjects is a concern expressed by Jim below in an email sent to dev_falcon.  Also,  I have added Vlad's color commentary.

>Vladislav Vaintroub wrote:
>> Falconers,
>> I'm asking 2 volunteers to review the fix to
>> http://lists.mysql.com/commits/50059 , the fix for accessing invalid
>> memory in RecoveryObjects::findObject problem, that was plaguing us 
>> in the last time on pushbuild.
>>
>> Admittedly, I could not reproduce the problem either on my Windows machine
>> or on one of my 2 vmwared Ubuntus. Yet, the problem seems to be  obvious. 
>> It has to be missing synchronization when the hashtable is modified in 
>> different threads.  (I can't think of anythihg else, here luckily all 
>> memory allocations and deletes are done in the same class.  
>>
>> I also asked Philip to test the fix and he confirms problem does not exist
>> on his Virtual machine anymore.
>>
>> Preliminary answer for the friends of lock-free  -  it might or might 
>> not be possible to do the job without any synchronization and with 
>> compare-exchange  and memory-barriers and such.
>> I did not even try- it would be Herculean effort for me to prove 
>> that it works only to find out later, that it does not work on some 
>> OS and with some compiler. I tried my best to make this as fast as 
>> possible with a reader/writer lock (i.e SyncObject)  per hash bucket. 
>> Not sure it was necessary, single mutex would probably be just fine here.

>Jim Starkey wrote;
>Vlad, you may want to think about this some more.   I suspect you fix
>will work, but at a cost in memory and performance that can probably be
>avoided.
>
>The primary use of recovery objects is during recovery, which is single
>threaded.  So no synchronization is required there.
>
>The mechanism is also used to track the states of indexes and data
>sections during ordinary processing to avoid posting stuff to deleted
>objects.  The only places where objects are deleted from the hash table
>is when previous deleted index and section ids are reused and become
>active again.  Insert/insert conflicts are controlled by
>SerialLog::syncObject locked by the START_RECORD macro.  Insert/read
>conflicts are a non-issue because inserting into a hash table is
>intrinsically thread safe (though processor visibility can be an
>issue).  That leaves delete/read problems.
>
>At the moment, we delete recovery objects when their avatars are
>reactivated.  If we just left them in the hash table as "active", the
>problem evaporates.  If we really cared about the tiny bit of memory
>involved,we could do a check for superfluous recovery objects when we
>already had an exclusive lock.
>
>Your solution does minimize collisions.  But since actual contention can
>occur only when creating a new index or table, a rare event in a
>production system, the potential for contention is low anyway.  My
>primary concern is the cost of the interlocked CASes to seize and
>release the hash table sync object.  But do note that the memory
>consumed for a large number of sync objects is probably greater than the
>memory consumed by dangling (but correct) recovery objects.
>
> From time to time I wake up in the middle of the night to worry that
>Falcon is 98% lock/unlock calls with only 2% of the cycles actually
>slopping data around.  For non-critical areas, it makes good engineering
>sense to be conservative with the locking strategy.  But in the
>mainline, we need to be clever.  (Of course, we also have to get it
>right, which isn't easy.)

This fix is in version 6.0.7

Pushed into 6.0.7-alpha  (revid:vvaintroub@mysql.com-20080718212452-v2bgwuzcalgzanoq) (version source revid:sven@mysql.com-20080818195835-r615g9zz6xphmkzg) (pib:3)

Documented bugfix in the 6.0.7 changelog as follows:

        Executing ALTER TABLE ADD PARTITION followed by ALTER TABLE DROP
        PARTITION on a Falcon table, and then killing the thread performing the
        ALTER TABLE statements could cause the server to crash.