MySQL Bugs: #42343: Falcon crash in Cache::flush on/after recovery

Bug #42343	Falcon crash in Cache::flush on/after recovery
Submitted:	26 Jan 2009 9:56	Modified:	27 Mar 2009 16:18
Reporter:	Philip Stoev	Email Updates:
Status:	Can't repeat	Impact on me:	None
Category:	MySQL Server: Falcon storage engine	Severity:	S1 (Critical)
Version:	6.0-falcon-team	OS:	Any
Assigned to:	Vladislav Vaintroub	CPU Architecture:	Any
Tags:	F_RECOVERY

Description:
When recovering a tablespace produced  by bug #42340, Falcon crashed as follows:

#0  0x000000315b00c216 in pthread_kill () from /lib64/libpthread.so.0
#1  0x00000000014d556a in my_write_core (sig=11) at stacktrace.c:309
#2  0x000000000081cae3 in handle_segfault (sig=11) at mysqld.cc:2679
#3  <signal handler called>
#4  0x0000000000ecea2b in Cache::flush (this=0x7f00320360b8, arg=378)
    at Cache.cpp:410
#5  0x0000000000f0473e in Database::flush (this=0x7f003200dbc8, arg=378)
    at Database.cpp:1262
#6  0x0000000000ff0a8c in SerialLog::checkpoint (this=0x7f0032204068,
    force=false) at SerialLog.cpp:999
#7  0x0000000000ff0b4d in SerialLog::execute (this=0x7f0032204068,
    scheduler=0x7f0031e16270) at SerialLog.cpp:980
#8  0x0000000000fcf0da in Scheduler::schedule (this=0x7f0031e16270)
    at Scheduler.cpp:137
#9  0x0000000000fcf3ed in Scheduler::schedule (lpParameter=0x7f0031e16270)
    at Scheduler.cpp:159
#10 0x0000000000e9f4c7 in Thread::thread (this=0x7f0031e338f8)
    at Thread.cpp:167
#11 0x0000000000e9f9d5 in Thread::thread (parameter=0x7f0031e338f8)
    at Thread.cpp:146
#12 0x000000315b0073da in start_thread () from /lib64/libpthread.so.0

405             sync.unlock();
406             flushLock.unlock();
407
408             for (int n = 0; n < numberIoThreads; ++n)
409                     if (ioThreads[n])
410                             ioThreads[n]->wake();
411     }
412
413     void Cache::moveToHead(Bdb * bdb)
414     {

(gdb) print *ioThreads[n]
$5 = {<Synchronize> = {_vptr.Synchronize = 0xeeeeeeeeeeeeeeee,
    shutdownInProgress = 238, sleeping = 238, wakeup = 238,
    waitTime = -1229782938247303442, condition = {__data = {
        __lock = -286331154, __futex = 4008636142,
        __total_seq = 17216961135462248174,
        __wakeup_seq = 17216961135462248174,
        __woken_seq = 17216961135462248174, __mutex = 0xeeeeeeeeeeeeeeee,
        __nwaiters = 4008636142, __broadcast_seq = 4008636142},
      __size = 'Н' <repeats 48 times>, __align = -1229782938247303442},
    mutex = {__data = {__lock = -286331154, __count = 4008636142,
        __owner = -286331154, __nusers = 4008636142, __kind = -286331154,
        __spins = -286331154, __list = {__prev = 0xeeeeeeeeeeeeeeee,
          __next = 0xeeeeeeeeeeeeeeee}}, __size = 'Н' <repeats 40 times>,
      __align = -1229782938247303442}}, argument = 0xeeeeeeeeeeeeeeee,
  function = 0xeeeeeeeeeeeeeeee, threadHandle = 0xeeeeeeeeeeeeeeee,
  threadId = 17216961135462248174, threadBarn = 0xeeeeeeeeeeeeeeee,
  next = 0xeeeeeeeeeeeeeeee, prior = 0xeeeeeeeeeeeeeeee,
  queue = 0xeeeeeeeeeeeeeeee, srlQueue = 0xeeeeeeeeeeeeeeee,
  lockType = 4008636142, wakeupType = 4008636142, lockGranted = 238,
  licenseWakeup = 238, activeLocks = -286331154, locks = 0xeeeeeeeeeeeeeeee,
  lockPending = 0xeeeeeeeeeeeeeeee, syncWait = 0xeeeeeeeeeeeeeeee,
  marked = 238, pageMarks = -286331154, eventNumber = -286331154,
  random = -286331154, backoff = -286331154,
---Type <return> to continue, or q <return> to quit---
  description = 0xeeeeeeeeeeeeeeee <Address 0xeeeeeeeeeeeeeeee out of bounds>,
  where = 0xeeeeeeeeeeeeeeee <Address 0xeeeeeeeeeeeeeeee out of bounds>,
  defaultTimeZone = 0xeeeeeeeeeeeeeeee, javaThread = 0xeeeeeeeeeeeeeeee,
  commitBlockNumber = 17216961135462248174, useCount = -286331154}

This pointer points to invalid memory.

The other thread present at the time has the following backtrace:

#0  0x000000315b00b58d in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1  0x000000000105758e in Synchronize::sleep (this=0x7f0031e0ce80,
    milliseconds=1000, callersMutex=0x0) at Synchronize.cpp:183
#2  0x00000000010570c1 in Synchronize::sleep (this=0x7f0031e0ce80,
    milliseconds=1000) at Synchronize.cpp:136
#3  0x0000000000ea1edf in Threads::waitForAll (this=0x7f0031e0ce80)
    at Threads.cpp:184
#4  0x0000000000f08185 in ~Database (this=0x7f003200dbc8) at Database.cpp:594
#5  0x0000000000eea489 in Connection::getDatabase (this=0x7f0031e0c770,
    dbName=0x7f0031e0c3a4 "FALCON_MASTER",
    dbFileName=0x7fff3b525f70 "/tmp/vardir/master-data_recovery/falcon_master.fts", threads=0x7f0031e0c410) at Connection.cpp:1657
#6  0x0000000000eeeb88 in Connection::openDatabase (this=0x7f0031e0c770,
    dbName=0x7f0031e0c3a4 "FALCON_MASTER",
    filename=0x7f0031e0c3dc "falcon_master.fts", account=0x189c6c0 "mysql",
    password=0x189c6c0 "mysql", address=0x0, parent=0x7f0031e0c410)
    at Connection.cpp:934
#7  0x0000000000e56279 in StorageDatabase::getOpenConnection (
    this=0x7f0031e0c210) at StorageDatabase.cpp:136
#8  0x0000000000e5c33c in StorageHandler::initialize (this=0x7f003200d048)
    at StorageHandler.cpp:987
#9  0x0000000000e4a5db in StorageInterface::falcon_init (p=0x7f002c1b56b8)
---Type <return> to continue, or q <return> to quit---
    at ha_falcon.cpp:245
#10 0x0000000000af5438 in ha_initialize_handlerton (plugin=0x7f002c1881e0)
    at handler.cc:441
#11 0x0000000000c7e8b5 in plugin_initialize (plugin=0x7f002c1881e0)
    at sql_plugin.cc:1007
#12 0x0000000000c864ad in plugin_init (argc=0x1ebafa0, argv=0x7f002c118ec0,
    flags=0) at sql_plugin.cc:1216
#13 0x0000000000825830 in init_server_components () at mysqld.cc:4125
#14 0x00000000008274f8 in main (argc=11, argv=0x7fff3b526978) at mysqld.cc:4636

How to repeat:
The tablespace will be uploaded shortly.

Output in the error log:

# 21:35:54 /nFirst recovery block is 1
# 21:35:54 Last recovery block is 373
# 21:35:54 Recovery read block is 42
# 21:35:54 Recovery phase 1...
# 21:35:54 Processed:    23647
# 21:35:54
# 21:35:54 Recovery phase 2...
# 21:35:54 Processed:    23647
# 21:35:54
# 21:35:54 Recovery phase 3...
# 21:35:55 Processed:    23647
# 21:35:55 Recovery complete
# 21:35:55 1: Commit System Transaction 2
# 21:35:55 1: Commit System Transaction 4
# 21:35:55 1: Commit System Transaction 6
# 21:35:55 Exception: duplicate values for key INDEXES..PRIMARY_KEY in table SYSTEM.INDEXES
# 21:35:55 Verb rollback: duplicate values for key INDEXES..PRIMARY_KEY in table SYSTEM.INDEXES
# 21:35:55 Scheduler::initialize: duplicate values for key INDEXES..PRIMARY_KEY in table SYSTEM.INDEXES
# 21:35:55 1: Rollback transaction 8
# 21:35:55 1: Cache flush: 861 pages, 23 writes in 0 seconds (861 pps)
# 21:36:07 090125 21:36:07 - mysqld got signal 11 ;

When trying to recover the original tablespace with a server from 6.0-falcon-team, the following problem occurs:

bumpPageIncarnation; page 0
[Falcon] Error: assertion (bdb) failed at line 829 in file IndexRootPage.cpp

090327 17:32:31 - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=8384512
read_buffer_size=131072
max_used_connections=0
max_threads=151
thread_count=0
connection_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 338401 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = (nil) thread_stack 0x40000
./mysqld(my_print_stacktrace+0x32) [0xc5f1e1]
./mysqld(handle_segfault+0x2a6) [0x6c724a]
/lib64/libpthread.so.0 [0x315b00f0f0]
/lib64/libpthread.so.0(raise+0x2b) [0x315b00efab]
./mysqld(Error::debugBreak()+0xe) [0x9efc84]
./mysqld(Error::error(char const*, ...)+0x131) [0x9efdb7]
./mysqld(Error::assertionFailed(char const*, char const*, int)+0x2d) [0x9efe5b]
./mysqld(IndexRootPage::indexMerge(Dbb*, int, SRLUpdateIndex*, unsigned int)+0x189) [0xa04593]
./mysqld(SRLUpdateIndex::execute()+0xd6) [0xa7713c]
./mysqld(SRLUpdateIndex::redo()+0x15) [0xa7716d]
./mysqld(SerialLog::recover()+0xd06) [0xa58160]
./mysqld(Database::openDatabase(char const*)+0x405) [0x9de5f3]
./mysqld(Connection::getDatabase(char const*, char const*, Threads*)+0x16a) [0x9ccea0]
./mysqld(Connection::openDatabase(char const*, char const*, char const*, char const*, char const*, Threads*)+0x1c8) [0x9ce1e0]
./mysqld(StorageDatabase::getOpenConnection()+0x89) [0x983e61]
./mysqld(StorageHandler::initialize()+0xa3) [0x986ef3]
./mysqld(StorageInterface::falcon_init(void*)+0x2ac) [0x97d390]
./mysqld(ha_initialize_handlerton(st_plugin_int*)+0xae) [0x82087a]
./mysqld [0x8d7df8]
./mysqld(plugin_init(int*, char**, int)+0x673) [0x8db5f1]
./mysqld [0x6caa65]
./mysqld(main+0x1f0) [0x6cb563]
/lib64/libc.so.6(__libc_start_main+0xe6) [0x315a41e546]
./mysqld [0x5d2c39]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
[Falcon] Error: assertion (lockState == 0) failed at line 132 in file SyncObject.cpp

Very likely this tablespace is no longer compatible with the latest server.

Unfortunately, the files provided demonstrate something quite different
from the error reported in the bug, and something that suggests that the
files themselves are corrupt.  Specifically, the falcon_master.fts header
page (page 0) contains several bad fields, including the wrong section 
id for the tablespaces table and the wrong size for the serial log 
segments.  It's possible that Falcon corrupted that tablespace, but it
seems more likely that it was damaged in transit.  If we had a bug that
broke header pages, we'd see it more often, I think.