Bug #37587 | falcon_bug_33404.test hangs forever | ||
---|---|---|---|
Submitted: | 23 Jun 2008 14:47 | Modified: | 8 Jan 2009 9:47 |
Reporter: | Sergey Petrunya | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Falcon storage engine | Severity: | S3 (Non-critical) |
Version: | mysql-6.0-falcon bk | OS: | Any |
Assigned to: | Kevin Lewis | CPU Architecture: | Any |
[23 Jun 2008 14:47]
Sergey Petrunya
[23 Jun 2008 14:49]
Sergey Petrunya
If I attach the debugger to see where mysqld is, I get the following: (gdb) attach 26482 Attaching to process 26482 ... (skip)... Thread 15 (Thread 0xb7cc3b90 (LWP 26484)): #0 0xffffe410 in __kernel_vsyscall () #1 0xb7f71576 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 #2 0x086591b3 in Synchronize::sleep (this=0xb6fdf250) at Synchronize.cpp:123 #3 0x086665a7 in Cache::ioThread (this=0xb6fdee88) at Cache.cpp:926 #4 0x08666755 in Cache::ioThread (arg=0xb6fdee88) at Cache.cpp:748 #5 0x0859e949 in Thread::thread (this=0xb6fdf250) at Thread.cpp:167 #6 0x0859eb55 in Thread::thread (parameter=0xb6fdf250) at Thread.cpp:146 #7 0xb7f6d18b in start_thread () from /lib/libpthread.so.0 #8 0xb7d8709e in clone () from /lib/libc.so.6 Thread 14 (Thread 0xb58bab90 (LWP 26485)): #0 0xffffe410 in __kernel_vsyscall () #1 0xb7f71576 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 #2 0x086591b3 in Synchronize::sleep (this=0xb6fdf330) at Synchronize.cpp:123 #3 0x086665a7 in Cache::ioThread (this=0xb6fdee88) at Cache.cpp:926 #4 0x08666755 in Cache::ioThread (arg=0xb6fdee88) at Cache.cpp:748 #5 0x0859e949 in Thread::thread (this=0xb6fdf330) at Thread.cpp:167 #6 0x0859eb55 in Thread::thread (parameter=0xb6fdf330) at Thread.cpp:146 #7 0xb7f6d18b in start_thread () from /lib/libpthread.so.0 #8 0xb7d8709e in clone () from /lib/libc.so.6 Thread 13 (Thread 0xb4eb8b90 (LWP 26486)): #0 0xffffe410 in __kernel_vsyscall () #1 0xb7f717fc in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 #2 0x08659052 in Synchronize::sleep (this=0xb6fe8018, milliseconds=1000, callersMutex=0x0) at Synchronize.cpp:183 #3 0x08658e53 in Synchronize::sleep (this=0xb6fe8018, milliseconds=1000) at Synchronize.cpp:136 #4 0x085c2c29 in Database::ticker (this=0xb71df640) at Database.cpp:1880 #5 0x085c2c5d in Database::ticker (database=0xb71df640) at Database.cpp:1869 #6 0x0859e949 in Thread::thread (this=0xb6fe8018) at Thread.cpp:167 #7 0x0859eb55 in Thread::thread (parameter=0xb6fe8018) at Thread.cpp:146 #8 0xb7f6d18b in start_thread () from /lib/libpthread.so.0 #9 0xb7d8709e in clone () from /lib/libc.so.6 Thread 12 (Thread 0xb46b7b90 (LWP 26487)): #0 0xffffe410 in __kernel_vsyscall () #1 0xb7f71576 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 #2 0x086591b3 in Synchronize::sleep (this=0xb6fe80f8) at Synchronize.cpp:123 #3 0x08608c19 in PageWriter::writer (this=0xb6fe06f8) at PageWriter.cpp:171 #4 0x08608d45 in PageWriter::writer (arg=0xb6fe06f8) at PageWriter.cpp:160 #5 0x0859e949 in Thread::thread (this=0xb6fe80f8) at Thread.cpp:167 #6 0x0859eb55 in Thread::thread (parameter=0xb6fe80f8) at Thread.cpp:146 #7 0xb7f6d18b in start_thread () from /lib/libpthread.so.0 #8 0xb7d8709e in clone () from /lib/libc.so.6 Thread 11 (Thread 0xb2ab5b90 (LWP 26488)): #0 0xffffe410 in __kernel_vsyscall () #1 0xb7f71576 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 #2 0x086591b3 in Synchronize::sleep (this=0xb6fe84f8) at Synchronize.cpp:123 ---Type <return> to continue, or q <return> to quit--- #3 0x0858e674 in SyncObject::wait (this=0xb6fe0364, type=Exclusive, thread=0xb6fe84f8, sync=0xb2ab4cec, timeout=0) at SyncObject.cpp:664 #4 0x0858eb7b in SyncObject::lock (this=0xb6fe0364, sync=0xb2ab4cec, type=Exclusive, timeout=0) at SyncObject.cpp:441 #5 0x0858d0e3 in Sync::lock (this=0xb2ab4cec, type=Exclusive) at Sync.cpp:58 #6 0x0859c9ba in TableSpaceManager::expungeTableSpace (this=0xb6fe0030, tableSpaceId=3) at TableSpaceManager.cpp:415 #7 0x08628d22 in SRLDropTableSpace::commit (this=0xb2ab51fc) at SRLDropTableSpace.cpp:78 #8 0x0864536d in SerialLogTransaction::commit (this=0xb704e1d0) at SerialLogTransaction.cpp:92 #9 0x0864545f in SerialLogTransaction::doAction (this=0xb704e1d0) at SerialLogTransaction.cpp:158 #10 0x0866f790 in Gopher::gopherThread (this=0xb6fe06d0) at Gopher.cpp:77 #11 0x0866f8db in Gopher::gopherThread (arg=0xb6fe06d0) at Gopher.cpp:37 #12 0x0859e949 in Thread::thread (this=0xb6fe84f8) at Thread.cpp:167 #13 0x0859eb55 in Thread::thread (parameter=0xb6fe84f8) at Thread.cpp:146 #14 0xb7f6d18b in start_thread () from /lib/libpthread.so.0 #15 0xb7d8709e in clone () from /lib/libc.so.6 Thread 10 (Thread 0xb22b4b90 (LWP 26489)): #0 0xffffe410 in __kernel_vsyscall () #1 0xb7f71576 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 #2 0x086591b3 in Synchronize::sleep (this=0xb6fe85d8) at Synchronize.cpp:123 #3 0x0858e674 in SyncObject::wait (this=0xb733e44c, type=Exclusive, thread=0xb6fe85d8, sync=0xb22b42ec, timeout=0) at SyncObject.cpp:664 #4 0x0858eb7b in SyncObject::lock (this=0xb733e44c, sync=0xb22b42ec, type=Exclusive, timeout=0) at SyncObject.cpp:441 #5 0x0858d0e3 in Sync::lock (this=0xb22b42ec, type=Exclusive) at Sync.cpp:58 #6 0x0866f773 in Gopher::gopherThread (this=0xb6fe06a8) at Gopher.cpp:73 #7 0x0866f8db in Gopher::gopherThread (arg=0xb6fe06a8) at Gopher.cpp:37 #8 0x0859e949 in Thread::thread (this=0xb6fe85d8) at Thread.cpp:167 #9 0x0859eb55 in Thread::thread (parameter=0xb6fe85d8) at Thread.cpp:146 #10 0xb7f6d18b in start_thread () from /lib/libpthread.so.0 #11 0xb7d8709e in clone () from /lib/libc.so.6 Thread 9 (Thread 0xb1ab3b90 (LWP 26490)): #0 0xffffe410 in __kernel_vsyscall () #1 0xb7f71576 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 #2 0x086591b3 in Synchronize::sleep (this=0xb6fe86b8) at Synchronize.cpp:123 #3 0x0866f6d8 in Gopher::gopherThread (this=0xb6fe0680) at Gopher.cpp:58 #4 0x0866f8db in Gopher::gopherThread (arg=0xb6fe0680) at Gopher.cpp:37 #5 0x0859e949 in Thread::thread (this=0xb6fe86b8) at Thread.cpp:167 #6 0x0859eb55 in Thread::thread (parameter=0xb6fe86b8) at Thread.cpp:146 #7 0xb7f6d18b in start_thread () from /lib/libpthread.so.0 #8 0xb7d8709e in clone () from /lib/libc.so.6 Thread 8 (Thread 0xb12b2b90 (LWP 26491)): #0 0xffffe410 in __kernel_vsyscall () #1 0xb7f71576 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 #2 0x086591b3 in Synchronize::sleep (this=0xb6fe8798) at Synchronize.cpp:123 #3 0x0866f6d8 in Gopher::gopherThread (this=0xb6fe0658) at Gopher.cpp:58 #4 0x0866f8db in Gopher::gopherThread (arg=0xb6fe0658) at Gopher.cpp:37 #5 0x0859e949 in Thread::thread (this=0xb6fe8798) at Thread.cpp:167 #6 0x0859eb55 in Thread::thread (parameter=0xb6fe8798) at Thread.cpp:146 #7 0xb7f6d18b in start_thread () from /lib/libpthread.so.0 #8 0xb7d8709e in clone () from /lib/libc.so.6 ---Type <return> to continue, or q <return> to quit--- Thread 7 (Thread 0xb0ab1b90 (LWP 26492)): #0 0xffffe410 in __kernel_vsyscall () #1 0xb7f71576 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 #2 0x086591b3 in Synchronize::sleep (this=0xb6fe8878) at Synchronize.cpp:123 #3 0x0866f6d8 in Gopher::gopherThread (this=0xb6fe0630) at Gopher.cpp:58 #4 0x0866f8db in Gopher::gopherThread (arg=0xb6fe0630) at Gopher.cpp:37 #5 0x0859e949 in Thread::thread (this=0xb6fe8878) at Thread.cpp:167 #6 0x0859eb55 in Thread::thread (parameter=0xb6fe8878) at Thread.cpp:146 #7 0xb7f6d18b in start_thread () from /lib/libpthread.so.0 #8 0xb7d8709e in clone () from /lib/libc.so.6
[23 Jun 2008 14:49]
Sergey Petrunya
Thread 6 (Thread 0xb00afb90 (LWP 26493)): #0 0xffffe410 in __kernel_vsyscall () #1 0xb7f71576 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 #2 0x086591b3 in Synchronize::sleep (this=0xb700adc8) at Synchronize.cpp:123 #3 0x086301cd in Scheduler::schedule (this=0xb6fe73a8) at Scheduler.cpp:151 #4 0x08630219 in Scheduler::schedule (lpParameter=0xb6fe73a8) at Scheduler.cpp:158 #5 0x0859e949 in Thread::thread (this=0xb700adc8) at Thread.cpp:167 #6 0x0859eb55 in Thread::thread (parameter=0xb700adc8) at Thread.cpp:146 #7 0xb7f6d18b in start_thread () from /lib/libpthread.so.0 #8 0xb7d8709e in clone () from /lib/libc.so.6 Thread 5 (Thread 0xaf8aeb90 (LWP 26494)): #0 0xffffe410 in __kernel_vsyscall () #1 0xb7f71576 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 #2 0x086591b3 in Synchronize::sleep (this=0xb700e0a8) at Synchronize.cpp:123 #3 0x0858e674 in SyncObject::wait (this=0xb71dff8c, type=Shared, thread=0xb700e0a8, sync=0xaf8ae234, timeout=0) at SyncObject.cpp:664 #4 0x0858eb7b in SyncObject::lock (this=0xb71dff8c, sync=0xaf8ae234, type=Shared, timeout=0) at SyncObject.cpp:441 #5 0x0858d0e3 in Sync::lock (this=0xaf8ae234, type=Shared) at Sync.cpp:58 #6 0x085c1d2f in Database::updateCardinalities (this=0xb71df640) at Database.cpp:2363 #7 0x085c3262 in Database::scavenge (this=0xb71df640) at Database.cpp:1710 #8 0x0862e6bc in Scavenger::scavenge (this=0xb6fe74d8) at Scavenger.cpp:58 #9 0x0862e6cf in Scavenger::execute (this=0xb6fe74d8, scheduler=0xb6fe7440) at Scavenger.cpp:68 #10 0x086300d7 in Scheduler::schedule (this=0xb6fe7440) at Scheduler.cpp:136 #11 0x08630219 in Scheduler::schedule (lpParameter=0xb6fe7440) at Scheduler.cpp:158 #12 0x0859e949 in Thread::thread (this=0xb700e0a8) at Thread.cpp:167 #13 0x0859eb55 in Thread::thread (parameter=0xb700e0a8) at Thread.cpp:146 #14 0xb7f6d18b in start_thread () from /lib/libpthread.so.0 #15 0xb7d8709e in clone () from /lib/libc.so.6 Thread 4 (Thread 0xaf0adb90 (LWP 26495)): #0 0xffffe410 in __kernel_vsyscall () #1 0xb7f751ce in do_sigwait () from /lib/libpthread.so.0 #2 0xb7f7526f in sigwait () from /lib/libpthread.so.0 #3 0x0831c18e in signal_hand (arg=0x0) at mysqld.cc:2835 #4 0xb7f6d18b in start_thread () from /lib/libpthread.so.0 #5 0xb7d8709e in clone () from /lib/libc.so.6 Thread 3 (Thread 0xaf07cb90 (LWP 26512)): #0 0xffffe410 in __kernel_vsyscall () ---Type <return> to continue, or q <return> to quit--- #1 0xb7f71576 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 #2 0x086591b3 in Synchronize::sleep (this=0xb70137c0) at Synchronize.cpp:123 #3 0x0858e674 in SyncObject::wait (this=0xb733f540, type=Shared, thread=0xb70137c0, sync=0xaf07aa84, timeout=0) at SyncObject.cpp:664 #4 0x0858eb7b in SyncObject::lock (this=0xb733f540, sync=0xaf07aa84, type=Shared, timeout=0) at SyncObject.cpp:441 #5 0x0858d0e3 in Sync::lock (this=0xaf07aa84, type=Shared) at Sync.cpp:58 #6 0x08638bb9 in SerialLog::findTransaction (this=0xb733d210, transactionId=244) at SerialLog.cpp:822 #7 0x0863d91d in SerialLog::getTransaction (this=0xb733d210, transactionId=244) at SerialLog.cpp:835 #8 0x08628dde in SRLDropTableSpace::append (this=0xb733fb3c, tableSpace=0xb701c370, transaction=0xb704d748) at SRLDropTableSpace.cpp:48 #9 0x0859cd38 in TableSpaceManager::dropTableSpace (this=0xb6fe0030, tableSpace=0xb701c370) at TableSpaceManager.cpp:330 #10 0x0864e51d in Nfs::Statement::dropTableSpace (this=0xb7024b60, syntax=0xb70454c8) at Statement.cpp:3019 #11 0x08655cc9 in Nfs::Statement::executeDDL (this=0xb7024b60) at Statement.cpp:849 #12 0x08656c04 in Nfs::Statement::execute (this=0xb7024b60, sqlString=0xaf07ad28 "drop tablespace \"ts3\"", isQuery=false) at Statement.cpp:2705 #13 0x08656c80 in Nfs::Statement::execute (this=0xb7024b60, sqlString=0xaf07ad28 "drop tablespace \"ts3\"") at Statement.cpp:161 #14 0x0864dd69 in Nfs::Statement::executeUpdate (this=0xb7024b60, sqlString=0xaf07ad28 "drop tablespace \"ts3\"") at Statement.cpp:1082 #15 0x085858af in StorageHandler::deleteTablespace (this=0xb71df028, tableSpaceName=0x8df1af8 "ts3") at StorageHandler.cpp:549 #16 0x08572dff in StorageInterface::alter_tablespace (hton=0x8d8bf10, thd=0x8d8c9c0, ts_info=0x8df1b00) at ha_falcon.cpp:2073 #17 0x08507a83 in mysql_alter_tablespace (thd=0x8d8c9c0, ts_info=0x8df1b00) at sql_tablespace.cc:44 #18 0x08336f90 in mysql_execute_command (thd=0x8d8c9c0) at sql_parse.cc:4653 #19 0x0833776c in mysql_parse (thd=0x8d8c9c0, inBuf=0x8df1a68 "DROP TABLESPACE ts3 ENGINE=Falcon", length=33, found_semicolon=0xaf07c260) at sql_parse.cc:5799 #20 0x08338204 in dispatch_command (command=COM_QUERY, thd=0x8d8c9c0, packet=0x8de2e11 "DROP TABLESPACE ts3 ENGINE=Falcon", packet_length=33) at sql_parse.cc:1058 #21 0x083394c1 in do_command (thd=0x8d8c9c0) at sql_parse.cc:731 #22 0x0832564f in handle_one_connection (arg=0x8d8c9c0) at sql_connect.cc:1134 #23 0xb7f6d18b in start_thread () from /lib/libpthread.so.0 #24 0xb7d8709e in clone () from /lib/libc.so.6 Thread 2 (Thread 0xaf04bb90 (LWP 26519)): #0 0xffffe410 in __kernel_vsyscall () #1 0xb7f7400b in read () from /lib/libpthread.so.0 #2 0x08858bd0 in vio_read (vio=0x8de9550, buf=0x8df8ad8 "\001", size=4) at viosocket.c:44 #3 0x0830c4f4 in my_real_read (net=0x8de7a08, complen=0xaf04b2ec) at net_serv.cc:811 #4 0x0830cae0 in my_net_read (net=0x8de7a08) at net_serv.cc:992 #5 0x08339320 in do_command (thd=0x8de7988) at sql_parse.cc:677 #6 0x0832564f in handle_one_connection (arg=0x8de7988) at sql_connect.cc:1134 #7 0xb7f6d18b in start_thread () from /lib/libpthread.so.0 #8 0xb7d8709e in clone () from /lib/libc.so.6 Thread 1 (Thread 0xb7cc46d0 (LWP 26482)): #0 0xffffe410 in __kernel_vsyscall () #1 0xb7d807d1 in select () from /lib/libc.so.6 #2 0x0831d2a0 in handle_connections_sockets (arg=0x0) at mysqld.cc:5019 #3 0x0832031e in main (argc=31, argv=0xbfaf55e4) at mysqld.cc:4527 #0 0xffffe410 in __kernel_vsyscall () (gdb)
[23 Jun 2008 14:51]
Sergey Petrunya
The hang is very repeatable on my x86 Linux machine, I compiled with BUILD/compile-pentium-debug-max, the latest cset was: ------------------------------------------------------------ revno: 2714 committer: Vladislav Vaintroub<vvaintroub@mysql.com> branch nick: mysql-6.0-falcon timestamp: Mon 2008-06-23 10:08:15 +0200 message: Comment FAST_SHARED. SyncObject deadlocks with it.
[24 Jun 2008 12:29]
MySQL Verification Team
Thank you for the bug report.
[25 Jun 2008 22:37]
Kevin Lewis
Here is the summary of the deadlock; Thread 3 wait for SerialLogTransaction::pending.syncObject type=Shared #6 SerialLog::findTransaction at SerialLog.cpp:822 #7 SerialLog::getTransaction at SerialLog.cpp:835 #8 SRLDropTableSpace::append at SRLDropTableSpace.cpp:48 locked TableSpaceManager::syncObject type=Exclusive #9 TableSpaceManager::dropTableSpace at TableSpaceManager.cpp:330 #10 Nfs::Statement::dropTableSpace at Statement.cpp:3019 #11 Nfs::Statement::executeDDL at Statement.cpp:849 #12 Nfs::Statement::execute at Statement.cpp:2705 #13 Nfs::Statement::execute at Statement.cpp:161 #14 Nfs::Statement::executeUpdate at Statement.cpp:1082 #15 StorageHandler::deleteTablespace at StorageHandler.cpp:549 #16 StorageInterface::alter_tablespace at ha_falcon.cpp:2073 #17 mysql_alter_tablespace at sql_tablespace.cc:44 Thread 11 wait for TableSpaceManager::syncObject type=Exclusive #6 TableSpaceManager::expungeTableSpace at TableSpaceManager.cpp:415 #7 SRLDropTableSpace::commit at SRLDropTableSpace.cpp:78 #8 SerialLogTransaction::commit at SerialLogTransaction.cpp:92 #9 SerialLogTransaction::doAction at SerialLogTransaction.cpp:158 locked SerialLogTransaction::pending.syncObject type=Exclusive #10 Gopher::gopherThread (this=0xb6fe06d0) at Gopher.cpp:77
[30 Jun 2008 15:05]
Kevin Lewis
Actually... This is a three way deadlock. -------------------------------------- Thread 3 wait for SerialLogTransaction::pending.syncObject type=Shared #6 SerialLog::findTransaction at SerialLog.cpp:822 #7 SerialLog::getTransaction at SerialLog.cpp:835 #8 SRLDropTableSpace::append at SRLDropTableSpace.cpp:48 locked TableSpaceManager::syncObject type=Exclusive #9 TableSpaceManager::dropTableSpace at TableSpaceManager.cpp:330 #10 Nfs::Statement::dropTableSpace at Statement.cpp:3019 #11 Nfs::Statement::executeDDL at Statement.cpp:849 #12 Nfs::Statement::execute at Statement.cpp:2705 #13 Nfs::Statement::execute at Statement.cpp:161 #14 Nfs::Statement::executeUpdate at Statement.cpp:1082 #15 StorageHandler::deleteTablespace at StorageHandler.cpp:549 #16 StorageInterface::alter_tablespace at ha_falcon.cpp:2073 #17 mysql_alter_tablespace at sql_tablespace.cc:44 Thread 11 wait for TableSpaceManager::syncObject type=Exclusive #6 TableSpaceManager::expungeTableSpace at TableSpaceManager.cpp:415 #7 SRLDropTableSpace::commit at SRLDropTableSpace.cpp:78 #8 SerialLogTransaction::commit at SerialLogTransaction.cpp:92 #9 SerialLogTransaction::doAction at SerialLogTransaction.cpp:158 not locked SerialLogTransaction::pending.syncObject type=Exclusive #10 Gopher::gopherThread (this=0xb6fe06d0) at Gopher.cpp:77 Thread 10 wait for SerialLog::syncSerializeGophers type=Exclusive locked SerialLogTransaction::pending.syncObject type=Exclusive #6 0x0866f773 in Gopher::gopherThread (this=0xb6fe06a8) at Gopher.cpp:73 ---------------------------------------- Thread 11 had released SerialLogTransaction::pending.syncObject but held the SerialLog::syncSerializeGophers. Thread 10 was still holding SerialLogTransaction::pending.syncObject while it waited for SerialLog::syncSerializeGophers. The fix is simple; Release SerialLogTransaction::pending.syncObject just before waiting for SerialLog::syncSerializeGophers instead of just after that.
[30 Jun 2008 16:52]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/48744 2722 Kevin Lewis 2008-06-30 Bug#37587 - Release SerialLogTransaction::pending.syncObject just before waiting for SerialLog::syncSerializeGophers instead of just after that. This avoids a 3 way deadlock
[30 Jun 2008 17:09]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/48746 2723 Kevin Lewis 2008-06-30 Cleanup code and change CRLF to LF
[7 Jul 2008 6:27]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/49061 2733 Kevin Lewis 2008-07-07 Bug#37587 - Fix this 3 way deadlock a different way. Allow the gopher to hold SerialLog::pending.syncObject while it locks serializeGophers, but release serializeGophers before getting SerialLog::pending.syncObject back. Then fix TableSpaceManager::dropTableSpace so that it releases its syncObject before calling dropTableSpace.append().
[8 Jul 2008 20:55]
Kevin Lewis
The last patch had a problem in that it made TableSpaceManager::dropTableSpace do the SerialLog append at the end while the syncOgject was no longer held. But it did not do a commitSystemTransaction after that. So the file was not deleted in a tmely manner by the gopher thread. Then the next createTableSpace for the same Table tablespace waited forever. Once I fixed that, I tried MTR with a group of tablespace related tests, ass suggested by Vlad; falcon_bug_29511 falcon_bug_31110 falcon_bug_31114 falcon_bug_31295 falcon_bug_31311 falcon_bug_32398 falcon_bug_33211 falcon_bug_33212 falcon_bug_33213 falcon_bug_33216 falcon_bug_33404 This ran for a while until two gopher threads locked up in a deadlock. One was holding serializeGophers with a shared lock and doing a SRLUpdateRecords::commit, which eventually calls SerialLog::findTransaction, which tries to lock SerialLog::pending.syncObject. The other gopher was holding SerialLog::pending.syncObject and waiting for an exclusive lock on serializeGophers. So becaus a gopher thread itself can get SerialLog::pending.syncObject while it is doing an action, and it holds this lock also at its highest loop, the solution to serialize gophers by grabbing a lock while pending is still held, and holding it during the action, will not work. I am looking again for another solution for serializing gopher threads that handle a drop tablespace. This is really a failure of the solution to Bug#36396.
[9 Jul 2008 19:23]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/49355 2740 Kevin Lewis 2008-07-09 Bug#37587 - Try a new method of serializing gopher threads for DropTableSpace. Instead of holding a lock during the action. Set a lock variable while holding syncPending exclusively and do a loop of release-sleep(10)-lock until the other gopher threads are fully done with previous transactions. Also, reduce the possibility of a deadlock by not holding TableSpaceManager::syncObject while calling SRLDropTableSpace::append(), which locks SerialLog::pending.syncObject.
[9 Jul 2008 21:34]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/49369 2741 Kevin Lewis 2008-07-09 [merge] merge
[14 Jul 2008 15:51]
Kevin Lewis
This patch was reviewed by Chris, Vlad and Kelly Long. Kelly suggested that another way to serialize this kind of thing would be to focus on the Tablespace. Every transaction that uses a table space would increment a refCount at commit and decrement it upon completion by the gogher. But it was noted that this solution is not necessary to implement since the current solution is acceptable. I would note that note every SerialLogRecord has a quick link tothe tablespace, and looking it up adds code path to the task by the gopher. And it added lines or code to most SRL_____.cpp file. Below is an interesting discussion that Vlad and I had about this solution and possible alternatives that Vlad thought of. ============================================== Falconeers, The following is an email exchange that Vlad and I had this morning about how to fix Bug#37587. Vlad was being very creative, looking for a deterministic, syncObject oriented solution to the problem of serializing gopher threads. I had debugged it a bunch and had given up, opting for a looping backout method that is in a sense, the opposite of waiting for a lock on a syncObject. The interesting part here is the discussion about how to avoid deadlocks. Kevin -----Original Message----- From: Vladislav Vaintroub [mailto:vaintroub@mysql.com] Sent: Wednesday, July 09, 2008 7:25 AM To: 'Kevin Lewis' Subject: RE: #37587 [Com,Pnd->Prg]: falcon_bug_33404.test hangs forever Kevin, It just occurred to me that actually, there is no necessity in running dropTablespace in exclusive mode. More important is that all transactions that precede dropTablespace and using this tablespace file are finished, before gopher closes and deleted the file. So the pseudo-code could look like this lock pending get the first transaction from queue if(transacttion is in exclusive mode) lock serializeGophers(Exclusive) // after acquiring this lock, all prior transactions are finished unlock serializeGophers lock SerializeGophers(Shared) else lock SerializeGophers(Shared) unlock pending transaction->doAction unlock serializeGophers end if What do you think of this approach? The serializeGophers then should be renamed to something that is better describes the purpose of the lock, i.e gophersSyncPoint or something like that. -----Original Message----- From: Kevin Lewis [mailto:klewis@mysql.com] Sent: Wednesday, July 09, 2008 4:38 PM To: 'Vladislav Vaintroub' Subject: RE: #37587 [Com,Pnd->Prg]: falcon_bug_33404.test hangs forever Vlad, Your suggestion assumes that while one gopher is holding an exclusive lock on SerialLog::pending.syncObject, all other gophers will finish up their work and come back home to wait at the bottom of the while loop in Gopher::gopherThread. But they do not. Some of them may be stuck trying to do a SerialLog::findTransaction(). These gophers will deadlock since they have a shared lock on serializeGophers and wait on pending.syncObject, while our gopher owns pending.syncObject exclusively and waits on for an exclusive lock on serializeGophers. We cannot lock any syncObject while holding pending.syncObject and hold it through transaction->doAction(). On the bright side, I worked out a solution last night that uses this technique in a few places. while (log->serializeGophers) { syncPending->unlock(); workerThread->sleep(10); syncPending->lock(Exclusive); } log->serializeGophers is now an int32 that is protected by exclusive ownership of syncPending. This solution worked on the list of tablespace tests, when nothing else would. It is not nice, since there is a short wait and those gophers effectively spin instead of wait to be signaled. But the argument is that drop tablespace does not happen very often, so performance does not matter. I will put together a patch on top of the current code in the next few hours for you to review. Kevin -----Original Message----- From: Vladislav Vaintroub [mailto:vaintroub@mysql.com] Sent: Wednesday, July 09, 2008 10:33 AM To: 'Kevin Lewis' Subject: RE: #37587 [Com,Pnd->Prg]: falcon_bug_33404.test hangs forever Ok, I see your point about deadlock. There is actually point in request *exclusive* serializeGophers lock while holding pending. let me modify my original propose to avoid it lock pending get the first transaction from queue lock serializeGophers(Shared) unlock pending if(transaction is in exclusive mode) unlock serializeGophers lock serializeGophers(Exclusive) // all transaction started before this are now complete endif transaction->doAction() unlock serializeGophers This should ensure that all transactions started before exclusive are completed. Also, there is no possibility for deadlock you describe. What do you think about this. This could avoid " It is not nice" of your solution (I do agree it is not very nice ;) -----Original Message----- From: Kevin Lewis [mailto:klewis@mysql.com] Sent: Wednesday, July 09, 2008 6:26 PM To: 'Vladislav Vaintroub' Subject: RE: #37587 [Com,Pnd->Prg]: falcon_bug_33404.test hangs forever Vlad, That still will not work. It sets up the three way deadlock again; Gopher 1 Holds exclusive lock on serializeGophers Calls TableSpaceManager::dropTableSpace which Waits for shared lock on TableSpaceManager::syncObject Gopher 2 Holds exclusive lock on SerialLog::pending.syncObject Waits for shared lock on serializeGophers Client Holds exclusive lock on TableSpaceManager::syncObject Calls SRLDropTablespace::append Calls SerialLog::findTransaction Waits for shared lock on SerialLog::pending.syncObject Your newest solution modifies the rule of "never locking in opposite order" by adding the qualifier, "unless it is a shared lock". But I don't think this can be proven safe because other threads can get involved with exclusive locks. And there will ALWAYS be exclusive locks in other threads because without exclusive locks, there is no reason to have a shared lock. Kevin -----Original Message----- From: Vladislav Vaintroub [mailto:vaintroub@mysql.com] Sent: Wednesday, July 09, 2008 12:22 PM To: 'Kevin Lewis' Subject: RE: #37587 [Com,Pnd->Prg]: falcon_bug_33404.test hangs forever > Client Holds exclusive lock on TableSpaceManager::syncObject > Calls SRLDropTablespace::append > Calls SerialLog::findTransaction > Waits for shared lock on SerialLog::pending.syncObject Is TableSpaceManager::syncObject here necessary? I think we decided it was not ----Original Message---- From: Kevin Lewis [mailto:klewis@mysql.com] Sent: Wednesday, July 09, 2008 8:59 PM To: 'Vladislav Vaintroub' Subject: RE: #37587 [Com,Pnd->Prg]: falcon_bug_33404.test hangs forever I have changed TableSpaceManager::dropTableSpace so that it does not hold both of them at the same time, but that is just one of the many possible paths that can call SerialLog::findTransaction. If this one path is cleaned up, what is to prevent another possible 3-way or 4-way deadlock? I think that leaving code paths that acquire locks in opposite order is an invitation to deadlock in some current or future combination of code paths. I am still struggling to create this patch in bazaar. I will get you the patch as soon as I can.
[22 Aug 2008 21:13]
Kevin Lewis
This is fixed in version 6.0.6
[8 Jan 2009 9:47]
MC Brown
Internal change only. No documentation needed.