Bug #37587 falcon_bug_33404.test hangs forever
Submitted: 23 Jun 2008 14:47 Modified: 8 Jan 2009 9:47
Reporter: Sergey Petrunya Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Falcon storage engine Severity:S3 (Non-critical)
Version:mysql-6.0-falcon bk OS:Any
Assigned to: Kevin Lewis CPU Architecture:Any
Triage: D3 (Medium)

[23 Jun 2008 14:47] Sergey Petrunya
Description:
falcon_bug_33404.test hangs forever in latest mysql-6.0-falcon bzr tree.

How to repeat:
Run  

 ./mysql-test-run --suite=falcon falcon_bug_33404 

and see it to hang - the test will continue to run but the cpu usage will drop to zero. If I connect with mysql client to see what mysql is doing I see it is stuck in the following state:

mysql> show processlist;
+----+------+-----------+------+---------+------+----------------------+-----------------------------------+
| Id | User | Host      | db   | Command | Time | State                | Info                              |
+----+------+-----------+------+---------+------+----------------------+-----------------------------------+
|  1 | root | localhost | test | Query   |  132 | checking permissions | DROP TABLESPACE ts3 ENGINE=Falcon | 
|  2 | root | localhost | NULL | Query   |    0 | NULL                 | show processlist                  | 
+----+------+-----------+------+---------+------+----------------------+-----------------------------------+
2 rows in set (0.00 sec)
[23 Jun 2008 14:49] Sergey Petrunya
If I attach the debugger to see where mysqld is, I get the following:
(gdb) attach 26482
Attaching to process 26482
... (skip)...
Thread 15 (Thread 0xb7cc3b90 (LWP 26484)):
#0  0xffffe410 in __kernel_vsyscall ()
#1  0xb7f71576 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x086591b3 in Synchronize::sleep (this=0xb6fdf250) at Synchronize.cpp:123
#3  0x086665a7 in Cache::ioThread (this=0xb6fdee88) at Cache.cpp:926
#4  0x08666755 in Cache::ioThread (arg=0xb6fdee88) at Cache.cpp:748
#5  0x0859e949 in Thread::thread (this=0xb6fdf250) at Thread.cpp:167
#6  0x0859eb55 in Thread::thread (parameter=0xb6fdf250) at Thread.cpp:146
#7  0xb7f6d18b in start_thread () from /lib/libpthread.so.0
#8  0xb7d8709e in clone () from /lib/libc.so.6

Thread 14 (Thread 0xb58bab90 (LWP 26485)):
#0  0xffffe410 in __kernel_vsyscall ()
#1  0xb7f71576 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x086591b3 in Synchronize::sleep (this=0xb6fdf330) at Synchronize.cpp:123
#3  0x086665a7 in Cache::ioThread (this=0xb6fdee88) at Cache.cpp:926
#4  0x08666755 in Cache::ioThread (arg=0xb6fdee88) at Cache.cpp:748
#5  0x0859e949 in Thread::thread (this=0xb6fdf330) at Thread.cpp:167
#6  0x0859eb55 in Thread::thread (parameter=0xb6fdf330) at Thread.cpp:146
#7  0xb7f6d18b in start_thread () from /lib/libpthread.so.0
#8  0xb7d8709e in clone () from /lib/libc.so.6

Thread 13 (Thread 0xb4eb8b90 (LWP 26486)):
#0  0xffffe410 in __kernel_vsyscall ()
#1  0xb7f717fc in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x08659052 in Synchronize::sleep (this=0xb6fe8018, milliseconds=1000, callersMutex=0x0) at Synchronize.cpp:183
#3  0x08658e53 in Synchronize::sleep (this=0xb6fe8018, milliseconds=1000) at Synchronize.cpp:136
#4  0x085c2c29 in Database::ticker (this=0xb71df640) at Database.cpp:1880
#5  0x085c2c5d in Database::ticker (database=0xb71df640) at Database.cpp:1869
#6  0x0859e949 in Thread::thread (this=0xb6fe8018) at Thread.cpp:167
#7  0x0859eb55 in Thread::thread (parameter=0xb6fe8018) at Thread.cpp:146
#8  0xb7f6d18b in start_thread () from /lib/libpthread.so.0
#9  0xb7d8709e in clone () from /lib/libc.so.6

Thread 12 (Thread 0xb46b7b90 (LWP 26487)):
#0  0xffffe410 in __kernel_vsyscall ()
#1  0xb7f71576 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x086591b3 in Synchronize::sleep (this=0xb6fe80f8) at Synchronize.cpp:123
#3  0x08608c19 in PageWriter::writer (this=0xb6fe06f8) at PageWriter.cpp:171
#4  0x08608d45 in PageWriter::writer (arg=0xb6fe06f8) at PageWriter.cpp:160
#5  0x0859e949 in Thread::thread (this=0xb6fe80f8) at Thread.cpp:167
#6  0x0859eb55 in Thread::thread (parameter=0xb6fe80f8) at Thread.cpp:146
#7  0xb7f6d18b in start_thread () from /lib/libpthread.so.0
#8  0xb7d8709e in clone () from /lib/libc.so.6

Thread 11 (Thread 0xb2ab5b90 (LWP 26488)):
#0  0xffffe410 in __kernel_vsyscall ()
#1  0xb7f71576 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x086591b3 in Synchronize::sleep (this=0xb6fe84f8) at Synchronize.cpp:123
---Type <return> to continue, or q <return> to quit---
#3  0x0858e674 in SyncObject::wait (this=0xb6fe0364, type=Exclusive, thread=0xb6fe84f8, sync=0xb2ab4cec, timeout=0) at SyncObject.cpp:664
#4  0x0858eb7b in SyncObject::lock (this=0xb6fe0364, sync=0xb2ab4cec, type=Exclusive, timeout=0) at SyncObject.cpp:441
#5  0x0858d0e3 in Sync::lock (this=0xb2ab4cec, type=Exclusive) at Sync.cpp:58
#6  0x0859c9ba in TableSpaceManager::expungeTableSpace (this=0xb6fe0030, tableSpaceId=3) at TableSpaceManager.cpp:415
#7  0x08628d22 in SRLDropTableSpace::commit (this=0xb2ab51fc) at SRLDropTableSpace.cpp:78
#8  0x0864536d in SerialLogTransaction::commit (this=0xb704e1d0) at SerialLogTransaction.cpp:92
#9  0x0864545f in SerialLogTransaction::doAction (this=0xb704e1d0) at SerialLogTransaction.cpp:158
#10 0x0866f790 in Gopher::gopherThread (this=0xb6fe06d0) at Gopher.cpp:77
#11 0x0866f8db in Gopher::gopherThread (arg=0xb6fe06d0) at Gopher.cpp:37
#12 0x0859e949 in Thread::thread (this=0xb6fe84f8) at Thread.cpp:167
#13 0x0859eb55 in Thread::thread (parameter=0xb6fe84f8) at Thread.cpp:146
#14 0xb7f6d18b in start_thread () from /lib/libpthread.so.0
#15 0xb7d8709e in clone () from /lib/libc.so.6

Thread 10 (Thread 0xb22b4b90 (LWP 26489)):
#0  0xffffe410 in __kernel_vsyscall ()
#1  0xb7f71576 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x086591b3 in Synchronize::sleep (this=0xb6fe85d8) at Synchronize.cpp:123
#3  0x0858e674 in SyncObject::wait (this=0xb733e44c, type=Exclusive, thread=0xb6fe85d8, sync=0xb22b42ec, timeout=0) at SyncObject.cpp:664
#4  0x0858eb7b in SyncObject::lock (this=0xb733e44c, sync=0xb22b42ec, type=Exclusive, timeout=0) at SyncObject.cpp:441
#5  0x0858d0e3 in Sync::lock (this=0xb22b42ec, type=Exclusive) at Sync.cpp:58
#6  0x0866f773 in Gopher::gopherThread (this=0xb6fe06a8) at Gopher.cpp:73
#7  0x0866f8db in Gopher::gopherThread (arg=0xb6fe06a8) at Gopher.cpp:37
#8  0x0859e949 in Thread::thread (this=0xb6fe85d8) at Thread.cpp:167
#9  0x0859eb55 in Thread::thread (parameter=0xb6fe85d8) at Thread.cpp:146
#10 0xb7f6d18b in start_thread () from /lib/libpthread.so.0
#11 0xb7d8709e in clone () from /lib/libc.so.6

Thread 9 (Thread 0xb1ab3b90 (LWP 26490)):
#0  0xffffe410 in __kernel_vsyscall ()
#1  0xb7f71576 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x086591b3 in Synchronize::sleep (this=0xb6fe86b8) at Synchronize.cpp:123
#3  0x0866f6d8 in Gopher::gopherThread (this=0xb6fe0680) at Gopher.cpp:58
#4  0x0866f8db in Gopher::gopherThread (arg=0xb6fe0680) at Gopher.cpp:37
#5  0x0859e949 in Thread::thread (this=0xb6fe86b8) at Thread.cpp:167
#6  0x0859eb55 in Thread::thread (parameter=0xb6fe86b8) at Thread.cpp:146
#7  0xb7f6d18b in start_thread () from /lib/libpthread.so.0
#8  0xb7d8709e in clone () from /lib/libc.so.6

Thread 8 (Thread 0xb12b2b90 (LWP 26491)):
#0  0xffffe410 in __kernel_vsyscall ()
#1  0xb7f71576 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x086591b3 in Synchronize::sleep (this=0xb6fe8798) at Synchronize.cpp:123
#3  0x0866f6d8 in Gopher::gopherThread (this=0xb6fe0658) at Gopher.cpp:58
#4  0x0866f8db in Gopher::gopherThread (arg=0xb6fe0658) at Gopher.cpp:37
#5  0x0859e949 in Thread::thread (this=0xb6fe8798) at Thread.cpp:167
#6  0x0859eb55 in Thread::thread (parameter=0xb6fe8798) at Thread.cpp:146
#7  0xb7f6d18b in start_thread () from /lib/libpthread.so.0
#8  0xb7d8709e in clone () from /lib/libc.so.6

---Type <return> to continue, or q <return> to quit---
Thread 7 (Thread 0xb0ab1b90 (LWP 26492)):
#0  0xffffe410 in __kernel_vsyscall ()
#1  0xb7f71576 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x086591b3 in Synchronize::sleep (this=0xb6fe8878) at Synchronize.cpp:123
#3  0x0866f6d8 in Gopher::gopherThread (this=0xb6fe0630) at Gopher.cpp:58
#4  0x0866f8db in Gopher::gopherThread (arg=0xb6fe0630) at Gopher.cpp:37
#5  0x0859e949 in Thread::thread (this=0xb6fe8878) at Thread.cpp:167
#6  0x0859eb55 in Thread::thread (parameter=0xb6fe8878) at Thread.cpp:146
#7  0xb7f6d18b in start_thread () from /lib/libpthread.so.0
#8  0xb7d8709e in clone () from /lib/libc.so.6
[23 Jun 2008 14:49] Sergey Petrunya
Thread 6 (Thread 0xb00afb90 (LWP 26493)):
#0  0xffffe410 in __kernel_vsyscall ()
#1  0xb7f71576 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x086591b3 in Synchronize::sleep (this=0xb700adc8) at Synchronize.cpp:123
#3  0x086301cd in Scheduler::schedule (this=0xb6fe73a8) at Scheduler.cpp:151
#4  0x08630219 in Scheduler::schedule (lpParameter=0xb6fe73a8) at Scheduler.cpp:158
#5  0x0859e949 in Thread::thread (this=0xb700adc8) at Thread.cpp:167
#6  0x0859eb55 in Thread::thread (parameter=0xb700adc8) at Thread.cpp:146
#7  0xb7f6d18b in start_thread () from /lib/libpthread.so.0
#8  0xb7d8709e in clone () from /lib/libc.so.6

Thread 5 (Thread 0xaf8aeb90 (LWP 26494)):
#0  0xffffe410 in __kernel_vsyscall ()
#1  0xb7f71576 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x086591b3 in Synchronize::sleep (this=0xb700e0a8) at Synchronize.cpp:123
#3  0x0858e674 in SyncObject::wait (this=0xb71dff8c, type=Shared, thread=0xb700e0a8, sync=0xaf8ae234, timeout=0) at SyncObject.cpp:664
#4  0x0858eb7b in SyncObject::lock (this=0xb71dff8c, sync=0xaf8ae234, type=Shared, timeout=0) at SyncObject.cpp:441
#5  0x0858d0e3 in Sync::lock (this=0xaf8ae234, type=Shared) at Sync.cpp:58
#6  0x085c1d2f in Database::updateCardinalities (this=0xb71df640) at Database.cpp:2363
#7  0x085c3262 in Database::scavenge (this=0xb71df640) at Database.cpp:1710
#8  0x0862e6bc in Scavenger::scavenge (this=0xb6fe74d8) at Scavenger.cpp:58
#9  0x0862e6cf in Scavenger::execute (this=0xb6fe74d8, scheduler=0xb6fe7440) at Scavenger.cpp:68
#10 0x086300d7 in Scheduler::schedule (this=0xb6fe7440) at Scheduler.cpp:136
#11 0x08630219 in Scheduler::schedule (lpParameter=0xb6fe7440) at Scheduler.cpp:158
#12 0x0859e949 in Thread::thread (this=0xb700e0a8) at Thread.cpp:167
#13 0x0859eb55 in Thread::thread (parameter=0xb700e0a8) at Thread.cpp:146
#14 0xb7f6d18b in start_thread () from /lib/libpthread.so.0
#15 0xb7d8709e in clone () from /lib/libc.so.6

Thread 4 (Thread 0xaf0adb90 (LWP 26495)):
#0  0xffffe410 in __kernel_vsyscall ()
#1  0xb7f751ce in do_sigwait () from /lib/libpthread.so.0
#2  0xb7f7526f in sigwait () from /lib/libpthread.so.0
#3  0x0831c18e in signal_hand (arg=0x0) at mysqld.cc:2835
#4  0xb7f6d18b in start_thread () from /lib/libpthread.so.0
#5  0xb7d8709e in clone () from /lib/libc.so.6

Thread 3 (Thread 0xaf07cb90 (LWP 26512)):
#0  0xffffe410 in __kernel_vsyscall ()
---Type <return> to continue, or q <return> to quit---
#1  0xb7f71576 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x086591b3 in Synchronize::sleep (this=0xb70137c0) at Synchronize.cpp:123
#3  0x0858e674 in SyncObject::wait (this=0xb733f540, type=Shared, thread=0xb70137c0, sync=0xaf07aa84, timeout=0) at SyncObject.cpp:664
#4  0x0858eb7b in SyncObject::lock (this=0xb733f540, sync=0xaf07aa84, type=Shared, timeout=0) at SyncObject.cpp:441
#5  0x0858d0e3 in Sync::lock (this=0xaf07aa84, type=Shared) at Sync.cpp:58
#6  0x08638bb9 in SerialLog::findTransaction (this=0xb733d210, transactionId=244) at SerialLog.cpp:822
#7  0x0863d91d in SerialLog::getTransaction (this=0xb733d210, transactionId=244) at SerialLog.cpp:835
#8  0x08628dde in SRLDropTableSpace::append (this=0xb733fb3c, tableSpace=0xb701c370, transaction=0xb704d748) at SRLDropTableSpace.cpp:48
#9  0x0859cd38 in TableSpaceManager::dropTableSpace (this=0xb6fe0030, tableSpace=0xb701c370) at TableSpaceManager.cpp:330
#10 0x0864e51d in Nfs::Statement::dropTableSpace (this=0xb7024b60, syntax=0xb70454c8) at Statement.cpp:3019
#11 0x08655cc9 in Nfs::Statement::executeDDL (this=0xb7024b60) at Statement.cpp:849
#12 0x08656c04 in Nfs::Statement::execute (this=0xb7024b60, sqlString=0xaf07ad28 "drop tablespace \"ts3\"", isQuery=false) at Statement.cpp:2705
#13 0x08656c80 in Nfs::Statement::execute (this=0xb7024b60, sqlString=0xaf07ad28 "drop tablespace \"ts3\"") at Statement.cpp:161
#14 0x0864dd69 in Nfs::Statement::executeUpdate (this=0xb7024b60, sqlString=0xaf07ad28 "drop tablespace \"ts3\"") at Statement.cpp:1082
#15 0x085858af in StorageHandler::deleteTablespace (this=0xb71df028, tableSpaceName=0x8df1af8 "ts3") at StorageHandler.cpp:549
#16 0x08572dff in StorageInterface::alter_tablespace (hton=0x8d8bf10, thd=0x8d8c9c0, ts_info=0x8df1b00) at ha_falcon.cpp:2073
#17 0x08507a83 in mysql_alter_tablespace (thd=0x8d8c9c0, ts_info=0x8df1b00) at sql_tablespace.cc:44
#18 0x08336f90 in mysql_execute_command (thd=0x8d8c9c0) at sql_parse.cc:4653
#19 0x0833776c in mysql_parse (thd=0x8d8c9c0, inBuf=0x8df1a68 "DROP TABLESPACE ts3 ENGINE=Falcon", length=33, found_semicolon=0xaf07c260) at sql_parse.cc:5799
#20 0x08338204 in dispatch_command (command=COM_QUERY, thd=0x8d8c9c0, packet=0x8de2e11 "DROP TABLESPACE ts3 ENGINE=Falcon", packet_length=33) at sql_parse.cc:1058
#21 0x083394c1 in do_command (thd=0x8d8c9c0) at sql_parse.cc:731
#22 0x0832564f in handle_one_connection (arg=0x8d8c9c0) at sql_connect.cc:1134
#23 0xb7f6d18b in start_thread () from /lib/libpthread.so.0
#24 0xb7d8709e in clone () from /lib/libc.so.6

Thread 2 (Thread 0xaf04bb90 (LWP 26519)):
#0  0xffffe410 in __kernel_vsyscall ()
#1  0xb7f7400b in read () from /lib/libpthread.so.0
#2  0x08858bd0 in vio_read (vio=0x8de9550, buf=0x8df8ad8 "\001", size=4) at viosocket.c:44
#3  0x0830c4f4 in my_real_read (net=0x8de7a08, complen=0xaf04b2ec) at net_serv.cc:811
#4  0x0830cae0 in my_net_read (net=0x8de7a08) at net_serv.cc:992
#5  0x08339320 in do_command (thd=0x8de7988) at sql_parse.cc:677
#6  0x0832564f in handle_one_connection (arg=0x8de7988) at sql_connect.cc:1134
#7  0xb7f6d18b in start_thread () from /lib/libpthread.so.0
#8  0xb7d8709e in clone () from /lib/libc.so.6

Thread 1 (Thread 0xb7cc46d0 (LWP 26482)):
#0  0xffffe410 in __kernel_vsyscall ()
#1  0xb7d807d1 in select () from /lib/libc.so.6
#2  0x0831d2a0 in handle_connections_sockets (arg=0x0) at mysqld.cc:5019
#3  0x0832031e in main (argc=31, argv=0xbfaf55e4) at mysqld.cc:4527
#0  0xffffe410 in __kernel_vsyscall ()
(gdb)
[23 Jun 2008 14:51] Sergey Petrunya
The hang is very repeatable on my x86 Linux machine, I compiled with BUILD/compile-pentium-debug-max, the latest cset was: 

------------------------------------------------------------
revno: 2714
committer: Vladislav Vaintroub<vvaintroub@mysql.com>
branch nick: mysql-6.0-falcon
timestamp: Mon 2008-06-23 10:08:15 +0200
message:
  Comment FAST_SHARED. SyncObject deadlocks with it.
[24 Jun 2008 12:29] Miguel Solorzano
Thank you for the bug report.
[25 Jun 2008 22:37] Kevin Lewis
Here is the summary of the deadlock;

Thread 3

    wait for  SerialLogTransaction::pending.syncObject  type=Shared

#6  SerialLog::findTransaction at SerialLog.cpp:822
#7  SerialLog::getTransaction  at SerialLog.cpp:835
#8  SRLDropTableSpace::append at SRLDropTableSpace.cpp:48

    locked  TableSpaceManager::syncObject  type=Exclusive

#9  TableSpaceManager::dropTableSpace at TableSpaceManager.cpp:330
#10 Nfs::Statement::dropTableSpace at Statement.cpp:3019
#11 Nfs::Statement::executeDDL at Statement.cpp:849
#12 Nfs::Statement::execute at Statement.cpp:2705
#13 Nfs::Statement::execute at Statement.cpp:161
#14 Nfs::Statement::executeUpdate at Statement.cpp:1082
#15 StorageHandler::deleteTablespace at StorageHandler.cpp:549
#16 StorageInterface::alter_tablespace at ha_falcon.cpp:2073
#17 mysql_alter_tablespace at sql_tablespace.cc:44 

Thread 11

    wait for  TableSpaceManager::syncObject  type=Exclusive

#6  TableSpaceManager::expungeTableSpace at TableSpaceManager.cpp:415
#7  SRLDropTableSpace::commit at SRLDropTableSpace.cpp:78
#8  SerialLogTransaction::commit at SerialLogTransaction.cpp:92
#9  SerialLogTransaction::doAction at SerialLogTransaction.cpp:158

locked  SerialLogTransaction::pending.syncObject  type=Exclusive

#10 Gopher::gopherThread (this=0xb6fe06d0) at Gopher.cpp:77
[30 Jun 2008 15:05] Kevin Lewis
Actually...
This is a three way deadlock.
--------------------------------------
Thread 3

    wait for SerialLogTransaction::pending.syncObject type=Shared

#6  SerialLog::findTransaction at SerialLog.cpp:822
#7  SerialLog::getTransaction  at SerialLog.cpp:835
#8  SRLDropTableSpace::append at SRLDropTableSpace.cpp:48

    locked TableSpaceManager::syncObject  type=Exclusive

#9  TableSpaceManager::dropTableSpace at TableSpaceManager.cpp:330
#10 Nfs::Statement::dropTableSpace at Statement.cpp:3019
#11 Nfs::Statement::executeDDL at Statement.cpp:849
#12 Nfs::Statement::execute at Statement.cpp:2705
#13 Nfs::Statement::execute at Statement.cpp:161
#14 Nfs::Statement::executeUpdate at Statement.cpp:1082
#15 StorageHandler::deleteTablespace at StorageHandler.cpp:549
#16 StorageInterface::alter_tablespace at ha_falcon.cpp:2073
#17 mysql_alter_tablespace at sql_tablespace.cc:44 

Thread 11

    wait for  TableSpaceManager::syncObject  type=Exclusive

#6  TableSpaceManager::expungeTableSpace at TableSpaceManager.cpp:415
#7  SRLDropTableSpace::commit at SRLDropTableSpace.cpp:78
#8  SerialLogTransaction::commit at SerialLogTransaction.cpp:92
#9  SerialLogTransaction::doAction at SerialLogTransaction.cpp:158

    not locked  SerialLogTransaction::pending.syncObject  type=Exclusive

#10 Gopher::gopherThread (this=0xb6fe06d0) at Gopher.cpp:77

Thread 10 

    wait for  SerialLog::syncSerializeGophers   type=Exclusive
    locked  SerialLogTransaction::pending.syncObject  type=Exclusive

#6  0x0866f773 in Gopher::gopherThread (this=0xb6fe06a8) at Gopher.cpp:73
----------------------------------------

Thread 11 had released SerialLogTransaction::pending.syncObject  but held the SerialLog::syncSerializeGophers.  Thread 10 was still holding SerialLogTransaction::pending.syncObject  while it waited for SerialLog::syncSerializeGophers.  

The fix is simple;  Release SerialLogTransaction::pending.syncObject just before  waiting for SerialLog::syncSerializeGophers instead of just after that.
[30 Jun 2008 16:52] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/48744

2722 Kevin Lewis	2008-06-30
      Bug#37587 - Release SerialLogTransaction::pending.syncObject just before
      waiting for SerialLog::syncSerializeGophers instead of just after that. 
      This avoids a 3 way deadlock
[30 Jun 2008 17:09] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/48746

2723 Kevin Lewis	2008-06-30
      Cleanup code and change CRLF to LF
[7 Jul 2008 6:27] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/49061

2733 Kevin Lewis	2008-07-07
      Bug#37587 - Fix this 3 way deadlock a different way.
      Allow the gopher to hold SerialLog::pending.syncObject while
      it locks serializeGophers, but release serializeGophers before 
      getting SerialLog::pending.syncObject back.
      Then fix TableSpaceManager::dropTableSpace so that it releases
      its syncObject before calling  dropTableSpace.append().
[8 Jul 2008 20:55] Kevin Lewis
The last patch had a problem in that it made TableSpaceManager::dropTableSpace do the SerialLog append at the end while the syncOgject was no longer held.  But it did not do a commitSystemTransaction after that.  So the file was not deleted in a tmely manner by the gopher thread.  Then the next createTableSpace for the same Table tablespace waited forever.

Once I fixed that, I tried MTR with a group of tablespace related tests, ass suggested by Vlad;  falcon_bug_29511 falcon_bug_31110 falcon_bug_31114 falcon_bug_31295 falcon_bug_31311 falcon_bug_32398 falcon_bug_33211 falcon_bug_33212 falcon_bug_33213 falcon_bug_33216 falcon_bug_33404

This ran for a while until two gopher threads locked up in a deadlock.  

One was holding serializeGophers with a shared lock and doing a SRLUpdateRecords::commit, which eventually calls SerialLog::findTransaction, which tries to lock SerialLog::pending.syncObject.  

The other gopher was holding SerialLog::pending.syncObject and waiting for an exclusive lock on serializeGophers.

So becaus a gopher thread itself can get SerialLog::pending.syncObject while it is doing an action, and it holds this lock also at its highest loop, the solution to serialize gophers by grabbing a lock while pending is still held, and holding it during the action, will not work.

I am looking again for another solution for serializing gopher threads that handle a drop tablespace.  This is really a failure of the solution to Bug#36396.
[9 Jul 2008 19:23] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/49355

2740 Kevin Lewis	2008-07-09
      Bug#37587 - Try a new method of serializing gopher threads
      for DropTableSpace.  Instead of holding a lock during the 
      action.  Set a lock variable while holding syncPending exclusively
      and do a loop of release-sleep(10)-lock until the other gopher 
      threads are fully done with previous transactions.
      
      Also, reduce the possibility of a deadlock by not holding 
      TableSpaceManager::syncObject while calling 
      SRLDropTableSpace::append(), which locks 
      SerialLog::pending.syncObject.
[9 Jul 2008 21:34] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/49369

2741 Kevin Lewis	2008-07-09 [merge]
      merge
[14 Jul 2008 15:51] Kevin Lewis
This patch was reviewed by Chris, Vlad and Kelly Long.  Kelly suggested that another way to serialize this kind of thing would be to focus on the Tablespace.  Every transaction that uses a table space would increment a refCount at commit and  decrement it upon completion by the gogher.  But it was noted that this solution is not necessary to implement since the current solution is acceptable.  

I would note that note every SerialLogRecord has a quick link tothe tablespace, and looking it up adds code path to the task by the gopher.  And it added lines or code to most SRL_____.cpp file.

Below is an interesting discussion that Vlad and I had about this solution and possible alternatives that Vlad thought of.

==============================================

Falconeers, 

The following is an email exchange that Vlad and I had this morning about how to fix Bug#37587.  Vlad was being very creative, looking for a deterministic, syncObject oriented solution to the problem of serializing gopher threads.  I had debugged it a bunch and had given up, opting for a looping backout method that is in a sense, the opposite of waiting for a lock on a syncObject.

The interesting part here is the discussion about how to avoid deadlocks.

Kevin

-----Original Message-----
From: Vladislav Vaintroub [mailto:vaintroub@mysql.com]
Sent: Wednesday, July 09, 2008 7:25 AM
To: 'Kevin Lewis'
Subject: RE: #37587 [Com,Pnd->Prg]: falcon_bug_33404.test hangs forever

Kevin,
It just occurred to me that actually, there is no necessity in running dropTablespace in exclusive mode.
More important is that all transactions that precede dropTablespace and using this tablespace file are finished, before gopher closes and deleted the file.

So the pseudo-code could look like this

	lock pending
	get the first transaction from queue
	if(transacttion is in exclusive mode)
	   lock serializeGophers(Exclusive)
	   // after acquiring this lock,  all prior transactions are finished
	   unlock serializeGophers
	   lock SerializeGophers(Shared)
	else
	   lock SerializeGophers(Shared)
	   unlock pending
	
	   transaction->doAction

	   unlock serializeGophers
	end if

What do you think of this approach? The serializeGophers then should be renamed to something that is better describes the purpose of the lock, i.e gophersSyncPoint or something like that.

-----Original Message-----
From: Kevin Lewis [mailto:klewis@mysql.com]
Sent: Wednesday, July 09, 2008 4:38 PM
To: 'Vladislav Vaintroub'
Subject: RE: #37587 [Com,Pnd->Prg]: falcon_bug_33404.test hangs forever

Vlad,

Your suggestion assumes that while one gopher is holding an exclusive lock on SerialLog::pending.syncObject, all other gophers will finish up their work and come back home to wait at the bottom of the while loop in Gopher::gopherThread.  But they do not.  Some of them may be stuck trying to do a SerialLog::findTransaction().  These gophers will deadlock since they have a shared lock on serializeGophers and wait on pending.syncObject, while our gopher owns pending.syncObject exclusively and waits on for an exclusive lock on serializeGophers.

We cannot lock any syncObject while holding pending.syncObject and hold it through transaction->doAction().

On the bright side, I worked out a solution last night that uses this technique in a few places.

		while (log->serializeGophers)
			{
			syncPending->unlock();
			workerThread->sleep(10);
			syncPending->lock(Exclusive);
			}

log->serializeGophers is now an int32 that is protected by exclusive ownership of syncPending.  This solution worked on the list of tablespace tests, when nothing else would.  It is not nice, since there is a short wait and those gophers effectively spin instead of wait to be signaled.  But the argument is that drop tablespace does not happen very often, so performance does not matter.

I will put together a patch on top of the current code in the next few hours for you to review.

Kevin

-----Original Message-----
From: Vladislav Vaintroub [mailto:vaintroub@mysql.com]
Sent: Wednesday, July 09, 2008 10:33 AM
To: 'Kevin Lewis'
Subject: RE: #37587 [Com,Pnd->Prg]: falcon_bug_33404.test hangs forever

Ok, I see your point about deadlock. There is actually point in request *exclusive* serializeGophers lock while holding pending. let me modify my original propose to avoid it 

lock pending
get the first transaction from queue
lock serializeGophers(Shared)
unlock pending

if(transaction is in exclusive mode)
 unlock serializeGophers
 lock serializeGophers(Exclusive)
 // all transaction started before this are now complete
endif

transaction->doAction()
unlock serializeGophers

This should ensure that all transactions started before exclusive are completed.
Also, there is no possibility for deadlock you describe.

What do you think about this. This could avoid " It is not nice" of your solution (I do agree it is not very nice ;)

-----Original Message-----
From: Kevin Lewis [mailto:klewis@mysql.com]
Sent: Wednesday, July 09, 2008 6:26 PM
To: 'Vladislav Vaintroub'
Subject: RE: #37587 [Com,Pnd->Prg]: falcon_bug_33404.test hangs forever

Vlad,

That still will not work.  It sets up the three way deadlock again;

Gopher 1	Holds exclusive lock on serializeGophers
		Calls TableSpaceManager::dropTableSpace which
		Waits for shared lock on TableSpaceManager::syncObject

Gopher 2	Holds exclusive lock on SerialLog::pending.syncObject
		Waits for shared lock on serializeGophers

Client		Holds exclusive lock on TableSpaceManager::syncObject
		Calls SRLDropTablespace::append
		Calls SerialLog::findTransaction
		Waits for shared lock on SerialLog::pending.syncObject
		
Your newest solution modifies the rule of "never locking in opposite order" by adding the qualifier, "unless it is a shared lock".  But I don't think this can be proven safe because other threads can get involved with exclusive locks.  And there will ALWAYS be exclusive locks in other threads because without exclusive locks, there is no reason to have a shared lock.

Kevin

-----Original Message-----
From: Vladislav Vaintroub [mailto:vaintroub@mysql.com]
Sent: Wednesday, July 09, 2008 12:22 PM
To: 'Kevin Lewis'
Subject: RE: #37587 [Com,Pnd->Prg]: falcon_bug_33404.test hangs forever

> Client		Holds exclusive lock on TableSpaceManager::syncObject
> 		Calls SRLDropTablespace::append
> 		Calls SerialLog::findTransaction
> 		Waits for shared lock on SerialLog::pending.syncObject

Is TableSpaceManager::syncObject here necessary? I think we decided it was not

----Original Message----
From: Kevin Lewis [mailto:klewis@mysql.com]
Sent: Wednesday, July 09, 2008 8:59 PM
To: 'Vladislav Vaintroub'
Subject: RE: #37587 [Com,Pnd->Prg]: falcon_bug_33404.test hangs forever
 
I have changed TableSpaceManager::dropTableSpace so that it does not
hold both of them at the same time, but that is just one of the many
possible paths that can call SerialLog::findTransaction.  If this one
path is cleaned up, what is to prevent another possible 3-way or 4-way
deadlock?  I think that leaving code paths that acquire locks in
opposite order is an invitation to deadlock in some current or future
combination of code paths.
 
I am still struggling to create this patch in bazaar.  I will get
you the patch as soon as I can.
[22 Aug 2008 21:13] Kevin Lewis
This is fixed in version 6.0.6
[8 Jan 2009 9:47] MC Brown
Internal change only. No documentation needed.