Bug #40981 Maria: deadlock between checkpoint and maria_close() when freeing state info
Submitted: 24 Nov 2008 15:11 Modified: 9 Jan 2009 15:13
Reporter: Guilhem Bichot Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Maria storage engine Severity:S3 (Non-critical)
Version:5.1-maria OS:Any
Assigned to: Oleksandr Byelkin CPU Architecture:Any

[24 Nov 2008 15:11] Guilhem Bichot
Description:
5.1-maria, Windows. Consider those two threads:

1) checkpoint thread:

   8  Id: e70.6f4 Suspend: 1 Teb: 7ffd7000 Unfrozen
ChildEBP RetAddr
03e3fd50 7c827d0b ntdll!KiFastSystemCallRet
03e3fd54 7c83d236 ntdll!NtWaitForSingleObject+0xc
03e3fd90 7c83d281 ntdll!RtlpWaitOnCriticalSection+0x1a3
03e3fdb0 0073f1d8 ntdll!RtlEnterCriticalSection+0xa8
03e3fdc0 0074a2ab mysqld!_ma_remove_not_visible_states_with_lock+0x18 [g:\pb2\build\sb_0-143975-1227514509.47\mysql-5.1.31-maria-alpha-win-x86\storage\maria\ma_state.c @ 197]
03e3fe4c 0074a4ee mysqld!collect_tables+0x50b [g:\pb2\build\sb_0-143975-1227514509.47\mysql-5.1.31-maria-alpha-win-x86\storage\maria\ma_checkpoint.c @ 1062]
03e3fef0 0074a779 mysqld!really_execute_checkpoint+0x9e [g:\pb2\build\sb_0-143975-1227514509.47\mysql-5.1.31-maria-alpha-win-x86\storage\maria\ma_checkpoint.c @ 196]
03e3ff00 0074a978 mysqld!ma_checkpoint_execute+0x69 [g:\pb2\build\sb_0-143975-1227514509.47\mysql-5.1.31-maria-alpha-win-x86\storage\maria\ma_checkpoint.c @ 133]
03e3ff60 0063d09b mysqld!ma_checkpoint_background+0x1d8 [g:\pb2\build\sb_0-143975-1227514509.47\mysql-5.1.31-maria-alpha-win-x86\storage\maria\ma_checkpoint.c @ 618]
03e3ff78 007769b9 mysqld!pthread_start+0x3b [g:\pb2\build\sb_0-143975-1227514509.47\mysql-5.1.31-maria-alpha-win-x86\mysys\my_winthread.c @ 86]

2) a thread which is closing a table:

  19  Id: e70.da8 Suspend: 1 Teb: 7ffa7000 Unfrozen
ChildEBP RetAddr
0499f668 7c827d0b ntdll!KiFastSystemCallRet
0499f66c 7c83d236 ntdll!NtWaitForSingleObject+0xc
0499f6a8 7c83d281 ntdll!RtlpWaitOnCriticalSection+0x1a3
0499f6c8 00748e14 ntdll!RtlEnterCriticalSection+0xa8
0499f6d4 0073f14b mysqld!trnman_exists_active_transactions+0x14 [g:\pb2\build\sb_0-143975-1227514509.47\mysql-5.1.31-maria-alpha-win-x86\storage\maria\trnman.c @ 877]
0499f700 0072f22f mysqld!_ma_remove_not_visible_states+0x3b [g:\pb2\build\sb_0-143975-1227514509.47\mysql-5.1.31-maria-alpha-win-x86\storage\maria\ma_state.c @ 156]
0499f728 0071db86 mysqld!maria_close+0x25f [g:\pb2\build\sb_0-143975-1227514509.47\mysql-5.1.31-maria-alpha-win-x86\storage\maria\ma_close.c @ 142]
0499f730 005b63cf mysqld!ha_maria::close+0x16 [g:\pb2\build\sb_0-143975-1227514509.47\mysql-5.1.31-maria-alpha-win-x86\storage\maria\ha_maria.cc @ 946]
0499f740 0051f3b8 mysqld!closefrm+0x1f [g:\pb2\build\sb_0-143975-1227514509.47\mysql-5.1.31-maria-alpha-win-x86\sql\table.cc @ 1949]
0499f750 0051f3db mysqld!intern_close_table+0x28 [g:\pb2\build\sb_0-143975-1227514509.47\mysql-5.1.31-maria-alpha-win-x86\sql\sql_base.cc @ 779]
0499f75c 006347f0 mysqld!free_cache_entry+0xb [g:\pb2\build\sb_0-143975-1227514509.47\mysql-5.1.31-maria-alpha-win-x86\sql\sql_base.cc @ 801]
0499f788 0051f565 mysqld!my_hash_delete+0x230 [g:\pb2\build\sb_0-143975-1227514509.47\mysql-5.1.31-maria-alpha-win-x86\mysys\hash.c @ 530]
0499f79c 00520d9f mysqld!close_thread_table+0xa5 [g:\pb2\build\sb_0-143975-1227514509.47\mysql-5.1.31-maria-alpha-win-x86\sql\sql_base.cc @ 1365]
0499f7b0 00520ee7 mysqld!close_open_tables+0x1f [g:\pb2\build\sb_0-143975-1227514509.47\mysql-5.1.31-maria-alpha-win-x86\sql\sql_base.cc @ 1174]
0499f7c0 00555d95 mysqld!close_thread_tables+0xc7 [g:\pb2\build\sb_0-143975-1227514509.47\mysql-5.1.31-maria-alpha-win-x86\sql\sql_base.cc @ 1328]
0499ff1c 00556386 mysqld!dispatch_command+0xc15 [g:\pb2\build\sb_0-143975-1227514509.47\mysql-5.1.31-maria-alpha-win-x86\sql\sql_parse.cc @ 1593]

Thread1 is in ma_state.c:197: it owns LOCK_trn_list (has called trnman_lock()) and is blocked on getting share->intern_lock (for some reason, Windows debugger prints the number of the line just after where thread is blocked).
Thread2 is in trnman.c:877: it wants LOCK_trn_list, so blocks on thread1. It however owns share->intern_lock as it is inside maria_close() (see in stack trace ma_close.c:142), so blocks thread1. So it's a deadlock.

How to repeat:
Seen today in pushbuild2 (see URL in private post). Test maria_bulk_insert, machine "win x86 max".

Suggested fix:
decide on the proper order of share->intern_lock and LOCK_trn_list.
[27 Nov 2008 8:57] Guilhem Bichot
Sanja, see email I sent.
[5 Dec 2008 8:37] Guilhem Bichot
In the latest 6.0-maria, test maria-lock.test shows this bug (just look at the warnings in pushbuild:
master.err: maria.maria-lock: safe_mutex: Found wrong usage of mutex 'share->intern_lock' and 'LOCK_trn_list'
from the mutex deadlock detector).
[15 Dec 2008 10:08] Bugs System
Pushed into 6.0.9-alpha  (revid:sanja@sun.com-20081208200959-qe6jl5ll85jnw2do) (version source revid:guilhem@mysql.com-20081213204800-0nubni3t4ihn4hv9) (pib:5)
[9 Jan 2009 15:13] MC Brown
Internal only. No documentation needed.