Bug #60679 Provide deadlock free way to dump mysql state during deadlock
Submitted: 28 Mar 2011 16:34  Modified: 25 Dec 2011 19:50
Reporter: Mark Callaghan
Status: Verified
Category: MySQL Server: Locking  Severity: S4 (Feature request)
Version: 5.1.47  OS: Any
Assigned to:  CPU Architecture: Any
Tags: deadlock, Monitoring

[28 Mar 2011 16:34] Mark Callaghan
Description:
InnoDB is kind enough to detect many deadlocks and crash. A deadlock above InnoDB usually goes unnoticed. So my first feature request is for mysqld to notice when no progress is being made because of a deadlock on a pthread mutex, pthread rw-lock, or other lock, and then do something useful.

My next feature request is to prevent SHOW INNODB STATUS from hanging on thd_security_context calls. Here is an example call stack from such a hang.
  __lll_mutex_lock_wait,_L_mutex_lock_1133,pthread_mutex_lock,thd_security_context,innobase_mysql_print_thd,lock_print_info_all_transactions,srv_printf_innodb_transaction,innodb_show_status,innobase_show_status,ha_show_status,mysql_execute_command,mysql_parse,dispatch_command,do_command,handle_one_connection,start_thread,clone

Code from thd_security_context is below. I think it would be better to use pthread_mutex_trylock and avoid the wait, or at least to provide such an option to each caller.

  pthread_mutex_lock(&thd->LOCK_thd_data);

  if (thd->query())
  {
    if (max_query_len < 1)
      len= thd->query_length();
    else
      len= min(thd->query_length(), max_query_len);
    str.append('\n');
    str.append(thd->query(), len);
  }

  pthread_mutex_unlock(&thd->LOCK_thd_data);
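
As an illustration, here is roughly what a trylock-based variant of that block could look like. This is a sketch only, reusing the variables from the snippet above; the fallback message is made up, not an actual patch.

  if (pthread_mutex_trylock(&thd->LOCK_thd_data) == 0)
  {
    if (thd->query())
    {
      if (max_query_len < 1)
        len= thd->query_length();
      else
        len= min(thd->query_length(), max_query_len);
      str.append('\n');
      str.append(thd->query(), len);
    }
    pthread_mutex_unlock(&thd->LOCK_thd_data);
  }
  else
  {
    /* The lock owner may itself be deadlocked; report that instead of
       blocking the monitoring path forever. */
    str.append("\n(query text unavailable: LOCK_thd_data is busy)");
  }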

--------

srv_error_monitor_thread can hang on the call to srv_refresh_innodb_monitor_stats because srv_innodb_monitor_mutex is locked by the thread running the SHOW INNODB STATUS calls. So maybe nothing needs to be fixed here if you fix SHOW INNODB STATUS to not hang.

--------

Finally, there are a few well-known mutexes for which state can easily be dumped without relying on all of the other routines (kernel_mutex, buffer pool mutex). Right now I am looking at poor man's profiler output from a hung server to determine who holds kernel_mutex.
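
For illustration only (a hypothetical helper, not existing server or InnoDB code; the names dumpable_mutex, dump_mutex_state, owner_thread, and locked_at are made up), a dump routine for such a well-known mutex could record who last acquired it and use trylock so the dump itself never blocks:

  #include <pthread.h>
  #include <stdio.h>

  /* Hypothetical sketch: a named mutex that records its holder so that a
     diagnostic dump can report "who has it" without waiting on it. */
  struct dumpable_mutex
  {
    pthread_mutex_t lock;
    const char     *name;
    unsigned long   owner_thread;   /* recorded by the holder after locking */
    const char     *locked_at;      /* "file:line", recorded by the holder */
  };

  static void dump_mutex_state(FILE *f, struct dumpable_mutex *m)
  {
    if (pthread_mutex_trylock(&m->lock) == 0)
    {
      fprintf(f, "%s: free\n", m->name);
      pthread_mutex_unlock(&m->lock);
    }
    else
    {
      /* Best-effort read of the holder info; it may be slightly stale, but
         that is acceptable for a hang diagnostic. */
      fprintf(f, "%s: busy, last acquired by thread %lu at %s\n",
              m->name, m->owner_thread, m->locked_at);
    }
  }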

How to repeat:
hang the server, then try to debug it

Suggested fix:
make life easier for people supporting InnoDB
[28 Mar 2011 18:25] Davi Arnaut
Should a hang (deadlock) in a single component bring down the entire server?
[28 Mar 2011 19:12] Mark Callaghan
If the stuck threads hold any of the big locks in mysqld (LOCK_open, kernel_mutex, ...), then no more work will get done.
[28 Mar 2011 19:38] Davi Arnaut
Quite a bit of work can be done without LOCK_open in 5.5, and probably even more in the future. Another aspect is the case of user-induced deadlocks (GET_LOCK, etc.). It's not clear cut.

I personally think it would be more interesting to have some kind of watchdog to detect hangs. We could periodically generate a per-thread interruption (e.g. timer signal) whose handler checks if the thread is doing any useful work. If the handler detects that the thread is stuck in some call stack, it can assume the thread is hung and generate a trap [*].

Anyway, I think the core problem is that we don't do a good job at dumping the state of the server. We rely almost entirely on core files, which have their own set of problems. The server should be able to generate a mini dump, with or without crashing the server, so that we can get a stack trace of all threads, which locks were being held, etc.

* Trigger a user configurable action.
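
As a rough illustration of that per-thread watchdog idea (purely a sketch under assumed choices: SIGUSR1 as the interruption, a per-thread progress counter, and a three-tick threshold; none of this is MySQL code), something along these lines could detect a thread that stops making progress:

  /* Sketch only: a monitor loop interrupts each worker with SIGUSR1; the
     handler runs in the worker's context and checks whether that worker's
     progress counter has moved since the last check. */
  #include <pthread.h>
  #include <signal.h>
  #include <string.h>
  #include <unistd.h>

  #define N_WORKERS 2

  static __thread volatile sig_atomic_t progress;   /* bumped by the worker */
  static __thread sig_atomic_t last_seen;
  static __thread int stuck_ticks;

  static void check_progress(int sig)
  {
    (void) sig;
    if (progress == last_seen)
    {
      if (++stuck_ticks >= 3)     /* no progress for ~3 intervals: assume hung */
        write(STDERR_FILENO, "watchdog: thread looks hung\n", 28);
    }
    else
    {
      stuck_ticks= 0;
      last_seen= progress;
    }
  }

  static void *worker(void *arg)
  {
    int simulate_hang= (arg != NULL);
    for (;;)
    {
      if (!simulate_hang)
        progress++;               /* "useful work" marker */
      usleep(100000);
    }
    return NULL;
  }

  int main(void)
  {
    pthread_t workers[N_WORKERS];
    struct sigaction sa;
    int i, tick;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler= check_progress;
    sigaction(SIGUSR1, &sa, NULL);

    pthread_create(&workers[0], NULL, worker, NULL);        /* makes progress */
    pthread_create(&workers[1], NULL, worker, (void *) 1);  /* simulated hang */

    for (tick= 0; tick < 10; tick++)
    {
      sleep(1);
      for (i= 0; i < N_WORKERS; i++)
        pthread_kill(workers[i], SIGUSR1);   /* per-thread interruption */
    }
    return 0;
  }

In a real server the handler would dump the thread's stack or trigger the configurable action mentioned above rather than just printing a message.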
[28 Mar 2011 21:22] Mark Callaghan
It is clear cut to me. GET_LOCK is no different from an InnoDB row lock. Both depend on users. The InnoDB watchdog thread is not confused by them and does not crash the server because of them.

With respect to "the server should", there is no debugging information that lists lock owners/waiters provided by MySQL. InnoDB has had this for a long time.

With respect to core files, one must realize that they are going away, at least until someone figures out how to exclude the InnoDB buffer pool from them. Most people do not allow many (or any) 100GB core files in production. There are also platform-specific issues on RHEL/CentOS that prevent the generation of core files because of the user switching done by the mysql init scripts.

InnoDB has deadlock detection that works without getting confused by long user-dependent waits on row locks. It detects hangs on internal mutexes and rw-locks over which users have no control. MySQL should have the same.
[28 Mar 2011 21:26] Mark Callaghan
For the record I spent ~6 hours debugging http://bugs.mysql.com/bug.php?id=60682. I think MySQL can do more to shorten the debugging effort the next time I look at a deadlock.
[28 Mar 2011 21:32] Mark Callaghan
These are from unmodified MySQL 5.1.47

Stack for thread stuck doing KILL

Thread 2 (Thread 1230346560 (LWP 28363)):
#0  0x00000031ef20c888 in __lll_mutex_lock_wait () from /lib64/libpthread.so.0
#1  0x00000031ef208911 in _L_mutex_lock_1133 () from /lib64/libpthread.so.0
#2  0x00000031ef20886a in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000908301 in safe_mutex_lock (mp=0xebc780, try_lock=0 '\0', file=0x9fa378 "sql_class.cc", line=1154) at thr_mutex.c:149
#4  0x0000000000619ecf in THD::awake (this=0x3e686d0, state_to_set=THD::KILL_QUERY) at sql_class.cc:1154
#5  0x000000000063d7bf in kill_one_thread (thd=0x3eae310, id=3, only_kill_query=true) at sql_parse.cc:7038
#6  0x000000000063d866 in sql_kill (thd=0x3eae310, id=3, only_kill_query=true) at sql_parse.cc:7063
#7  0x0000000000649561 in mysql_execute_command (thd=0x3eae310) at sql_parse.cc:4031
#8  0x000000000064c899 in mysql_parse (thd=0x3eae310, inBuf=0x3e682c0 "KILL QUERY 3", length=12, found_semicolon=0x49558ee0) at sql_parse.cc:5986
#9  0x000000000064e155 in dispatch_command (command=COM_QUERY, thd=0x3eae310, packet=0x3eb0db1 "KILL QUERY 3", packet_length=12) at sql_parse.cc:1233
#10 0x000000000064f5b1 in do_command (thd=0x3eae310) at sql_parse.cc:874
#11 0x000000000063b89b in handle_one_connection (arg=0x3eae310) at sql_connect.cc:1134
#12 0x00000031ef2062f7 in start_thread () from /lib64/libpthread.so.0
#13 0x00000031ee6d1e3d in clone () from /lib64/libc.so.6

Stack for thread running DROP TABLE

Thread 3 (Thread 1230080320 (LWP 28362)):
#0  0x00000031ef20a496 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000000000908674 in safe_cond_wait (cond=0x32d7648, mp=0x32d75d0, file=0x2aaaaadefa30 "os/os0sync.c", line=399) at thr_mutex.c:237
#2  0x00002aaaaad50235 in os_event_wait_low (event=0x32d75d0, reset_sig_count=1) at os/os0sync.c:399
#3  0x00002aaaaadb461a in sync_array_wait_event (arr=0x32d6920, index=2) at sync/sync0arr.c:451
#4  0x00002aaaaadb6777 in mutex_spin_wait (mutex=0x32d7588, file_name=0x2aaaaae000e8 "trx/trx0trx.c", line=200) at sync/sync0sync.c:609
#5  0x00002aaaaadcae46 in mutex_enter_func (mutex=0x32d7588, file_name=0x2aaaaae000e8 "trx/trx0trx.c", line=200) at ./include/sync0sync.ic:221
#6  0x00002aaaaadcad04 in trx_allocate_for_mysql () at trx/trx0trx.c:200
#7  0x00002aaaaad1dc14 in innobase_trx_allocate (thd=0x3e7dc50) at handler/ha_innodb.cc:1314
#8  0x00002aaaaad22cae in ha_innodb::delete_table (this=0x3e85370, name=0x49516090 "./test/t1") at handler/ha_innodb.cc:6906
#9  0x00000000007816fd in handler::ha_delete_table (this=0x3e85370, name=0x49516090 "./test/t1") at handler.cc:3373
#10 0x00000000007876f3 in ha_delete_table (thd=0x3e7dc50, table_type=0x32d9eb0, path=0x49516090 "./test/t1", db=0x3e85a60 "test", alias=0x3e8c910 "t1", generate_warning=true) at handler.cc:1996
#11 0x00000000007b27cc in mysql_rm_table_part2 (thd=0x3e7dc50, tables=0x3ec1310, if_exists=false, drop_temporary=false, drop_view=false, dont_log_query=false) at sql_table.cc:2059
#12 0x00000000007b2dc6 in mysql_rm_table (thd=0x3e7dc50, tables=0x3ec1310, if_exists=0 '\0', drop_temporary=0 '\0') at sql_table.cc:1833
#13 0x0000000000647318 in mysql_execute_command (thd=0x3e7dc50) at sql_parse.cc:3400
#14 0x000000000064c899 in mysql_parse (thd=0x3e7dc50, inBuf=0x3e81d40 "drop table t1", length=13, found_semicolon=0x49517ee0) at sql_parse.cc:5986
#15 0x000000000064e155 in dispatch_command (command=COM_QUERY, thd=0x3e7dc50, packet=0x3ea62f1 "drop table t1", packet_length=13) at sql_parse.cc:1233
#16 0x000000000064f5b1 in do_command (thd=0x3e7dc50) at sql_parse.cc:874
#17 0x000000000063b89b in handle_one_connection (arg=0x3e7dc50) at sql_connect.cc:1134

Stack for SHOW INNODB STATUS

Thread 4 (Thread 1229814080 (LWP 28361)):
#0  0x00000031ef20c888 in __lll_mutex_lock_wait () from /lib64/libpthread.so.0
#1  0x00000031ef208911 in _L_mutex_lock_1133 () from /lib64/libpthread.so.0
#2  0x00000031ef20886a in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000908301 in safe_mutex_lock (mp=0x3e69440, try_lock=0 '\0', file=0x9fa378 "sql_class.cc", line=413) at thr_mutex.c:149
#4  0x000000000061c350 in thd_security_context (thd=0x3e686d0, buffer=0x494d4800 "MySQL thread id 3, query id 17 localhost 127.0.0.1 root Waiting for table", length=1024, max_query_len=600) at sql_class.cc:413
#5  0x00002aaaaad21583 in innobase_mysql_print_thd (f=0x3885da0, thd=0x3e686d0, max_query_len=600) at handler/ha_innodb.cc:894
#6  0x00002aaaaadce313 in trx_print (f=0x3885da0, trx=0x3ebd818, max_query_len=600) at trx/trx0trx.c:1747
#7  0x00002aaaaad393cb in lock_print_info_all_transactions (file=0x3885da0) at lock/lock0lock.c:4462
#8  0x00002aaaaadae1a4 in srv_printf_innodb_monitor (file=0x3885da0, nowait=0, trx_start=0x494d5310, trx_end=0x494d5308) at srv/srv0srv.c:1761
#9  0x00002aaaaad1de0d in innodb_show_status (hton=0x32d9eb0, thd=0x3e73190, stat_print=0x77f52a <stat_print>) at handler/ha_innodb.cc:8671
#10 0x00002aaaaad1e104 in innobase_show_status (hton=0x32d9eb0, thd=0x3e73190, stat_print=0x77f52a <stat_print>, stat_type=HA_ENGINE_STATUS) at handler/ha_innodb.cc:8914
#11 0x0000000000780f12 in ha_show_status (thd=0x3e73190, db_type=0x32d9eb0, stat=HA_ENGINE_STATUS) at handler.cc:4448
#12 0x00000000006446a4 in mysql_execute_command (thd=0x3e73190) at sql_parse.cc:2482
#13 0x000000000064c899 in mysql_parse (thd=0x3e73190, inBuf=0x3ec1660 "SHOW INNODB STATUS", length=18, found_semicolon=0x494d6ee0) at sql_parse.cc:5986
#14 0x000000000064e155 in dispatch_command (command=COM_QUERY, thd=0x3e73190, packet=0x3e75c31 "", packet_length=18) at sql_parse.cc:1233
#15 0x000000000064f5b1 in do_command (thd=0x3e73190) at sql_parse.cc:874
#16 0x000000000063b89b in handle_one_connection (arg=0x3e73190) at sql_connect.cc:1134

Stack for SELECT

Thread 5 (Thread 1229547840 (LWP 28360)):
#0  0x00000031ef20c888 in __lll_mutex_lock_wait () from /lib64/libpthread.so.0
#1  0x00000031ef208911 in _L_mutex_lock_1133 () from /lib64/libpthread.so.0
#2  0x00000031ef20886a in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000908301 in safe_mutex_lock (mp=0x3e66b88, try_lock=0 '\0', file=0xa5d235 "sql_base.cc", line=2195) at thr_mutex.c:149
#4  0x0000000000694f49 in wait_for_condition (thd=0x3e686d0, mutex=0xebc780, cond=0xebd280) at sql_base.cc:2195
#5  0x000000000069b324 in open_table (thd=0x3e686d0, table_list=0x3ec0ef0, mem_root=0x49493e30, refresh=0x49493ebf, flags=0) at sql_base.cc:2847
#6  0x000000000069c751 in open_tables (thd=0x3e686d0, start=0x49493f30, counter=0x49493f74, flags=0) at sql_base.cc:4605
#7  0x000000000069d0db in open_and_lock_tables_derived (thd=0x3e686d0, tables=0x3ec0ef0, derived=true) at sql_base.cc:5023
#8  0x0000000000650802 in open_and_lock_tables (thd=0x3e686d0, tables=0x3ec0ef0) at mysql_priv.h:1564
#9  0x0000000000642ee3 in execute_sqlcom_select (thd=0x3e686d0, all_tables=0x3ec0ef0) at sql_parse.cc:5032
#10 0x0000000000643dae in mysql_execute_command (thd=0x3e686d0) at sql_parse.cc:2263
#11 0x000000000064c899 in mysql_parse (thd=0x3e686d0, inBuf=0x3e66a90 "select i from t2", length=16, found_semicolon=0x49495ee0) at sql_parse.cc:5986
#12 0x000000000064e155 in dispatch_command (command=COM_QUERY, thd=0x3e686d0, packet=0x3e6b171 "select i from t2", packet_length=16) at sql_parse.cc:1233
#13 0x000000000064f5b1 in do_command (thd=0x3e686d0) at sql_parse.cc:874
#14 0x000000000063b89b in handle_one_connection (arg=0x3e686d0) at sql_connect.cc:1134
[28 Mar 2011 21:35] Mark Callaghan
There was another problem that shows up in a variation of this deadlock:

connection doing row_drop_table_for_mysql
-- has dictionary lock
-- waiting for kernel_mutex

connection doing SELECT from renamed table (without deadlock statement should error)
-- has LOCK_open
-- waiting for dictionary lock in dict_table_get

connection doing SHOW INNODB STATUS
-- has kernel_mutex
-- waiting for LOCK_thd_data in thd_security_context

connection doing KILL tid
-- has LOCK_thd_data
-- waiting for LOCK_open

purge_thread
-- waiting in row_mysql_free_data_dictionary

pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,sync_array_wait_event,rw_lock_s_lock_spin,row_mysql_freeze_data_dictionary_func,row_purge_step,que_run_threads,trx_purge,srv_master_thread,start_thread,clone
-- waiting on rw_lock_s_lock_func(&dict_operation_lock, ...

pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,sync_array_wait_event,mutex_spin_wait,dict_table_get,ha_innobase::open_internal,handler::ha_open,open_table_from_share,open_unireg_entry,open_table,open_tables,open_and_lock_tables_derived,execute_sqlcom_select,mysql_execute_command,mysql_parse,dispatch_command,do_command,handle_one_connection,start_thread,clone
-- waiting on dict->sys_mutex

__lll_mutex_lock_wait,_L_mutex_lock_1133,pthread_mutex_lock,thd_security_context,innobase_mysql_print_thd,lock_print_info_all_transactions,srv_printf_innodb_transaction,innodb_show_status,innobase_show_status,ha_show_status,mysql_execute_command,mysql_parse,dispatch_command,do_command,handle_one_connection,start_thread,clone

__lll_mutex_lock_wait,_L_mutex_lock_1133,pthread_mutex_lock,THD::awake,kill_one_thread,sql_kill,mysql_execute_command,mysql_parse,dispatch_command,do_command,handle_one_connection,start_thread,clone
[28 Mar 2011 21:36] Mark Callaghan
The last 2 updates don't belong in this bug
[28 Mar 2011 22:39] Davi Arnaut
> It is clear cut to me. GET_LOCK is not different than an InnoDB row lock. Both are
> dependent on users. The innodb watchdog thread is not confused by them and
> does not crash the server because of them.

OK. Perhaps you think that InnoDB's behavior is good. I disagree.
[28 Mar 2011 22:45] Davi Arnaut
> With respect to the use of core files one must realize that core files are going away at
> least until someone figures out how to exclude the InnoDB buffer pool from them.

Perhaps Facebook can contribute a patch implementing MAP_NOCORE to the Linux kernel, or augment coredump_filter.
[31 Mar 2011 17:29] James Day
Whether the InnoDB handling is good or bad depends on context and what you need. For sites with excellent failover capability, a complete server kill and loss of service can be best because it provides a flap-free signal that failover is required.

It might be worth trying to kill all of the possibly involved threads successively to see if the problem can be cleared that way first - generally better to try that first, IMO, if the server architecture had a way for a storage engine to tell the MySQL layer to kill a thread.

It's not perfect, but there has been one nice result from the InnoDB handling: almost every trip of the watchdog thread is now a false alarm, and it has been modified to make those less likely.* It is too dramatic, but it was very effective at getting things looked at and fixed.

* http://lists.mysql.com/commits/134235, likely to be in the next 5.1/5.5 and later builds.
[1 Apr 2011 14:53] Marc ALFF
Hi Mark.

"
With respect to "the server should", there is no debugging information that lists lock
owners/waiters provided by MySQL. InnoDB has had this for a long time.
"

I have to disagree here; this is true only for 5.1.

Starting with the performance schema in 5.5:
- TABLE performance_schema.mutex_instances shows which thread holds a lock on a mutex.
- TABLE performance_schema.events_waits_current shows which mutex (rwlock, condition) a thread is waiting for.

When critical locks such as LOCK_open / LOCK_thread_count are still free, and even if there are some deadlocks within InnoDB, a client should still be able to connect and execute a SELECT on the performance_schema tables and get results. This is because very few locks are needed to reach that code.

What is probably missing, for cases when the server is badly deadlocked (LOCK_open / LOCK_thread_count), is the ability to dump the same information on a kill -15, so that the current locks and waits can be seen in a dump file.

Starting with mysql-trunk (5.6), table locks are also instrumented, so waits to get a table lock will also be visible in table performance_schema.events_waits_current.

While this data is not complete, and more can be instrumented and exposed, it is already a huge step forward compared to 5.1.

Independently of the lock data exposed, detecting deadlocks is only a partial solution. A better approach is to prevent deadlocks in the first place, if possible, by reducing interdependencies between different locks and by enforcing a deadlock-free design, for example with a documented and enforced lock order.

See for example
http://forge.mysql.com/worklog/task.php?id=3262
and in particular option 4.

Regards,
-- Marc
[1 Apr 2011 15:13] Mark Callaghan
Marc,

Note that "true only for 5.1" really means true for <= 5.1, which covers the huge majority of MySQL servers in production.

Is P_S.mutex_instances always enabled, including when you run benchmarks? If not, then it isn't a solution.