Description:
Briefly, thread srv_error_monitor_thread is waiting for the lock srv_innodb_monitor_mutex. But the lock is not held by any thread.
The reason I said that none of threads held the lock srv_innodb_monitor_mutex is that you can see from the stack that the _owner=0
I analyzed the functions used for locking srv_innodb_monitor_mutex. Three functions use this lock, srv_refresh_innodb_monitor_stats,
srv_export_innodb_status, and srv_printf_innodb_monitor. They all released the lock well.
Therefore, I think the lock problem may be caused by memoryBarrier.
The thread srv_error_monitor_thread handles the memoryBarrier problem of locks, but the thread itself gets stuck before it can handle the problem.
Below is the stack information
------------------------------------------
(gdb) thr 2648
[Switching to thread 2648 (Thread 0x7f2ef7bff700 (LWP 20799))]
#0 0x00007f302d3d1975 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
(gdb) bt
#0 0x00007f302d3d1975 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x000000000110ba11 in wait (this=<optimized out>) at /data//storage/innobase/os/os0event.cc:171
#2 os_event::wait_low (this=0x7f302b778478, reset_sig_count=<optimized out>) at /data//storage/innobase/os/os0event.cc:358
#3 0x000000000110c35a in os_event_wait_low (event=<optimized out>, reset_sig_count=<optimized out>) at /data//storage/innobase/os/os0event.cc:603
#4 0x00000000011ceb07 in sync_array_wait_event (arr=arr@entry=0x7f302b6a8c18, cell=@0x7f2ef7bfd3f0: 0x7f30295ee458) at /data//storage/innobase/sync/sync0arr.cc:475
#5 0x00000000011b82dc in TTASEventMutex<GenericPolicy>::wait (this=this@entry=0x22b8e80 <srv_innodb_monitor_mutex>, filename=filename@entry=0x1907908 "/data//storage/innobase/srv/srv0srv.cc",
line=line@entry=1171, spin=4) at /data//storage/innobase/include/ut0mutex.ic:89
#6 0x00000000011bb94a in spin_and_try_lock (line=1171, filename=0x1907908 "/data//storage/innobase/srv/srv0srv.cc", max_delay=<optimized out>, max_spins=60,
this=0x22b8e80 <srv_innodb_monitor_mutex>) at /data//storage/innobase/include/ib0mutex.h:850
#7 enter (line=1171, filename=0x1907908 "/data//storage/innobase/srv/srv0srv.cc", max_delay=<optimized out>, max_spins=<optimized out>, this=0x22b8e80 <srv_innodb_monitor_mutex>)
at /data//storage/innobase/include/ib0mutex.h:707
#8 enter (name=0x1907908 "/data//storage/innobase/srv/srv0srv.cc", line=1171, n_delay=<optimized out>, n_spins=<optimized out>, this=0x22b8e80 <srv_innodb_monitor_mutex>)
at /data//storage/innobase/include/ib0mutex.h:987
#9 srv_refresh_innodb_monitor_stats () at /data//storage/innobase/srv/srv0srv.cc:1171
#10 srv_error_monitor_thread (arg=<optimized out>) at /data//storage/innobase/srv/srv0srv.cc:1758
#11 0x00007f302d3cddf5 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f302be8648d in clone () from /lib64/libc.so.6
(gdb) f 6
#6 0x00000000011bb94a in spin_and_try_lock (line=1171, filename=0x1907908 "/data//storage/innobase/srv/srv0srv.cc", max_delay=<optimized out>, max_spins=60,
this=0x22b8e80 <srv_innodb_monitor_mutex>) at /data//storage/innobase/include/ib0mutex.h:850
850 /data//storage/innobase/include/ib0mutex.h: No such file or directory.
(gdb) p *this
$2 = {m_lock_word = 1, m_waiters = 1, m_event = 0x7f302b778478, m_policy = {m_count = {m_spins = 0, m_waits = 0, m_calls = 0, m_enabled = false}, m_id = LATCH_ID_SRV_INNODB_MONITOR}}
(gdb) p *this->m_event
$3 = {m_set = false, signal_count = 1, mutex = {m_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 3, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
__size = '\000' <repeats 12 times>, "\003", '\000' <repeats 26 times>, __align = 0}}, cond_var = {__data = {__lock = 0, __futex = 3, __total_seq = 3, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x7f302b778488, __nwaiters = 7,
__broadcast_seq = 0}, __size = "\000\000\000\000\003\000\000\000\003", '\000' <repeats 23 times>, "\210\204w+0\177\000\000\a\000\000\000\000\000\000", __align = 12884901888}, static cond_attr = {__size = "\002\000\000",
__align = 2}, static cond_attr_has_monotonic_clock = true, Python Exception <type 'exceptions.ValueError'> Cannot find type event_iter_t::_Node:
static global_initialized = true, event_iter = }
==================================
I reported a bug, but maybe he didn't understand it. Bug #79185
The community has fixed similar bugs before. Bug #99219
How to repeat:
The problem is difficult to reproduce. But from the stack and the code analysis, it felt like something was wrong.
Suggested fix:
A simple way to fix it.
--- a/storage/innobase/srv/srv0srv.cc
+++ b/storage/innobase/srv/srv0srv.cc
@@ -1176,7 +1176,8 @@ void
srv_refresh_innodb_monitor_stats(void)
/*==================================*/
{
- mutex_enter(&srv_innodb_monitor_mutex);
+ if(mutex_enter_nowait(&srv_innodb_monitor_mutex))
+ return ;
srv_last_monitor_time = ut_time_monotonic();