| Bug #47814 | Diagnostics are frequently not printed after a long lock wait in InnoDB | ||
|---|---|---|---|
| Submitted: | 4 Oct 20:33 | Modified: | 30 Oct 4:43 |
| Reporter: | Mark Callaghan | ||
| Status: | In review | ||
| Category: | Server: InnoDB | Severity: | S2 (Serious) |
| Version: | 5.0.44 | OS: | Linux |
| Assigned to: | Jimmy Yang | Target Version: | 5.0+ |
| Tags: | innodb, long, Semaphore, lock, wait | ||
| Triage: | Triaged: D2 (Serious) | ||
[5 Oct 19:41]
Harrison Fisk
This can occur when InnoDB is hanging on the kernel_mutex. sync_array_print_long_waits() signals the background lock timeout and monitor thread to output the show innodb status info by doing: srv_print_innodb_monitor = TRUE; os_event_set(srv_lock_timeout_thread_event); However, the srv_lock_timeout_and_monitor_thread does two things: 1. Outputs show innodb status as requested (such as by sync_array_print_long_waits) 2. Checks for innodb_lock_wait_timeout While doing #2 it attempts to acquire the kernel_mutex. When the kernel_mutex is the one causing the hang, then it will never make it back to #1 and hence will not output the requested information. One way to solve this would be to split these two uses into two threads. Another way would be to allow the kernel_mutex acquisition required for #2 to timeout after some set time limit.
[14 Oct 12:36]
Jimmy Yang
Here is a repro for this problem:
0) mysql> create table test (a int) engine = innodb;
Query OK, 0 rows affected (0.01 sec)
1) Start two sessions, in session one, start a transaction, and insert a row:
mysql> start transaction;
Query OK, 0 rows affected (0.01 sec)
mysql> insert into test values(9);
Query OK, 1 row affected (0.00 sec)
2) Attach debugger, put break point in srv_suspend_mysql_thread() right after
mutex_enter(&kernel_mutex);
3) In session 2, call delete from table test, this will hang (lock conflict):
mysql> delete from test;
4) And after a while, it will get to the break point you set in step 2, when it handles
DB_LOCK_WAIT error, stack is:
srv_suspend_mysql_thread
row_mysql_handle_errors (DB_LOCK_WAIT )
row_search_for_mysql
ha_innobase::index_read
5) Create a function of yourself, and let it keep looping in os_thread_yield() (this
might need to be done first and compiled into server)
void
debug_loop()
{
int set = 1;
while (set)
{
os_thread_yield();
}
}
6) In the debugger, when you break right after the mutex_enter(), call this function:
(gdb) call debug_loop()
7) Now this thread is hold the kernel_mutex and looping(), we artificially create
extended holding of mutex.
8) The srv_lock_timeout_and_monitor_thread() is invoked by the background thread, and it
will find kernel_mutex is not available to it:
srv_lock_timeout_and_monitor_thread()
{
...
mutex_enter(&kernel_mutex); <===
...
}
As a result, sync_array_print_long_waits() will be called, and again, it will wait 30
seconds for srv_lock_timeout_and_monitor_thread() to print monitor information. However,
nothing will get printed because srv_lock_timeout_and_monitor_thread() is stuck (by the
mutex). So you will get the exact output described by this bug report:
InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:
InnoDB: Pending preads 0, pwrites 0
InnoDB: ###### Diagnostic info printed to the standard error stream
InnoDB: Warning: a long semaphore wait:
--Thread 1152145728 has waited at srv/srv0srv.c line 1992 for 614.00 seconds the
semaphore:
Mutex at 0x2aaaaad732b8 created file srv/srv0srv.c line 875, lock var 1
waiters flag 1
InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:
InnoDB: Pending preads 0, pwrites 0
InnoDB: ###### Diagnostic info printed to the standard error stream
InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:
InnoDB: Pending preads 0, pwrites 0
So we created a artificial situation for the case. In the real time, it would more likely
be a race condition.
Solutions are being evaluated...
Thanks
Jimmy
[22 Oct 12:24]
Jimmy Yang
Additional note on this bug. The monitor printing operation also requires the kernel_mutex in following call path: srv_printf_innodb_monitor() -> lock_print_info_summary() -> lock_mutex_enter_kernel() It is possible, the thread gets blocked there before the request (mutex_enter(&kernel_mutex)) in srv_lock_timeout_and_monitor_thread() We are testing approaches built on top of mutex_enter_nowait() for a limited wait and skip the operation if we cannot gain the lock for this trial. Thanks Jimmy

Description: Code in innobase/sync/sync0arr.c attempts to print diagnostics when a long semaphore/lock wait is detected. Frequently, nothing is printed in that case. I frequently see this: InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info: InnoDB: Pending preads 0, pwrites 0 InnoDB: ###### Diagnostic info printed to the standard error stream And from the code below I expect much more to be printed between the 'InnoDB: #####' lines. The code is: if (noticed) { fprintf(stderr, "InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:\n"); old_val = srv_print_innodb_monitor; /* If some crucial semaphore is reserved, then also the InnoDB Monitor can hang, and we do not get diagnostics. Since in many cases an InnoDB hang is caused by a pwrite() or a pread() call hanging inside the operating system, let us print right now the values of pending calls of these. */ fprintf(stderr, "InnoDB: Pending preads %lu, pwrites %lu\n", (ulong)os_file_n_pending_preads, (ulong)os_file_n_pending_pwrites); srv_print_innodb_monitor = TRUE; os_event_set(srv_lock_timeout_thread_event); os_thread_sleep(30000000); srv_print_innodb_monitor = old_val; fprintf(stderr, "InnoDB: ###### Diagnostic info printed to the standard error stream\n"); How to repeat: Make a long semaphore wait occur in InnoDB Suggested fix: I don't know.