| Bug #83528 | InnoDB mutex bs: periodic waiters wakeup | ||
|---|---|---|---|
| Submitted: | 25 Oct 2016 13:06 | Modified: | 2 Dec 2016 14:58 |
| Reporter: | Sergey Vojtovich | Email Updates: | |
| Status: | Duplicate | Impact on me: | |
| Category: | MySQL Server: InnoDB storage engine | Severity: | S3 (Non-critical) |
| Version: | 8.0 | OS: | Any |
| Assigned to: | | CPU Architecture: | Any |
[25 Oct 2016 14:49]
MySQL Verification Team
Hi Sergey, Thank you very much for your bug report. Can you be more specific about the "How to repeat" section? A test case is not obligatory, but a detailed code analysis would do. Thank you in advance.
[25 Oct 2016 15:05]
Sergey Vojtovich
Hi Sinisa, Apparently it's a duplicate of https://bugs.mysql.com/bug.php?id=79477, which I somehow missed. As for additional information:
The fact that InnoDB expects that threads may get stuck is documented here, right? https://github.com/mysql/mysql-server/blob/8.0/storage/innobase/include/ib0mutex.h#L673
The fact that srv_error_monitor_thread() does a periodic wakeup (it calls sync_arr_wake_threads_if_sema_free()) can be seen here: https://github.com/mysql/mysql-server/blob/8.0/storage/innobase/srv/srv0srv.cc#L1633
The place where srv_error_monitor_thread() acquires a mutex itself starts here: https://github.com/mysql/mysql-server/blob/8.0/storage/innobase/srv/srv0srv.cc#L1606
Just follow the call chain log_get_lsn() / log_mutex_enter() / mutex_enter().
Thanks for your time!
[2 Dec 2016 14:58]
MySQL Verification Team
This is a duplicate of bug #79477: http://bugs.mysql.com/bug.php?id=79477
I have verified that bug and given it the proper priority.

Description:
The InnoDB mutex implementation is designed so that, under contention, a thread releasing a mutex may fail to wake threads waiting on the same mutex. That is, threads may get stuck in the waiting state forever. To work around this broken implementation, InnoDB's srv_error_monitor_thread periodically wakes waiters. There's a nice comment in TTASEventMutex::exit():

/* A problem: we assume that mutex_reset_lock word is a memory barrier, that is when we read the waiters field next, the read must be serialized in memory after the reset. A speculative processor might perform the read first, which could leave a waiting thread hanging indefinitely. Our current solution call every second sync_arr_wake_threads_if_sema_free() to wake up possible hanging threads if they are missed in mutex_signal_object. */

However, srv_error_monitor_thread itself locks a mutex and can get stuck in the waiting state, e.g. srv_error_monitor_thread() / log_get_lsn() / log_mutex_enter() / mutex_enter(). If srv_error_monitor_thread is stuck, there's nobody to wake it up, and nobody to wake up the other stuck threads anymore. This can lead to a massive server deadlock. It is bad for performance too: threads may experience ~1 second stalls for no good reason.

We observed this deadlock a few times with older InnoDB versions, even on x86. It is supposedly fixed by 93e6f388860490a6066cc07253a6aab94029e502, but in fact that was just another non-portable, x86-specific workaround.

How to repeat:
Code analysis.

Suggested fix:
A decent mutex implementation must not need periodic waiter wakeups. In 8.0, TTASEventMutex can be fixed by combining m_waiters and m_lock_word, pretty much like TTASMutex and TTASFutexMutex do. And remove this shameful workaround: sync_arr_wake_threads_if_sema_free(), sync_arr_cell_can_wake_up(), sync_array_wake_threads_if_sema_free_low().
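
To illustrate the idea behind the suggested fix, here is a minimal sketch of a mutex that keeps the lock bit and the "has waiters" flag in a single atomic word. It is not InnoDB code and not the actual patch: the class name CombinedWordMutex, the state constants, and the helper run_contended_increments() are hypothetical, and a std::condition_variable stands in for InnoDB's event object. Default sequentially consistent atomics are assumed for simplicity.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical illustration only; this is not InnoDB's TTASEventMutex.
class CombinedWordMutex {
 public:
  void enter() {
    std::uint32_t expected = kUnlocked;
    // Fast path: uncontended acquire, no waiters recorded.
    if (m_state.compare_exchange_strong(expected, kLocked)) {
      return;
    }
    // Slow path: while holding the event mutex, mark the word as
    // "locked with waiters". If the previous value was kUnlocked, this
    // very exchange acquired the lock (conservatively marked contended).
    std::unique_lock<std::mutex> guard(m_event_mutex);
    while (m_state.exchange(kLockedWaiters) != kUnlocked) {
      m_event.wait(guard);
    }
  }

  void exit() {
    // One atomic exchange both releases the lock and reports whether
    // anyone is waiting, so there is no separate waiters read that could
    // be reordered before the release.
    const std::uint32_t previous = m_state.exchange(kUnlocked);
    assert(previous != kUnlocked);  // exit() requires a held mutex
    if (previous == kLockedWaiters) {
      // Taking the event mutex orders the notify after the waiter has
      // actually started waiting.
      std::lock_guard<std::mutex> guard(m_event_mutex);
      m_event.notify_one();
    }
  }

 private:
  enum : std::uint32_t { kUnlocked = 0, kLocked = 1, kLockedWaiters = 2 };

  std::atomic<std::uint32_t> m_state{kUnlocked};
  std::mutex m_event_mutex;         // protects the wait/notify handshake
  std::condition_variable m_event;  // stand-in for an InnoDB event
};

// Hypothetical helper: several threads increment a plain counter under
// the mutex and the final value is returned.
long run_contended_increments(int threads, int iterations) {
  CombinedWordMutex mutex;
  long counter = 0;
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&] {
      for (int i = 0; i < iterations; ++i) {
        mutex.enter();
        ++counter;
        mutex.exit();
      }
    });
  }
  for (auto &worker : workers) {
    worker.join();
  }
  return counter;
}
```

In this sketch a waiter publishes its presence with the same atomic word the releaser clears, so the releaser either sees the waiter or the waiter sees the unlocked word. Under these assumptions, no periodic wakeup thread is needed to rescue missed waiters.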