MySQL Bugs: #83528: InnoDB mutex bs: periodic waiters wakeup

Bug #83528	InnoDB mutex bs: periodic waiters wakeup
Submitted:	25 Oct 2016 13:06	Modified:	2 Dec 2016 14:58
Reporter:	Sergey Vojtovich
Status:	Duplicate
Category:	MySQL Server: InnoDB storage engine	Severity:	S3 (Non-critical)
Version:	8.0	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
The InnoDB mutex implementation is designed such that, under contention, a thread releasing a mutex may fail to wake threads waiting on the same mutex. That is, threads may get stuck in the waiting state forever.

To work around this broken implementation, InnoDB's srv_error_monitor_thread periodically wakes up waiters; there's a nice comment about it in TTASEventMutex::exit():

/* A problem: we assume that mutex_reset_lock word
is a memory barrier, that is when we read the waiters
field next, the read must be serialized in memory
after the reset. A speculative processor might
perform the read first, which could leave a waiting
thread hanging indefinitely.

Our current solution call every second
sync_arr_wake_threads_if_sema_free()
to wake up possible hanging threads if they are missed
in mutex_signal_object. */

However, srv_error_monitor_thread itself acquires mutexes and can get stuck in the waiting state too, e.g. via srv_error_monitor_thread() / log_get_lsn() / log_mutex_enter() / mutex_enter(). If srv_error_monitor_thread is stuck, there is nobody to wake it up, and nobody left to wake up the other stuck threads either. This can lead to a massive server deadlock.

It isn't good for performance either: a thread whose wakeup was missed can stall for up to ~1 second, until the next periodic wakeup, for no good reason.

We observed this deadlock a few times with older InnoDB versions, even on x86. It was ostensibly fixed by commit 93e6f388860490a6066cc07253a6aab94029e502, but in fact that was just another non-portable, x86-specific workaround.
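The race described in the quoted TTASEventMutex::exit() comment is the classic store-buffering pattern: each side stores to one field and then loads the other. x86 does permit a later load to complete before an earlier store to a different address becomes visible, so it is not immune. Below is a minimal, deterministic model of that race, assuming a simplified two-thread scenario in which the waiter announces itself and then rechecks the lock word before sleeping. The names lock_word, waiters and lost_wakeup_reachable() are hypothetical stand-ins (for m_lock_word and m_waiters), not InnoDB code. It enumerates every global order of the four memory events and reports whether the "lost wakeup" outcome can occur.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <iterator>

// Hypothetical model (not InnoDB code) of the race in TTASEventMutex::exit().
// Initial state: the mutex is held (lock_word = 1) and no waiter is registered.
//   Releaser: E0: lock_word = 0;   E1: r_waiters = waiters;
//   Waiter:   E2: waiters = 1;     E3: r_lock = lock_word;
// "Lost wakeup": the releaser sees no waiters AND the waiter still sees the
// lock held, so nobody signals and the waiter sleeps indefinitely.
bool lost_wakeup_reachable(bool allow_store_load_reordering) {
  std::array<int, 4> order{0, 1, 2, 3};  // sorted, so this is the first permutation
  do {
    auto pos = [&order](int e) {
      return std::distance(order.begin(),
                           std::find(order.begin(), order.end(), e));
    };
    // Without store-load reordering, each thread's store must become visible
    // before its own subsequent load executes (program order).
    if (!allow_store_load_reordering && (pos(0) > pos(1) || pos(2) > pos(3))) {
      continue;  // in a do-while, this jumps to next_permutation()
    }
    int lock_word = 1, waiters = 0, r_waiters = -1, r_lock = -1;
    for (int e : order) {
      switch (e) {
        case 0: lock_word = 0; break;         // releaser resets the lock word
        case 1: r_waiters = waiters; break;   // releaser checks for waiters
        case 2: waiters = 1; break;           // waiter registers itself
        case 3: r_lock = lock_word; break;    // waiter rechecks the lock word
      }
    }
    assert(r_waiters != -1 && r_lock != -1);  // both loads always execute
    if (r_waiters == 0 && r_lock == 1) return true;
  } while (std::next_permutation(order.begin(), order.end()));
  return false;
}
```

Under program order the lost-wakeup outcome is unreachable; once each load may be performed ahead of its thread's earlier store, an order such as E1, E3, E0, E2 produces it. That is why exit() needs either a StoreLoad barrier between the reset and the waiters read, or a single atomic read-modify-write that does both.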

How to repeat:
Code analysis.

Suggested fix:
A decent mutex implementation must not need periodic waiter wakeups.

In 8.0, TTASEventMutex can be fixed by combining m_waiters and m_lock_word, pretty much as TTASMutex and TTASFutexMutex do. This shameful workaround can then be removed: sync_arr_wake_threads_if_sema_free(), sync_arr_cell_can_wake_up(), sync_array_wake_threads_if_sema_free_low().
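In the spirit of the suggested fix, here is a minimal sketch, assuming C++17, of a mutex whose waiter flag lives in the lock word itself: 0 = unlocked, 1 = locked, 2 = locked with waiters. CombinedWordMutex and run_counter_demo() are hypothetical names, not InnoDB code. Because exit() releases the lock and learns whether anyone is waiting in one atomic exchange, there is no separate waiters read that could be reordered ahead of the release. A std::condition_variable stands in for InnoDB's OS event; that substitution is an assumption made for portability, and a real implementation would more likely use a futex or InnoDB's own event machinery.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical mutex (not InnoDB code): waiter state is folded into the lock
// word, so "release" and "are there waiters?" are a single atomic RMW.
class CombinedWordMutex {
 public:
  void enter() {
    int expected = kUnlocked;
    // Fast path: uncontended 0 -> 1.
    if (m_word.compare_exchange_strong(expected, kLocked,
                                       std::memory_order_acquire)) {
      return;
    }
    // Slow path: mark the lock as "locked with waiters". If the exchange
    // returns kUnlocked we now own it (conservatively flagged as contended).
    while (m_word.exchange(kLockedWithWaiters, std::memory_order_acquire) !=
           kUnlocked) {
      std::unique_lock<std::mutex> guard(m_event_mutex);
      // The predicate is rechecked under m_event_mutex, and exit() notifies
      // under the same mutex, so a release cannot slip in unnoticed.
      m_event.wait(guard,
                   [this] { return m_word.load() != kLockedWithWaiters; });
    }
  }

  void exit() {
    // One RMW both releases the lock and reports whether anyone is waiting.
    const int prev = m_word.exchange(kUnlocked, std::memory_order_release);
    assert(prev != kUnlocked && "exit() called on an unlocked mutex");
    if (prev == kLockedWithWaiters) {
      std::lock_guard<std::mutex> guard(m_event_mutex);
      m_event.notify_all();
    }
  }

 private:
  static constexpr int kUnlocked = 0;
  static constexpr int kLocked = 1;
  static constexpr int kLockedWithWaiters = 2;

  std::atomic<int> m_word{kUnlocked};
  std::mutex m_event_mutex;
  std::condition_variable m_event;
};

// Hypothetical stress helper: n_threads each increment a shared counter
// n_iters times under CombinedWordMutex and the final total is returned.
long run_counter_demo(int n_threads, int n_iters) {
  CombinedWordMutex mutex;
  long counter = 0;
  std::vector<std::thread> threads;
  for (int t = 0; t < n_threads; ++t) {
    threads.emplace_back([&] {
      for (int i = 0; i < n_iters; ++i) {
        mutex.enter();
        ++counter;  // protected by the acquire/release pair on m_word
        mutex.exit();
      }
    });
  }
  for (auto& th : threads) th.join();
  return counter;
}
```

For example, run_counter_demo(4, 20000) should return 80000, with no periodic wakeup thread involved. notify_all() keeps the sketch simple at the cost of extra wakeups under contention; woken threads that lose the race simply set the word back to 2 and sleep again.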

Hi Sergey,

Thank you very much for your bug report.

Can you be more specific in the "How to repeat" section? A test case is not obligatory, but a detailed code analysis would do.

Thank you in advance ....

Hi Sinisa,

Apparently it's a duplicate of https://bugs.mysql.com/bug.php?id=79477, which I somehow missed.

As for additional information...

The fact that InnoDB expects that threads may get stuck is documented here https://github.com/mysql/mysql-server/blob/8.0/storage/innobase/include/ib0mutex.h#L673, right?

The fact that srv_error_monitor_thread() does periodic wake up (calls sync_arr_wake_threads_if_sema_free()) can be seen here: https://github.com/mysql/mysql-server/blob/8.0/storage/innobase/srv/srv0srv.cc#L1633

The fact that srv_error_monitor_thread() itself acquires a mutex can be traced starting here: https://github.com/mysql/mysql-server/blob/8.0/storage/innobase/srv/srv0srv.cc#L1606 (just follow the call chain log_get_lsn() / log_mutex_enter() / mutex_enter()).

Thanks for your time!

This is a duplicate of bug #79477:

http://bugs.mysql.com/bug.php?id=79477

I have verified that bug and assigned it the proper priority.