Bug #79477 Remove sync_arr_wake_threads_if_sema_free hack
Submitted: 1 Dec 2015 10:36 Modified: 2 Dec 2016 14:51
Reporter: Laurynas Biveinis (OCA)
Status: Verified
Category: MySQL Server: InnoDB storage engine Severity: S3 (Non-critical)
Version: OS: Any
Assigned to: CPU Architecture: Any
Tags: innodb, mutex, rwlock

[1 Dec 2015 10:36] Laurynas Biveinis
Description:
The InnoDB error monitor thread has code to wake up missed mutex/rwlock waiters. According to its comment, this code is needed in case the lock word reset is not properly ordered with the read of the waiters flag:

		/* A problem: we assume that mutex_reset_lock word
		is a memory barrier, that is when we read the waiters
		field next, the read must be serialized in memory
		after the reset. A speculative processor might
		perform the read first, which could leave a waiting
		thread hanging indefinitely. */

This code should go away if event-based mutex stays, because:
- all the barriers can be implemented properly using proper means such as compiler intrinsics;
- if the scenario being protected against does happen and this thread saves the day, the result may still be a stall of about 0.5 seconds;
- the scenario might happen when the error monitor thread is not running (--innodb-read-only, server startup);
- the scenario might happen on e.g. log sys mutex, and the error monitor thread itself will block on that mutex at the start of the loop (BTW that code for checking that LSN is not going backwards could be removed as well);
- less code.

How to repeat:
Code analysis

Suggested fix:
Remove sync_arr_wake_threads_if_sema_free(), sync_arr_cell_can_wake_up(), sync_array_wake_threads_if_sema_free_low(). Review that the mutex lock_word / waiters are properly synchronised by barriers.
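
As an illustration of the ordering that the review would need to confirm, here is a minimal sketch in standard C++11 atomics. It is not InnoDB code: SketchMutex, try_lock(), announce_wait_and_retry(), unlock() and signal_count are hypothetical names, and signal_count merely stands in for signalling the mutex event. The assumption is that a full (sequentially consistent) fence sits between the lock word reset and the waiters read on the release path, paired with a matching fence on the waiter path between setting the waiters flag and re-checking the lock word.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for an event-based mutex; not the InnoDB implementation.
struct SketchMutex {
    std::atomic<int> lock_word{0};     // 0 = free, 1 = held
    std::atomic<int> waiters{0};       // nonzero if a thread may be about to sleep
    std::atomic<int> signal_count{0};  // stands in for signalling the wait event

    // Try to take the lock once; returns true on success.
    bool try_lock() {
        return lock_word.exchange(1, std::memory_order_acquire) == 0;
    }

    // Waiter side: publish the waiters flag, then re-check the lock word.
    // Returns true if the lock was acquired and no sleep is needed.
    bool announce_wait_and_retry() {
        waiters.store(1, std::memory_order_relaxed);
        // Full fence: the waiters store is ordered before the lock word re-check.
        std::atomic_thread_fence(std::memory_order_seq_cst);
        return try_lock();
    }

    // Release side: reset the lock word, then read the waiters flag.
    void unlock() {
        assert(lock_word.load(std::memory_order_relaxed) == 1);
        lock_word.store(0, std::memory_order_release);
        // Full fence: the waiters read below cannot be performed before the
        // reset above, which is the reordering the quoted comment worries about.
        std::atomic_thread_fence(std::memory_order_seq_cst);
        if (waiters.load(std::memory_order_relaxed) != 0) {
            waiters.store(0, std::memory_order_relaxed);
            signal_count.fetch_add(1, std::memory_order_relaxed);  // "wake up" waiters
        }
    }
};
```

With both fences in place, either the releasing thread sees the waiters flag and signals, or the waiter's re-check sees the lock word already reset, so under the assumptions of this sketch no periodic rescue by another thread is required.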
[31 Mar 2016 13:51] Inaam Rana
+1

If we think it is too risky to remove the code, we can add an option to disable semaphore checking, or make the error monitor spit out a big fat error if it catches a thread napping (though there might be a race here).
[31 Mar 2016 14:19] Laurynas Biveinis
Since 5.7 is not likely to receive this change, if I were to decide, I'd do it in 5.8 in the most aggressive way (no option, no warning), and use the two years until GA to catch any bugs, which become easier to catch without this hack than with it.
[2 Dec 2016 14:51] MySQL Verification Team
Hi!

This is a bug reported by other users / contributors, like Sergey Vojtovich.

This is verified as a bug, and fixing it is fully justified.