Bug #40915 Events takes mutex in wrong order which can easily lead to deadlocks
Submitted: 21 Nov 2008 6:58 Modified: 21 Jul 2010 8:32
Reporter: Michael Widenius
Status: Can't repeat
Category:MySQL Server: General Severity:S1 (Critical)
Version:5.1 OS:Any
Assigned to: Dmitry Shulga CPU Architecture:Any

[21 Nov 2008 6:58] Michael Widenius
Description:
When running mysql-5.1 with the new wrong-mutex-usage detector, several critical cases of wrong mutex use were found in the event code. Any of these can lead to a deadlock when events are used in combination with SHOW STATUS, CREATE DATABASE, etc.

Some examples of mutexes that are taken in the wrong order:

LOCK_event_metadata and LOCK_scheduler_state
LOCK_event_metadata and LOCK_open
LOCK_scheduler_state and LOCK_global_system_variables
LOCK_event_queue and LOCK_scheduler_state
LOCK_event_queue and LOCK_open
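
For readers unfamiliar with this kind of check, the following is a minimal sketch of how a lock-order detector can flag a pair of mutexes taken in both orders. It is written in Python for brevity and is not the detector from the mysql-maria tree; the class `OrderTracker` and its methods are hypothetical names, and it only tracks lock names rather than locking anything.

```python
import threading


class OrderTracker:
    """Records the order in which named locks are taken and reports inversions.

    Hypothetical illustration only: it tracks names, it does not lock anything.
    """

    def __init__(self):
        self._seen = set()               # (held, acquired) pairs observed so far
        self._held = threading.local()   # per-thread list of held lock names
        self._guard = threading.Lock()   # protects _seen and violations
        self.violations = []

    def acquire(self, name):
        held = getattr(self._held, "names", [])
        with self._guard:
            for h in held:
                if (name, h) in self._seen:
                    # An earlier code path took `name` before `h`.
                    self.violations.append((h, name))
                self._seen.add((h, name))
        self._held.names = held + [name]

    def release(self, name):
        names = list(self._held.names)
        names.remove(name)
        self._held.names = names


def demo():
    t = OrderTracker()
    # Path 1: metadata first, then scheduler state.
    t.acquire("LOCK_event_metadata")
    t.acquire("LOCK_scheduler_state")
    t.release("LOCK_scheduler_state")
    t.release("LOCK_event_metadata")
    # Path 2: the reverse order, which the tracker should flag.
    t.acquire("LOCK_scheduler_state")
    t.acquire("LOCK_event_metadata")
    t.release("LOCK_event_metadata")
    t.release("LOCK_scheduler_state")
    return t.violations
```

Note that such a tracker reports an inversion even when the two paths never run concurrently, so a report is a potential deadlock rather than an observed one.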

How to repeat:
Pull the mysql-maria tree with the deadlock detector.

Remove the MYF_NO_DEADLOCK_DETECTION flag from the event mutexes.

Run the test suite. At least

mysql-test-run main.events_bugs main.events_trans

should show some of the problems.

Suggested fix:
Fix the event code to ensure that mutexes are always locked in the same order.
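
One common way to enforce a single order is to give every mutex a rank and always acquire in rank order. The sketch below illustrates the idea in Python; the ranking in `RANK` is an arbitrary assumption made for the example, not an order taken from the server code, and `locked` is a hypothetical helper.

```python
import threading
from contextlib import contextmanager

# Hypothetical ranking; the real server would have to choose its own order.
RANK = {
    "LOCK_event_metadata": 0,
    "LOCK_event_queue": 1,
    "LOCK_scheduler_state": 2,
    "LOCK_open": 3,
    "LOCK_global_system_variables": 4,
}
LOCKS = {name: threading.Lock() for name in RANK}


@contextmanager
def locked(*names):
    """Acquire the named locks in rank order and release them in reverse."""
    ordered = sorted(set(names), key=RANK.__getitem__)
    taken = []
    try:
        for name in ordered:
            LOCKS[name].acquire()
            taken.append(name)
        yield ordered
    finally:
        for name in reversed(taken):
            LOCKS[name].release()
```

A caller that writes `with locked("LOCK_scheduler_state", "LOCK_event_metadata"):` still takes LOCK_event_metadata first. This only helps when all locks for an operation are requested in one place; nested calls made while already holding a higher-ranked lock would reintroduce the problem.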
[24 Nov 2008 13:49] Konstantin Osipov
Hi Monty, what is wrong with this order? There do not seem to be any interdependencies?
[24 Nov 2008 15:26] Konstantin Osipov
locking diagram

Attachment: locking.jpg (image/jpeg, text), 17.19 KiB.

[24 Nov 2008 15:26] Konstantin Osipov
Monty, the attached diagram has no cycles.
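
The "no cycles" claim can be checked mechanically by treating each "A is held while B is taken" relation as a directed edge A -> B and searching for a cycle. The sketch below is a plain depth-first search in Python; `find_cycle` is a hypothetical helper, and the edge list used in the usage note assumes that each pair in the description is listed in acquisition order, which the report does not state.

```python
def find_cycle(edges):
    """Return one cycle in the directed graph as a list of nodes, or None."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, [])

    WHITE, GREY, BLACK = 0, 1, 2      # unvisited, on current path, finished
    color = {n: WHITE for n in graph}
    path = []

    def visit(n):
        color[n] = GREY
        path.append(n)
        for m in graph[n]:
            if color[m] == GREY:
                # Back edge: m is already on the current path.
                return path[path.index(m):] + [m]
            if color[m] == WHITE:
                found = visit(m)
                if found:
                    return found
        path.pop()
        color[n] = BLACK
        return None

    for n in graph:
        if color[n] == WHITE:
            found = visit(n)
            if found:
                return found
    return None


# Pairs from the description, read as "first is held while second is taken".
PAIRS = [
    ("LOCK_event_metadata", "LOCK_scheduler_state"),
    ("LOCK_event_metadata", "LOCK_open"),
    ("LOCK_scheduler_state", "LOCK_global_system_variables"),
    ("LOCK_event_queue", "LOCK_scheduler_state"),
    ("LOCK_event_queue", "LOCK_open"),
]
```

Under that reading, `find_cycle(PAIRS)` finds no cycle, while adding a single reversed edge such as LOCK_scheduler_state -> LOCK_event_metadata produces one.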
[25 Nov 2008 15:10] Michael Widenius
The event scheduler takes mutexes in different orders.
For example, in one case it does:
mutex_lock(LOCK_event_metadata); mutex_lock(LOCK_scheduler_state);
In other cases:
mutex_lock(LOCK_scheduler_state); mutex_lock(LOCK_event_metadata);
which leads to deadlock scenarios.
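
To make the scenario concrete, here is a small Python sketch of two threads taking two locks in opposite orders. It is an illustration only, not server code: `run_inversion` is a hypothetical function, and the timeout on the second acquire exists solely so the demo terminates. A plain blocking mutex_lock has no timeout, so the same interleaving would hang forever.

```python
import threading


def run_inversion(timeout=0.2):
    """Two threads take two locks in opposite order.

    Returns a dict mapping thread name to whether it obtained its second lock.
    """
    a, b = threading.Lock(), threading.Lock()
    both_hold_first = threading.Barrier(2)
    both_tried = threading.Barrier(2)
    got_second = {}

    def worker(name, first, second):
        with first:
            both_hold_first.wait()   # each thread now holds its first lock
            ok = second.acquire(timeout=timeout)
            got_second[name] = ok
            if ok:
                second.release()
            both_tried.wait()        # keep `first` until both attempts are over

    threads = [
        threading.Thread(target=worker, args=("t1", a, b)),
        threading.Thread(target=worker, args=("t2", b, a)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return got_second
```

The barriers force the bad interleaving every time, so neither thread obtains its second lock. In a real server the interleaving depends on timing, which is why such deadlocks tend to show up only under concurrent load.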
[20 Jan 2010 19:03] Sven Sandberg
See also BUG#50483
[22 Feb 2010 17:34] Philip Stoev
See bug #51160
[22 Feb 2010 17:35] Philip Stoev
See also bug #51391
[4 Mar 2010 10:10] Philip Stoev
IRC log with davi:

Conversation with davi at 22/02/2010 19:31:48 on pstoev@10.100.1.29 (irc)
(19:31:48) davi: hi philip, around?
(19:31:52) pstoev: hello yes
(19:32:36) davi: i see that you are reporting quite a few deadlocks involving event scheduler statements
(19:33:22) pstoev: I recently reported one about SET GLOBAL EVENT SCHEDULER OFF | ON. I do not recall any others recently.
(19:33:22) davi: for each of those, it would be nice if you could post a "ping" on Bug#40915 so we can track them.
(19:33:54) pstoev: ok will do so
(19:34:05) davi: i think i also saw 51391
(19:34:31) davi: it's just so i can track the thing and later post about the difficulty of fixing those..
(19:34:59) davi: it will be hard to fix :/.. there is a wrong lock order involving 4 or more mutexes in some cases
(19:35:45) pstoev: oh
(19:35:55) pstoev: ok I linked all those bugs together
(19:36:06) davi: like: LOCK_event_metadata, LOCK_scheduler_state, LOCK_global_system_variables.. so any concurrent workload that stresses those (like SET GLOBAL EVENT SCHEDULER ON/OFF) could easily trigger a deadlock
(19:36:12) davi: ok, thanks!
(19:36:51) pstoev: uh, not good
(19:37:14) pstoev: I am sorry to hear that there is no easy solution -- I thought that swapping some internal event schedulers would be enough
(19:37:26) pstoev: but I guess 4 separate bugs would mean that eventually this will be fixed
(19:41:31) davi: thanks. it's not easy because the wrong orders are established in the sysvars code and the THD creating code, which are already quite hairy. i just hope concurrent event scheduler maintenance statements aren't very common :)
(19:41:58) pstoev: it is a DoS vector, except that it requires SUPER
[25 Mar 2010 16:25] John Embretsen
See also Bug#52367.
[21 Jul 2010 8:32] Dmitry Shulga
The bug cannot be reproduced in 5.5, most likely due to changes made as part of the fix for Bug#51160. Manual checking and asserts in 5.1 suggest that the problem is more likely in the deadlock detector code than in the implementation of Events.