Bug #95223 Locking loop between LOCK_event_queue and event MDL lock
Submitted: 2 May 2019 12:57 Modified: 16 Jul 2019 16:24
Reporter: Laurynas Biveinis (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Locking Severity:S3 (Non-critical)
Version:8.0.15 OS:Any
Assigned to: CPU Architecture:Any
Tags: event scheduler, mdl

[2 May 2019 12:57] Laurynas Biveinis
Description:
Test main.events_and_binlog fails with timeout intermittently. Upon stacktrace analysis we find server hanging at DROP EVENT IF EXISTS e1 and following two threads:

#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
#1  0x00007fbc97d1cb11 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x7fbc8c1f78c0) at ../nptl/pthread_mutex_lock.c:133
#2  0x0000558aaea1e924 in native_mutex_lock (mutex=0x7fbc8c1f78c0) at ./include/thr_mutex.h:186
#3  my_mutex_lock (mp=0x7fbc8c1f78c0) at ./include/thr_mutex.h:186
#4  inline_mysql_mutex_lock (src_line=671, src_file=0x558ab03c509f "/tmp/ps/sql/event_queue.cc", that=0x7fbc8c1f78c0) at ./include/mysql/psi/mysql_mutex.h:250
#5  Event_queue::lock_data (this=this@entry=0x7fbc8c1f78c0, func=func@entry=0x558ab0484748 <Event_queue::drop_event(THD*, MYSQL_LEX_STRING, MYSQL_LEX_STRING)::__func__> "drop_event", line=line@entry=252) at ./sql/event_queue.cc:671
#6  0x0000558aaea1eb25 in Event_queue::drop_event (this=0x7fbc8c1f78c0, thd=thd@entry=0x7fbc66003000, dbname=..., name=...) at ./sql/event_queue.cc:252
#7  0x0000558aae8af9d0 in Events::drop_event (thd=thd@entry=0x7fbc66003000, dbname=..., name=..., if_exists=<optimized out>) at ./sql/events.cc:668
#8  0x0000558aae783e48 in mysql_execute_command (thd=0x7fbc66003000, first_level=<optimized out>) at ./sql/sql_parse.cc:3844
#9  0x0000558aae78860b in mysql_parse (thd=thd@entry=0x7fbc66003000, parser_state=parser_state@entry=0x7fbc870fae70, update_userstat=update_userstat@entry=false, force_primary_storage_engine=force_primary_storage_engine@entry=false) at ./sql/sql_parse.cc:5338
#10 0x0000558aae78b8e3 in dispatch_command (thd=0x7fbc66003000, com_data=<optimized out>, command=COM_QUERY) at ./sql/sql_parse.cc:1765
#11 0x0000558aae78c3dc in do_command (thd=thd@entry=0x7fbc66003000) at ./sql/sql_parse.cc:1292
#12 0x0000558aae8a4a50 in handle_connection (arg=arg@entry=0x7fbc8c33a800) at ./sql/conn_handler/connection_handler_per_thread.cc:311
#13 0x0000558aaf641401 in pfs_spawn_thread (arg=0x7fbc6e1e9320) at ./storage/perfschema/pfs.cc:2836
#14 0x00007fbc97d1a164 in start_thread (arg=<optimized out>) at pthread_create.c:486
#15 0x00007fbc96db5def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

which has acquired MDL lock for the event and now is trying to acquire LOCK_event_queue, and

#0  futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0x7fbc7f2ad8a0, expected=0, futex_word=0x7fbc6401e0e8) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  __pthread_cond_wait_common (abstime=0x7fbc7f2ad8a0, mutex=0x7fbc6401e090, cond=0x7fbc6401e0c0) at pthread_cond_wait.c:539
#2  __pthread_cond_timedwait (cond=cond@entry=0x7fbc6401e0c0, mutex=mutex@entry=0x7fbc6401e090, abstime=abstime@entry=0x7fbc7f2ad8a0) at pthread_cond_wait.c:667
#3  0x0000558aae6543f1 in native_cond_timedwait (abstime=0x7fbc7f2ad8a0, mutex=0x7fbc6401e090, cond=0x7fbc6401e0c0) at ./include/thr_cond.h:149
#4  my_cond_timedwait (abstime=0x7fbc7f2ad8a0, mp=0x7fbc6401e090, cond=0x7fbc6401e0c0) at ./include/thr_cond.h:149
#5  inline_mysql_cond_timedwait (src_file=0x558ab03b0f75 "/tmp/ps/sql/mdl.cc", src_line=1817, abstime=0x7fbc7f2ad8a0, mutex=0x7fbc6401e090, that=0x7fbc6401e0c0) at ./include/mysql/psi/mysql_cond.h:217
#6  MDL_wait::timed_wait (this=this@entry=0x7fbc6401e090, owner=0x7fbc6401e000, abs_timeout=abs_timeout@entry=0x7fbc7f2ad8a0, set_status_on_timeout=set_status_on_timeout@entry=false, wait_state_name=<optimized out>) at ./sql/mdl.cc:1816
#7  0x0000558aae658d32 in MDL_context::acquire_lock (this=0x7fbc6401e090, mdl_request=0x7fbc7f2ae010, lock_wait_timeout=<optimized out>) at ./sql/mdl.h:745
#8  0x0000558aae6597c2 in MDL_context::acquire_locks (this=this@entry=0x7fbc6401e090, mdl_requests=mdl_requests@entry=0x7fbc7f2adab0, lock_wait_timeout=31536000) at ./sql/mdl.cc:3613
#9  0x0000558aae9ba388 in lock_object_name (thd=thd@entry=0x7fbc6401e000, mdl_type=mdl_type@entry=MDL_key::EVENT, db=<optimized out>, name=<optimized out>) at ./sql/lock.cc:888
#10 0x0000558aaea1ee85 in Event_queue::recalculate_activation_times (this=0x7fbc8c1f78c0, thd=thd@entry=0x7fbc6401e000) at ./sql/event_queue.cc:414
#11 0x0000558aaea21a13 in Event_scheduler::run (this=0x7fbc8c1f7980, thd=0x7fbc6401e000) at ./sql/event_scheduler.cc:540
#12 0x0000558aaea21cc2 in event_scheduler_thread (arg=arg@entry=0x7fbc66068000) at ./sql/event_scheduler.cc:269
#13 0x0000558aaf641401 in pfs_spawn_thread (arg=0x7fbc66077420) at ./storage/perfschema/pfs.cc:2836
#14 0x00007fbc97d1a164 in start_thread (arg=<optimized out>) at pthread_create.c:486
#15 0x00007fbc96db5def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

which is in the opposite situation, hence the locking loop. I don't call it a deadlock, because MDL lock wait timeout should eventually resolve it, yet that spuriously adds MDL timeout-worth of wait time for a simple command, or a test failure in case of MTR.

How to repeat:
events_and_binlog test, source code analysis
[3 May 2019 12:56] MySQL Verification Team
Hi Laurynas,

I have analysed carefully stacktraces and the associated source code. I think that our code works exactly as it is designed to. However, it also seems to me that it could be improved without changing any default behaviour.

Verified as reported, with slightly lowered severity level.
[16 Jul 2019 16:24] Paul DuBois
Posted by developer:
 
Fixed in 8.0.18.

Simultaneous execution of the event scheduler and DROP EVENT could
result in lock acquisition hanging until lock wait timeout.
[17 Jul 2019 12:23] MySQL Verification Team
Thank you, Paul .....