MySQL Bugs: #52367: Deadlock involving SET GLOBAL EVENT_SCHEDULER = OFF during rqg_mdl

Bug #52367	Deadlock involving SET GLOBAL EVENT_SCHEDULER = OFF during rqg_mdl_deadlock test
Submitted:	25 Mar 2010 15:27	Modified:	12 May 2010 1:18
Reporter:	John Embretsen
Status:	Closed
Category:	MySQL Server: Locking	Severity:	S2 (Serious)
Version:	5.5.4-m3	OS:	Any
Assigned to:	Jon Olav Hauglid	CPU Architecture:	Any
Tags:	pushbuild, rqg_pb2, test failure

Description:
When executing a random concurrent workload involving SET GLOBAL EVENT_SCHEDULER = OFF, mysqld sometimes deadlocks with symptoms as follows:

 - Some DML threads ("SELECT", "UPDATE")
   MDL_context::timed_wait
   MDL_context::wait_for_lock
 - Some DDL threads ("CREATE TABLE", "DROP VIEW", etc.),
   and "FLUSH TABLE":
   MDL_context::timed_wait
   MDL_context::acquire_lock_impl
 - Event scheduler thread ("SET GLOBAL EVENT_SCHEDULER = OFF")
   Event_scheduler::cond_wait
   Event_scheduler::stop

Full stack traces will be attached shortly.

Observed against the mysql-trunk-runtime-exp branch, test "rqg_mdl_deadlock", using RQG revision john.embretsen@sun.com-20100304160558-d02dwscem797726i.

The issue is non-deterministic; it was reproduced locally (Linux 32-bit) in 1 out of 6 test runs.

This issue most likely belongs to the family of event-related locking bugs; see Bug#40915.

How to repeat:
Repeat as in Pushbuild:

Linux:

export CODE=/path/to/codebase

bzr branch lp:randgen
cd randgen
perl runall.pl \
 --grammar=conf/runtime/WL5004_sql.yy \
 --reporters=Deadlock,ErrorLog,Backtrace,Shutdown \
 --basedir=$CODE \
 --threads=10 \
 --queries=1M \
 --duration=1200 \
 --mysqld=--innodb \
 --mysqld=--innodb-lock-wait-timeout=50 \
 --mysqld=--lock-wait-timeout=31536000 \
 --mysqld=--log-output=file \
 --mysqld=--loose-skip-safemalloc \
 --mysqld=--loose-table-lock-wait-timeout=1

Here, the test duration is set to 20 minutes. Lock wait timeouts are set to the current defaults (50 seconds for innodb-lock-wait-timeout, 1 year for lock-wait-timeout).

Backtraces from all threads, Linux 32-bit.

Attachment: bug52367_backtraces_linux.txt (text/plain), 39.54 KiB.

See bug #51160

This seems to be a duplicate of Bug#51160. At least, I was able to reproduce the issue on Linux using the RQG grammar from that bug's "How to repeat".

However, since that bug is closed, I overlooked it when filing this one.
The fix for Bug#51160 also appears to be present in mysql-trunk-runtime-exp, and I still see the issue, so either the fix was incomplete or there is a regression of some sort.

Issue reproduced (on linux 32-bit) against current mysql-trunk-bugfixing, revid alik@sun.com-20100324081454-gucgfy0x4x7vgyp1, using the simplified grammar from Bug#51160.

triage: setting tag to SR55RC (P3, as toggling the event scheduler on and off all the time will not happen in practice)

Non-deterministic MTR test case:

let $try = 100;
connect (con1, localhost, root);

while ($try)
{
        SET GLOBAL EVENT_SCHEDULER = ON;
        connection default;
        --send SET GLOBAL EVENT_SCHEDULER = OFF
        connection con1;
        --send SET GLOBAL EVENT_SCHEDULER = OFF
        connection default;
        --reap
        connection con1;
        --reap
        dec $try;
}

connection default;
disconnect con1;

The bug title is slightly misleading. This is not a deadlock as such, but rather a hang of a connection executing "SET GLOBAL EVENT_SCHEDULER = OFF" when another connection is concurrently executing the same statement.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/105498

2999 Jon Olav Hauglid	2010-04-13
      Bug #52367 Deadlock involving SET GLOBAL EVENT_SCHEDULER = OFF
                 during rqg_mdl_deadlock test
      
      The problem was that if two connection threads simultaneously try
      to execute "SET GLOBAL EVENT_SCHEDULER = OFF", one of them could
      hang waiting for the scheduler to stop.
      
      The first connection thread would kill the event scheduler thread
      and then start waiting for it to exit. The second connection thread
      would then find the event scheduler thread in the process of exiting
      and also wait for it to exit. However, since the event scheduler 
      thread used signal to wake only one waiting thread, the other connection
      thread would be left waiting.
      
      This bug was a regression introduced by the fix for Bug#51160.
      Before #51160 it was not possible for two connection threads to 
      try to stop the event scheduler thread simultaneously.
      
      This patch fixes the problem by making sure the event scheduler
      thread uses broadcast to notify all waiters that it is exiting.
      
      No test case added as this would require adding debug sync points
      to parts of the code where sync points are currently not used.
      The patch has been tested with the non-deterministic test case
      from the bug description as well as using the RQG.
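
The signal-versus-broadcast distinction described in the commit message can be illustrated with a small standalone sketch. This is not MySQL source code: the class SchedulerSketch, its state names and the demo() helper are hypothetical names made up for illustration, and Python's threading.Condition stands in for the server's condition variable. It only shows the pattern of the fix, where the exiting thread notifies all waiters rather than a single one.

```python
import threading


class SchedulerSketch:
    """Hypothetical stand-in for the event scheduler; not MySQL source code."""

    def __init__(self):
        self._cond = threading.Condition()
        self._state = "RUNNING"

    def run(self):
        """Scheduler thread: wait for a stop request, then announce the exit."""
        with self._cond:
            self._cond.wait_for(lambda: self._state == "STOPPING")
            self._state = "STOPPED"
            # Broadcast: wake every connection thread waiting in stop().
            # A single notify() would wake only one of them, which is the
            # behaviour the commit message describes as the cause of the hang.
            self._cond.notify_all()

    def stop(self, timeout=5.0):
        """Connection thread: request a stop and wait until the scheduler exits.

        Returns True if the scheduler was seen in the STOPPED state.
        """
        with self._cond:
            if self._state == "RUNNING":
                # First caller asks the scheduler thread to exit.
                self._state = "STOPPING"
                self._cond.notify_all()
            # Every caller, first or not, waits for the exit.
            return self._cond.wait_for(lambda: self._state == "STOPPED", timeout)


def demo(n_connections=2):
    """Run n_connections concurrent stop() calls against one scheduler."""
    sched = SchedulerSketch()
    results = []
    results_lock = threading.Lock()

    def connection():
        ok = sched.stop()
        with results_lock:
            results.append(ok)

    scheduler_thread = threading.Thread(target=sched.run)
    connections = [threading.Thread(target=connection) for _ in range(n_connections)]
    scheduler_thread.start()
    for t in connections:
        t.start()
    for t in connections:
        t.join()
    scheduler_thread.join()
    return results


if __name__ == "__main__":
    print(demo())
```

Calling demo(2) mirrors the two-connection scenario from the bug: both stop() calls are expected to return once the scheduler thread has broadcast its exit. The timeout in stop() is only a guard for the sketch so that it cannot block forever; it is not part of the described fix.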

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/105594

2999 Jon Olav Hauglid	2010-04-14
      Bug #52367 Deadlock involving SET GLOBAL EVENT_SCHEDULER = OFF
                 during rqg_mdl_deadlock test
      
      The problem was that if two connection threads simultaneously try
      to execute "SET GLOBAL EVENT_SCHEDULER = OFF", one of them could
      hang waiting for the scheduler to stop.
      
      The first connection thread would kill the event scheduler thread
      and then start waiting for it to exit. The second connection thread
      would then find the event scheduler thread in the process of exiting
      and also wait for it to exit. However, since the event scheduler 
      thread used signal to wake only one waiting thread, the other connection
      thread would be left waiting.
      
      This bug was a regression introduced by the fix for Bug#51160.
      Before #51160 it was not possible for two connection threads to 
      try to stop the event scheduler thread simultaneously.
      
      This patch fixes the problem by making sure the event scheduler
      thread uses broadcast to notify all waiters that it is exiting.
      
      No test case added as this would require adding debug sync points
      to parts of the code where sync points are currently not used.
      The patch has been tested with the non-deterministic test case
      from the bug description as well as using the RQG.

Pushed to mysql-trunk-runtime (Ver 5.5.4-m3).

Pushed into 6.0.14-alpha (revid:alik@sun.com-20100427094135-5s49ecp3ckson6e2) (version source revid:alik@sun.com-20100427093843-uekr85qkd7orx12t) (merge vers: 6.0.14-alpha) (pib:16)

Pushed into 5.5.5-m3 (revid:alik@sun.com-20100427093804-a2k3rrjpwu5jegu8) (version source revid:alik@sun.com-20100427093804-a2k3rrjpwu5jegu8) (merge vers: 5.5.5-m3) (pib:16)

Pushed into mysql-next-mr (revid:alik@sun.com-20100427094036-38frbg3famdlvjup) (version source revid:alik@sun.com-20100427093825-92wc8b22d4yg34ju) (pib:16)

Noted in 5.5.5, 6.0.14 changelogs.

Two sessions trying to set the global event_scheduler system variable
to OFF resulted in one of them hanging waiting for the event
scheduler to stop.