Bug #52367 Deadlock involving SET GLOBAL EVENT_SCHEDULER = OFF during rqg_mdl_deadlock test
Submitted: 25 Mar 2010 15:27 Modified: 12 May 2010 1:18
Reporter: John Embretsen
Status: Closed
Category:MySQL Server: Locking Severity:S2 (Serious)
Version:5.5.4-m3 OS:Any
Assigned to: Jon Olav Hauglid CPU Architecture:Any
Tags: pushbuild, rqg_pb2, test failure

[25 Mar 2010 15:27] John Embretsen
Description:
When executing a random concurrent workload involving SET GLOBAL EVENT_SCHEDULER = OFF, mysqld sometimes deadlocks with the following symptoms:

 - Some DML threads ("SELECT", "UPDATE"):
   MDL_context::timed_wait
   MDL_context::wait_for_lock
 - Some DDL threads ("CREATE TABLE", "DROP VIEW", etc.),
   and "FLUSH TABLE":
   MDL_context::timed_wait
   MDL_context::acquire_lock_impl
 - Event scheduler thread ("SET GLOBAL EVENT_SCHEDULER = OFF"):
   Event_scheduler::cond_wait
   Event_scheduler::stop

Full stack traces will be attached shortly.

Observed against the mysql-trunk-runtime-exp branch, test "rqg_mdl_deadlock", using RQG revision john.embretsen@sun.com-20100304160558-d02dwscem797726i.

The issue is non-deterministic; it reproduced locally (Linux 32-bit) in 1 out of 6 test runs.

This issue most likely belongs to the family of event-related locking bugs; see Bug#40915.

How to repeat:
Repeat as in Pushbuild:

Linux:

export CODE=/path/to/codebase

bzr branch lp:randgen
cd randgen
perl runall.pl \
 --grammar=conf/runtime/WL5004_sql.yy \
 --reporters=Deadlock,ErrorLog,Backtrace,Shutdown \
 --basedir=$CODE \
 --threads=10 \
 --queries=1M \
 --duration=1200 \
 --mysqld=--innodb \
 --mysqld=--innodb-lock-wait-timeout=50 \
 --mysqld=--lock-wait-timeout=31536000 \
 --mysqld=--log-output=file \
 --mysqld=--loose-skip-safemalloc \
 --mysqld=--loose-table-lock-wait-timeout=1

Here, the test duration is set to 20 minutes (1200 seconds). The lock wait timeouts are set to the current defaults (50 seconds for innodb-lock-wait-timeout, 1 year for lock-wait-timeout).
[25 Mar 2010 15:40] John Embretsen
Backtraces from all threads, Linux 32-bit.

Attachment: bug52367_backtraces_linux.txt (text/plain), 39.54 KiB.

[25 Mar 2010 17:03] Philip Stoev
See bug #51160
[26 Mar 2010 9:43] John Embretsen
This seems to be a duplicate of Bug#51160; at least, I was able to reproduce the issue on Linux using the RQG grammar from that bug's "How to repeat".

However, since that bug is closed, I overlooked it when filing this bug.
The fix for Bug#51160 also seems to be present in mysql-trunk-runtime-exp, yet I still see the issue, so either the fix was incomplete or there is a regression of some sort.
[26 Mar 2010 10:07] John Embretsen
Issue reproduced (on Linux 32-bit) against current mysql-trunk-bugfixing, revid alik@sun.com-20100324081454-gucgfy0x4x7vgyp1, using the simplified grammar from Bug#51160.
[30 Mar 2010 19:50] Omer Barnir
triage: setting tag to SR55RC (P3, as switching the event scheduler on and off all the time will not happen in practice)
[13 Apr 2010 9:20] Jon Olav Hauglid
Non-deterministic MTR test case:

let $try = 100;
connect (con1, localhost, root);

while ($try)
{
        SET GLOBAL EVENT_SCHEDULER = ON;
        connection default;
        --send SET GLOBAL EVENT_SCHEDULER = OFF
        connection con1;
        --send SET GLOBAL EVENT_SCHEDULER = OFF
        connection default;
        --reap
        connection con1;
        --reap
        dec $try;
}

connection default;
disconnect con1;

The bug title is slightly misleading. This isn't a deadlock as such, but rather a hang of a connection executing "SET GLOBAL EVENT_SCHEDULER = OFF" while another connection is concurrently executing the same statement.
[13 Apr 2010 12:01] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/105498

2999 Jon Olav Hauglid	2010-04-13
      Bug #52367 Deadlock involving SET GLOBAL EVENT_SCHEDULER = OFF
                 during rqg_mdl_deadlock test
      
      The problem was that if two connection threads simultaneously tried
      to execute "SET GLOBAL EVENT_SCHEDULER = OFF", one of them could
      hang waiting for the scheduler to stop.
      
      The first connection thread would kill the event scheduler thread
      and then start waiting for it to exit. The second connection thread
      would then find the event scheduler thread in the process of exiting
      and also wait for it to exit. However, since the event scheduler 
      thread used signal to wake only one waiting thread, the other connection
      thread would be left waiting.
      
      This bug was a regression introduced by the fix for Bug#51160.
      Before the fix for Bug#51160, it was not possible for two connection
      threads to try to stop the event scheduler thread simultaneously.
      
      This patch fixes the problem by making sure the event scheduler
      thread uses broadcast to notify all waiters that it is exiting.
      
      No test case added as this would require adding debug sync points
      to parts of the code where sync points are currently not used.
      The patch has been tested with the non-deterministic test case
      from the bug description as well as using the RQG.
[14 Apr 2010 7:31] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/105594

2999 Jon Olav Hauglid	2010-04-14
      Bug #52367 Deadlock involving SET GLOBAL EVENT_SCHEDULER = OFF
                 during rqg_mdl_deadlock test
      
[14 Apr 2010 7:32] Jon Olav Hauglid
Pushed to mysql-trunk-runtime (Ver 5.5.4-m3).
[27 Apr 2010 9:46] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100427094135-5s49ecp3ckson6e2) (version source revid:alik@sun.com-20100427093843-uekr85qkd7orx12t) (merge vers: 6.0.14-alpha) (pib:16)
[27 Apr 2010 9:48] Bugs System
Pushed into 5.5.5-m3 (revid:alik@sun.com-20100427093804-a2k3rrjpwu5jegu8) (version source revid:alik@sun.com-20100427093804-a2k3rrjpwu5jegu8) (merge vers: 5.5.5-m3) (pib:16)
[27 Apr 2010 9:51] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100427094036-38frbg3famdlvjup) (version source revid:alik@sun.com-20100427093825-92wc8b22d4yg34ju) (pib:16)
[12 May 2010 1:18] Paul DuBois
Noted in 5.5.5, 6.0.14 changelogs.

Two sessions trying to set the global event_scheduler system variable
to OFF resulted in one of them hanging while waiting for the event
scheduler to stop.