Bug #51391 Deadlock involving events during rqg_info_schema test
Submitted: 22 Feb 2010 15:06 Modified: 8 May 2010 22:23
Reporter: John Embretsen Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Locking Severity:S2 (Serious)
Version:5.5.3-m3 OS:Any
Assigned to: Jon Olav Hauglid CPU Architecture:Any
Tags: pushbuild, rqg_pb2, test failure

[22 Feb 2010 15:06] John Embretsen
Description:
The Random Query Generator (RQG) test "rqg_info_schema" fails intermittently in the branch mysql-next-mr-bugfixing on Solaris platforms in Pushbuild.

Test logs say:

"Server deadlock reported, initiating analysis..."

which means that client connection attempts fail because the server is stalled due to a deadlock.

Stack traces seem to indicate that EVENTs are involved, e.g.:

# 03:09:27 current thread: t@22
# 03:09:27 =>[1] __lwp_park(0x4, 0xf5dab830, 0x0, 0x0, 0xff36e100, 0x1), at 0xfee48ac4 
# 03:09:27   [2] cond_sleep_queue(0x1, 0x2abd030, 0xf5dab830, 0x0, 0x6dbb0, 0x0), at 0xfee42910 
# 03:09:27   [3] cond_wait_queue(0x2ab17b0, 0x2abd030, 0xf5dab830, 0x0, 0x142, 0x0), at 0xfee42ad4 
# 03:09:27   [4] cond_wait_common(0x2ab17b0, 0x2abd030, 0xf5dab830, 0x0, 0x0, 0x0), at 0xfee42f18 
# 03:09:27   [5] _cond_timedwait(0x2ab17b0, 0x2abd030, 0xf5daba4c, 0x10, 0xf5dabf3f, 0x0), at 0xfee430ac 
# 03:09:27   [6] cond_timedwait(0x2ab17b0, 0x2abd030, 0xf5daba4c, 0x1c00, 0xfed27a00, 0x2abd018), at 0xfee431a0 
# 03:09:27   [7] _pthread_cond_timedwait(0x2ab17b0, 0x2abd030, 0xf5daba4c, 0x0, 0xfed27a00, 0x0), at 0xfee431e0 
# 03:09:27   [8] safe_cond_timedwait(cond = 0x2ab17b0, mp = 0x2abd018, abstime = 0xf5daba4c, file = 0x11975a4 "../include/mysql/psi/mysql_thread.h", line = 812U), line 278 in "thr_mutex.c"
# 03:09:27   [9] inline_mysql_cond_timedwait(that = 0x2ab17b0, mutex = 0x2abd018, abstime = 0xf5daba4c), line 812 in "mysql_thread.h"
# 03:09:27   [10] wait_for_lock(wait = 0x2abd070, data = 0x2abd8b0, in_wait_list = '\0', lock_wait_timeout = 31536000U), line 442 in "thr_lock.c"
# 03:09:27   [11] thr_lock(data = 0x2abd8b0, owner = 0x2ac9f28, lock_type = TL_WRITE, lock_wait_timeout = 31536000U), line 779 in "thr_lock.c"
# 03:09:27   [12] thr_multi_lock(data = 0x2c24404, count = 1U, owner = 0x2ac9f28, lock_wait_timeout = 31536000U), line 1050 in "thr_lock.c"
# 03:09:27   [13] mysql_lock_tables(thd = 0x2ac9690, tables = 0x2ad3290, count = 1U, flags = 0, need_reopen = 0xf5dabf3f), line 336 in "lock.cc"
# 03:09:27   [14] lock_tables(thd = 0x2ac9690, tables = 0xf5dac028, count = 1U, flags = 0, need_reopen = 0xf5dabf3f), line 5286 in "sql_base.cc"
# 03:09:27   [15] open_and_lock_tables_derived(thd = 0x2ac9690, tables = 0xf5dac028, derived = false, flags = 0, prelocking_strategy = 0xf5dabfd8), line 5132 in "sql_base.cc"
# 03:09:27   [16] Event_db_repository::open_event_table(this = 0x14f1580, thd = 0x2ac9690, lock_type = TL_WRITE, table = 0xf5dac47c), line 577 in "event_db_repository.cc"
# 03:09:27   [17] Event_db_repository::create_event(this = 0x14f1580, thd = 0x2ac9690, parse_data = 0x2ad30f8, create_if_not = '\0'), line 625 in "event_db_repository.cc"
# 03:09:27   [18] Events::create_event(thd = 0x2ac9690, parse_data = 0x2ad30f8, if_not_exists = false), line 335 in "events.cc"
# 03:09:27   [19] mysql_execute_command(thd = 0x2ac9690), line 3526 in "sql_parse.cc"
# 03:09:27   [20] mysql_parse(thd = 0x2ac9690, inBuf = 0x2ad3008 "CREATE EVENT n ON SCHEDULE AT NOW() DO SET @a=@a", length = 48U, found_semicolon = 0xf5dafa78), line 5581 in "sql_parse.cc"
# 03:09:27   [21] dispatch_command(command = COM_QUERY, thd = 0x2ac9690, packet = 0x2add639 "CREATE EVENT n ON SCHEDULE AT NOW() DO SET @a=@a", packet_length = 48U), line 1023 in "sql_parse.cc"
# 03:09:27   [22] do_command(thd = 0x2ac9690), line 709 in "sql_parse.cc"
# 03:09:27   [23] do_handle_one_connection(thd_arg = 0x2ac9690), line 1174 in "sql_connect.cc"
# 03:09:27   [24] handle_one_connection(arg = 0x2ac9690), line 1113 in "sql_connect.cc"

The error log includes the following in the lock table output:

mysql          time_zone_transition_type            1      12     1  Low priority read lock
mysql          event                                1      12     1  Low priority read lock
mysql          event                                1       8     1  Low priority read lock
mysql          event                                1       7     1  High priority write lock
mysql          event                                1      15     1  Low priority read lock
test           E                                    1      11     1  Low priority read lock
test           BB                                   1      10     1  Low priority read lock
mysql          help_relation                        1       8     1  Low priority read lock

Note that the same test results in a deadlock with seemingly different stack traces in the mysql-next-mr and mysql-next-4284 branches, involving MDL code. Those failures are assumed to be related to Bug#51105. The issue described in this bug report looks different, although it could still be related somehow.

Also note that attempts to reproduce this manually with current next-mr-bugfixing and current RQG have resulted in hitting an assertion which may be hiding this bug, see Bug#51376.

How to repeat:
Branch and build, or otherwise obtain, code from mysql-next-mr-bugfixing. Referred to below as environment variable CODE.

Branch or otherwise obtain the Random Query Generator:

bzr branch lp:randgen

Run the test:

cd randgen

perl runall.pl \
 --grammar=conf/information_schema.yy \
 --threads=10 \
 --duration=300 \
 --queries=100000
 --reporters=Deadlock,ErrorLog,Backtrace,Shutdown \
 --basedir=$CODE \
 --mysqld=--log-output=file \
 --mysqld=--loose-skip-safemalloc \
 --mysqld=--loose-table-lock-wait-timeout=1 \
 --mysqld=--loose-lock-wait-timeout=1 \
 --mysqld=--loose-innodb-lock-wait-timeout=1
[22 Feb 2010 15:12] John Embretsen
Attached: Stacktraces from all threads.

Attachment: stacktraces_bug51391.txt (text/plain), 46.47 KiB.

[22 Feb 2010 17:35] Philip Stoev
See also bug #40915
[1 Mar 2010 13:42] John Embretsen
Pushbuild test rqg_info_schema disabled in next-mr and next-mr-bugfixing due to this bug, as well as a few other bugs (see below).

I have run (outside of Pushbuild) this test against next-mr-bugfixing plus the following patches on Solaris 10 SPARC 64-bit a few times, and did not see this issue:

 - Bug#51105 (MDL deadlock in rqg_mdl_stability test on Windows)
   http://lists.mysql.com/commits/101767
 - Bug#51376 (Assert `! is_set()' failed in Diagnostics_area::set_ok_status on DROP FUNCTION)
   http://lists.mysql.com/commits/101190
 - Bug#51456 (rqg_info_schema fails when SHOW PROCEDURE|FUNCTION CODE feature is disabled)
   http://lists.mysql.com/commits/101358

Unless this issue is seen again with all three patches applied, we can assume this bug is fixed as part of those fixes. Jon Olav Hauglid may have further details.
[4 Mar 2010 10:18] Sveta Smirnova
Thank you for the report.

Verified as described. To repeat I had to run test about 20 times.

(dbx) lwp l@14 
Current function is safe_cond_wait
  240     error=pthread_cond_wait(cond,&mp->mutex);
t@14 (l@14) stopped in __lwp_park at 0xfebfc589
0xfebfc589: __lwp_park+0x0019:  jae      __lwp_park+0x27        [ 0xfebfc597, .+0xe ]
(dbx) where    
  [1] __lwp_park(0x0, 0x0, 0x0, 0x0), at 0xfebfc589 
  [2] cond_sleep_queue(0x8aedb64, 0x8aedb30), at 0xfebf5a5c 
  [3] cond_wait_queue(0x8aedb64, 0x8aedb30, 0x0, 0xfebf61f6), at 0xfebf5d3d 
  [4] __cond_wait(0x8aedb64, 0x8aedb30, 0xf5ce9efc, 0xfebf628e), at 0xfebf624e 
  [5] cond_wait(0x8aedb64, 0x8aedb30, 0xf5ce9f2c, 0xfebf62cc), at 0xfebf629c 
  [6] pthread_cond_wait(0x8aedb64, 0x8aedb30, 0xf5ce9f4c, 0x8430666, 0x8aedb18, 0x0), at 0xfebf62e1 
=>[7] safe_cond_wait(cond = 0x8aedb64, mp = (nil), file = 0x860bc2c "os/os0sync.c", line = 0), line 240 in "thr_mutex.c"
  [8] os_event_wait_low(event = (nil), reset_sig_count = 0), line 399 in "os0sync.c"
  [9] srv_master_thread(arg = (nil)), line 2740 in "srv0srv.c"
  [10] _thrp_setup(0xf67e6200), at 0xfebfc2a6 
  [11] _lwp_start(0x0, 0x0, 0x0, 0x0, 0xfebf595e, 0x0), at 0xfebfc530 
(dbx) lwp l@15 
Current function is safe_mutex_lock
  152       error= pthread_mutex_lock(&mp->mutex);
t@15 (l@15) stopped in __lwp_park at 0xfebfc589
0xfebfc589: __lwp_park+0x0019:  jae      __lwp_park+0x27        [ 0xfebfc597, .+0xe ]
(dbx) where    
  [1] __lwp_park(0x0, 0x0), at 0xfebfc589 
  [2] mutex_lock_queue(0xf67e6a00, 0x0, 0x88f3e78, 0x0), at 0xfebf3c53 
  [3] mutex_lock_impl(0x88f3e78, 0x0, 0xd, 0x85fe755, 0x1f4, 0x88f3e60), at 0xfebf4793 
  [4] mutex_lock(0x88f3e78, 0x1d, 0x890f570, 0x85fe884), at 0xfebf4869 
=>[5] safe_mutex_lock(mp = (nil), try_lock = '\0', file = (nil), line = 4122913260U), line 152 in "thr_mutex.c"
  [6] mysql_print_status(), line 519 in "mysql_thread.h"
  [7] signal_hand(arg = (nil)), line 2989 in "mysqld.cc"
  [8] _thrp_setup(0xf67e6a00), at 0xfebfc2a6 
  [9] _lwp_start(0x0, 0x0, 0x0, 0x0, 0xfebf3b59, 0x88f3e6c), at 0xfebfc530 
(dbx) lwp l@16 
Current function is vio_read
   44     r = read(vio->sd, buf, size);
...
(dbx) lwp l@22 
Current function is safe_cond_timedwait
  278     error=pthread_cond_timedwait(cond,&mp->mutex,abstime);
t@22 (l@22) stopped in __lwp_park at 0xfebfc589
0xfebfc589: __lwp_park+0x0019:  jae      __lwp_park+0x27        [ 0xfebfc597, .+0xe ]
(dbx) where    
dbx: variable `thd' has no address - unused or optimized out
dbx: variable `tables' has no address - unused or optimized out
dbx: variable `derived' has no address - unused or optimized out
dbx: variable `flags' has no address - unused or optimized out
  [1] __lwp_park(0x0, 0xf5b864dc, 0x0, 0x0), at 0xfebfc589 
  [2] cond_sleep_queue(0x9e9775c, 0x9ee2da0), at 0xfebf5a5c 
  [3] cond_wait_queue(0x9e9775c, 0x9ee2da0, 0xf5b864dc, 0xfebf5f16), at 0xfebf5d3d 
  [4] cond_wait_common(0x9e9775c, 0x9ee2da0, 0xf5b864dc, 0xfebf6307), at 0xfebf60f3 
  [5] __cond_timedwait(0x9e9775c, 0x9ee2da0, 0xf5b865ac, 0xfebf63ce), at 0xfebf633d 
  [6] cond_timedwait(0x9e9775c, 0x9ee2da0, 0xf5b865ac, 0xfebf6414), at 0xfebf63df 
  [7] pthread_cond_timedwait(0x9e9775c, 0x9ee2da0, 0xf5b865ac, 0x8612dc6, 0x20e, 0x9ee2d88), at 0xfebf642c 
=>[8] safe_cond_timedwait(cond = 0x9e9775c, mp = (nil), abstime = 0xf5b865ac, file = (nil), line = 4122502508U), line 278 in "thr_mutex.c"
  [9] wait_for_lock(wait = 0x9ee2ddc, data = (nil), in_wait_list = '\0', lock_wait_timeout = 0), line 812 in "mysql_thread.h"
  [10] thr_multi_lock(data = 0xa39889c, count = 1U, owner = 0x9ef3d4c, lock_wait_timeout = 31536000U), line 1049 in "thr_lock.c"
  [11] mysql_lock_tables(thd = 0x9ef3498, tables = 0xf5b8666c, count = 0, flags = 0, need_reopen = 0xf5b8671f), line 340 in "lock.cc"
  [12] lock_tables(thd = 0x9ef3498, tables = 0xf5b86794, count = 0, flags = 4096U, need_reopen = 0xf5b8671f), line 5349 in "sql_base.cc"
  [13] open_and_lock_tables(thd = , tables = , derived = , flags = ), line 5195 in "sql_base.cc"
  [14] Event_db_repository::open_event_table(this = 0x8919478, thd = 0x9ef3498, lock_type = TL_UNLOCK, table = (nil)), line 1584 in "mysql_priv.h"
  [15] Event_db_repository::drop_events_by_field(this = 0x8919478, thd = 0x9ef3498, field = ET_FIELD_DB, field_value = CLASS), line 956 in "event_db_repository.cc"
  [16] Event_db_repository::drop_schema_events(this = 0x8919478, thd = 0x9ef3498, schema = CLASS), line 929 in "event_db_repository.cc"
  [17] Events::drop_schema_events(thd = 0xf5b86c8c, db = (nil)), line 598 in "events.cc"
  [18] mysql_rm_db(thd = 0x9ef3498, db = 0x9ee98d0 "j", if_exists = true, silent = false), line 967 in "sql_db.cc"
  [19] mysql_execute_command(thd = 0x9ef3498), line 3523 in "sql_parse.cc"
  [20] mysql_parse(thd = (nil), inBuf = (nil), length = 4122508716U, found_semicolon = 0xf5b87e0c), line 5710 in "sql_parse.cc"
  [21] dispatch_command(command = COM_QUERY, thd = 0x9ef3498, packet = 0x9efb491 "DROP DATABASE IF EXISTS j", packet_length = 25U), line 1070 in "sql_parse.cc"
  [22] do_command(thd = 0x9ef3498), line 710 in "sql_parse.cc"
  [23] do_handle_one_connection(thd_arg = (nil)), line 1174 in "sql_connect.cc"
  [24] handle_one_connection(arg = 0xf5b88fcc), line 1113 in "sql_connect.cc"
  [25] _thrp_setup(0xf67e7a00), at 0xfebfc2a6 
  [26] _lwp_start(0x0, 0xf5b864dc, 0x0, 0x0, 0xfebf595e, 0x1), at 0xfebfc530
[9 Mar 2010 15:39] Jon Olav Hauglid
Managed to reproduce the deadlock on Ubuntu 64 bit using the current version of mysql-next-4284.
[10 Mar 2010 9:24] Jon Olav Hauglid
This is a deadlock between:
1) CREATE/DROP event, holding LOCK_event_metadata, trying to open mysql.event
2) I_S query, having mysql.event open, trying to lock LOCK_event_metadata to access the "event_scheduler" system variable.

This problem will very likely be fixed by the patch for Bug#51160.
[10 Mar 2010 12:12] Jon Olav Hauglid
MTR test case:

--source include/have_debug_sync.inc

CREATE EVENT e1 ON SCHEDULE EVERY 5 HOUR DO SELECT 1;
CREATE EVENT e2 ON SCHEDULE EVERY 5 HOUR DO SELECT 2;

connect(con2, localhost, root);
connect(con3, localhost, root);

connection con2;
SET DEBUG_SYNC="before_lock_tables_takes_lock SIGNAL before_lock WAIT_FOR query";
--send DROP EVENT e1;

connection con3;
SET DEBUG_SYNC="now WAIT_FOR before_lock";
--send SELECT * FROM mysql.event AS t1, INFORMATION_SCHEMA.GLOBAL_VARIABLES as t2 WHERE t1.definer = t2.VARIABLE_VALUE

connection default;
--sleep 1
SET DEBUG_SYNC="now SIGNAL query";
[12 Mar 2010 10:17] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/103066

2992 Jon Olav Hauglid	2010-03-12
      Bug #51391 Deadlock involving events during rqg_info_schema test
      
      This was a deadlock between CREATE/ALTER/DROP EVENT and a query
      accessing both the mysql.event table and I_S.GLOBAL_VARIABLES.
      
      The root of the problem was that the LOCK_event_metadata mutex was
      used to both protect the "event_scheduler" global system variable
      and the internal event data structures used by CREATE/ALTER/DROP EVENT.
      
      The deadlock would occur if CREATE/ALTER/DROP EVENT held
      LOCK_event_metadata while trying to open the mysql.event table,
      at the same time as the query had mysql.event open, trying to
      lock LOCK_event_metadata to access "event_scheduler".
      
      This bug was fixed in the scope of Bug#51160 by using only
      LOCK_global_system_variables to protect "event_scheduler".
      This makes it so that the query above won't lock LOCK_event_metadata,
      thereby preventing this deadlock from occuring.
      
      This patch contains no code changes.
      Test case added to lock_sync.test.
[15 Apr 2010 12:14] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/105728

3002 Jon Olav Hauglid	2010-04-15
      Bug #51391 Deadlock involving events during rqg_info_schema test
      
      This was a deadlock between CREATE/ALTER/DROP EVENT and a query
      accessing both the mysql.event table and I_S.GLOBAL_VARIABLES.
      
      The root of the problem was that the LOCK_event_metadata mutex was
      used to both protect the "event_scheduler" global system variable
      and the internal event data structures used by CREATE/ALTER/DROP EVENT.
      
      The deadlock would occur if CREATE/ALTER/DROP EVENT held
      LOCK_event_metadata while trying to open the mysql.event table,
      at the same time as the query had mysql.event open, trying to
      lock LOCK_event_metadata to access "event_scheduler".
      
      This bug was fixed in the scope of Bug#51160 by using only
      LOCK_global_system_variables to protect "event_scheduler".
      This makes it so that the query above won't lock LOCK_event_metadata,
      thereby preventing this deadlock from occuring.
      
      This patch contains no code changes.
      Test case added to lock_sync.test.
[15 Apr 2010 12:16] Jon Olav Hauglid
Pushed to mysql-trunk-runtime (Ver 5.5.4-m3).
[27 Apr 2010 9:44] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100427094135-5s49ecp3ckson6e2) (version source revid:alik@sun.com-20100427093843-uekr85qkd7orx12t) (merge vers: 6.0.14-alpha) (pib:16)
[27 Apr 2010 9:47] Bugs System
Pushed into 5.5.5-m3 (revid:alik@sun.com-20100427093804-a2k3rrjpwu5jegu8) (version source revid:alik@sun.com-20100427093804-a2k3rrjpwu5jegu8) (merge vers: 5.5.5-m3) (pib:16)
[27 Apr 2010 9:50] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100427094036-38frbg3famdlvjup) (version source revid:alik@sun.com-20100427093825-92wc8b22d4yg34ju) (pib:16)
[8 May 2010 22:23] Paul DuBois
Changes to test case. No changelog entry needed.
[8 May 2010 22:23] Paul DuBois
Changes to test case. No changelog entry needed.