Bug #56096 STOP SLAVE hangs if executed in parallel with user sleep
Submitted: 18 Aug 2010 19:24 Modified: 20 Nov 2010 17:18
Reporter: Elena Stepanova Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:5.5.5-m3, 5.6.1-m4 OS:Any
Assigned to: Davi Arnaut CPU Architecture:Any
Tags: regression

[18 Aug 2010 19:24] Elena Stepanova
Description:
If STOP SLAVE command is issued while slave SQL thread is executing a statement which involves SLEEP(), both statements hang, seemingly forever. Decreasing all timeouts I could think of did not help to resolve the problem.

Thread 2 (Thread 1192302912 (LWP 2890)):
#0  0x00002aae8a1a33b7 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000000000aa9ef4 in safe_cond_timedwait (cond=0x4710ed60, mp=0x12e4240, abstime=0x4710ecf0,
    file=0xbdd3c8 "mysql-5.6.99-m5/include/mysql/psi/mysql_thread.h", line=1089)
    at mysql-5.6.99-m5/mysys/thr_mutex.c:278
#2  0x0000000000795b20 in inline_mysql_cond_timedwait (that=0x4710ed60, mutex=0x12e4240, abstime=0x4710ecf0,
    src_file=0xbdccc8 "mysql-5.6.99-m5/sql/item_func.cc", src_line=3724)
    at mysql-5.6.99-m5/include/mysql/psi/mysql_thread.h:1089
#3  0x0000000000795d5c in interruptible_wait (thd=0x24294b0, cond=0x4710ed60, lock=0x12e4240, time=4)
    at mysql-5.6.99-m5/sql/item_func.cc:3724
#4  0x0000000000795ed7 in Item_func_sleep::val_int (this=0x241a868) at mysql-5.6.99-m5/sql/item_func.cc:4043
#5  0x000000000074f87c in Item::save_in_field (this=0x241a868, field=0x2429120, no_conversions=false)
    at mysql-5.6.99-m5/sql/item.cc:5467
#6  0x000000000058a35a in fill_record (thd=0x24294b0, ptr=0x2429118, values=@0x242b580, ignore_errors=true)
    at mysql-5.6.99-m5/sql/sql_base.cc:8389
#7  0x000000000058c539 in fill_record_n_invoke_before_triggers (thd=0x24294b0, ptr=0x2429110, values=@0x242b580, ignore_errors=true, triggers=0x0,
    event=TRG_EVENT_INSERT) at mysql-5.6.99-m5/sql/sql_base.cc:8432
#8  0x00000000005c5357 in select_insert::store_values (this=0x241a940, values=@0x242b580)
    at mysql-5.6.99-m5/sql/sql_insert.cc:3307
#9  0x00000000005c8099 in select_insert::send_data (this=0x241a940, values=@0x242b580)
    at mysql-5.6.99-m5/sql/sql_insert.cc:3243
#10 0x0000000000638744 in JOIN::exec (this=0x2431270) at mysql-5.6.99-m5/sql/sql_select.cc:2633
#11 0x0000000000634fbb in mysql_select (thd=0x24294b0, rref_pointer_array=0x242b660, tables=0x0, wild_num=0, fields=@0x242b580, conds=0x0, og_num=0, order=0x0,
    group=0x0, having=0x0, proc_param=0x0, select_options=3490712320, result=0x241a940, unit=0x242ae60, select_lex=0x242b478)
    at mysql-5.6.99-m5/sql/sql_select.cc:3338
#12 0x000000000063a9d5 in handle_select (thd=0x24294b0, lex=0x242adb0, result=0x241a940, setup_tables_done_option=1073741824)
    at mysql-5.6.99-m5/sql/sql_select.cc:323
#13 0x00000000005e1bcd in mysql_execute_command (thd=0x24294b0) at mysql-5.6.99-m5/sql/sql_parse.cc:3205
#14 0x00000000005e6b51 in mysql_parse (thd=0x24294b0, rawbuf=0x23ac899 "INSERT INTO t SELECT SLEEP(4)", length=29, parser_state=0x47110780)
    at mysql-5.6.99-m5/sql/sql_parse.cc:5831
#15 0x00000000008723e8 in Query_log_event::do_apply_event (this=0x23f4ed0, rli=0x23daff8, query_arg=0x23ac899 "INSERT INTO t SELECT SLEEP(4)", q_len_arg=29)
    at mysql-5.6.99-m5/sql/log_event.cc:3340
#16 0x0000000000872b0f in Query_log_event::do_apply_event (this=0x23f4ed0, rli=0x23daff8)
    at mysql-5.6.99-m5/sql/log_event.cc:3121
#17 0x00000000008916c2 in Log_event::apply_event (this=0x23f4ed0, rli=0x23daff8)
    at mysql-5.6.99-m5/sql/log_event.h:1131
#18 0x00000000008a2d43 in apply_event_and_update_pos (ev=0x23f4ed0, thd=0x24294b0, rli=0x23daff8)
    at mysql-5.6.99-m5/sql/rpl_slave.cc:2602
#19 0x00000000008a3373 in exec_relay_log_event (thd=0x24294b0, rli=0x23daff8)
    at mysql-5.6.99-m5/sql/rpl_slave.cc:2779
#20 0x00000000008a4272 in handle_slave_sql (arg=0x23d9bc0) at mysql-5.6.99-m5/sql/rpl_slave.cc:3608
#21 0x00002aae8a19f143 in start_thread () from /lib64/libpthread.so.0
#22 0x00002aae8a80e8cd in clone () from /lib64/libc.so.6
#23 0x0000000000000000 in ?? ()

Thread 3 (Thread 1192036672 (LWP 2888)):
#0  0x00002aae8a1a33b7 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000000000aa9ef4 in safe_cond_timedwait (cond=0x23dc820, mp=0x23dc778, abstime=0x470ce170,
    file=0xc03cc8 "mysql-5.6.99-m5/include/mysql/psi/mysql_thread.h", line=1089)
    at mysql-5.6.99-m5/mysys/thr_mutex.c:278
#2  0x00000000008aa308 in inline_mysql_cond_timedwait (that=0x23dc820, mutex=0x23dc778, abstime=0x470ce170,
    src_file=0xc010e8 "mysql-5.6.99-m5/sql/rpl_slave.cc", src_line=660)
    at mysql-5.6.99-m5/include/mysql/psi/mysql_thread.h:1089
#3  0x00000000008aa6ca in terminate_slave_thread (thd=0x24294b0, term_lock=0x23dc778, term_cond=0x23dc820, slave_running=0x23dcfb8, skip_lock=true)
    at mysql-5.6.99-m5/sql/rpl_slave.cc:660
#4  0x00000000008aaa63 in terminate_slave_threads (mi=0x23d9bc0, thread_mask=3, skip_lock=true)
    at mysql-5.6.99-m5/sql/rpl_slave.cc:538
#5  0x00000000008aaca5 in stop_slave (thd=0x2399b80, mi=0x23d9bc0, net_report=true)
    at mysql-5.6.99-m5/sql/rpl_slave.cc:5468
#6  0x00000000005e08b6 in mysql_execute_command (thd=0x2399b80) at mysql-5.6.99-m5/sql/sql_parse.cc:2819
#7  0x00000000005e6b51 in mysql_parse (thd=0x2399b80, rawbuf=0x244cb70 "STOP SLAVE", length=10, parser_state=0x470cfa10)
    at mysql-5.6.99-m5/sql/sql_parse.cc:5831
#8  0x00000000005e7733 in dispatch_command (command=COM_QUERY, thd=0x2399b80, packet=0x24238d1 "STOP SLAVE", packet_length=10)
    at mysql-5.6.99-m5/sql/sql_parse.cc:1128
#9  0x00000000005e8b9e in do_command (thd=0x2399b80) at mysql-5.6.99-m5/sql/sql_parse.cc:800
#10 0x00000000006b9c12 in do_handle_one_connection (thd_arg=0x2399b80) at mysql-5.6.99-m5/sql/sql_connect.cc:1191
#11 0x00000000006b9cdb in handle_one_connection (arg=0x2399b80) at mysql-5.6.99-m5/sql/sql_connect.cc:1130
#12 0x00002aae8a19f143 in start_thread () from /lib64/libpthread.so.0
#13 0x00002aae8a80e8cd in clone () from /lib64/libc.so.6
#14 0x0000000000000000 in ?? ()

Reproducible on 5.5, trunk, next-mr. Could not reproduce on 5.1 (hence tagging as 'regression').
Workaround is killing the sleeping thread.

The problem might be not limited to SLEEP() only, but I tried some other, relatively long, statements and could not get the same issue with them.

How to repeat:
--source include/master-slave.inc
--source include/have_innodb.inc

--disable_warnings
DROP TABLE IF EXISTS t;

CREATE TABLE t ( i INT ) ENGINE = InnoDB;
INSERT INTO t SELECT SLEEP(4);

--connection slave
let $show_statement= SHOW PROCESSLIST;
let $field= State;
let $condition= = 'User sleep';
--source include/wait_show_condition.inc

--echo # Trying to stop slave
STOP SLAVE;

# Clean up in case we get here

--connection master
DROP TABLE t;

--connection slave
DROP TABLE t;

--exit
[29 Sep 2010 0:10] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/119325

3206 Davi Arnaut	2010-09-28
      Bug#56096: STOP SLAVE hangs if executed in parallel with user sleep
      
      The problem is that although the slave thread is awaken (via a
      spurious event) when a slave stop is issued, the slave thread
      is not explicitly killed. This can lead to a infinite loop
      if the event which the slave thread was waiting for is not
      properly restarted (and comes to a end eventually). Since the
      SLEEP function sleeps on a timed event in order to be killable
      and to perform periodic checks until the requested time has
      elapsed, the spurious wake up was causing the requested sleep
      time to be reset every two seconds.
      
      The solution is to calculate the requested absolute time only
      once and to ensure that the thread only sleeps until this
      time is elapsed. In case of a spurious wake up, the sleep is
      restarted using the previously calculated absolute time. This
      restores the behavior present in previous releases. If a slave
      thread is executing a SLEEP function, a STOP SLAVE statement
      will wait until the time requested in the sleep function
      has elapsed.
     @ mysql-test/extra/rpl_tests/rpl_start_stop_slave.test
        Add test case result for Bug#56096.
     @ mysql-test/suite/rpl/r/rpl_stm_start_stop_slave.result
        Add test case for Bug#56096.
     @ sql/item_func.cc
        Reorganize interruptible_wait so that the absolute time can be
        preserved across calls to the function. This allow the sleep
        to be properly restarted in the presence of spurious wake ups
        generated by a STOP SLAVE.
[14 Oct 2010 1:54] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/120740

3244 Davi Arnaut	2010-10-13
      Bug#56096: STOP SLAVE hangs if executed in parallel with user sleep
      
      The root of the problem is that to interrupt a slave SQL thread
      wait, the STOP SLAVE implementation uses thd->awake(THD::NOT_KILLED).
      This appears as a spurious wakeup (e.g. from a sleep on a
      condition variable) to the code that the slave SQL thread is
      executing at the time of the STOP. If the code is not written
      to be spurious-wakeup safe, unexpected behavior can occur. For
      the reported case, this problem led to an infinite loop around
      the interruptible_wait() function in item_func.cc (SLEEP()
      function implementation).  The loop was not being properly
      restarted and, consequently, would not come to an end. Since the
      SLEEP function sleeps on a timed event in order to be killable
      and to perform periodic checks until the requested time has
      elapsed, the spurious wake up was causing the requested sleep
      time to be reset every two seconds.
      
      The solution is to calculate the requested absolute time only
      once and to ensure that the thread only sleeps until this
      time is elapsed. In case of a spurious wake up, the sleep is
      restarted using the previously calculated absolute time. This
      restores the behavior present in previous releases. If a slave
      thread is executing a SLEEP function, a STOP SLAVE statement
      will wait until the time requested in the sleep function
      has elapsed.
     @ mysql-test/extra/rpl_tests/rpl_start_stop_slave.test
        Add test case for Bug#56096.
     @ mysql-test/suite/rpl/r/rpl_stm_start_stop_slave.result
        Add test case result for Bug#56096.
     @ sql/item_func.cc
        Reorganize interruptible_wait into a class so that the absolute
        time can be preserved across calls to the wait function. This
        allows the sleep to be properly restarted in the presence of
        spurious wake ups, including those generated by a STOP SLAVE.
[14 Oct 2010 2:29] Davi Arnaut
Queued to mysql-5.5-bugteam.
[13 Nov 2010 16:15] Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:alexander.nozdrin@oracle.com-20101113152450-2zzcm50e7i4j35v7) (merge vers: 5.6.1-m4) (pib:21)
[13 Nov 2010 16:33] Bugs System
Pushed into mysql-next-mr (revid:alexander.nozdrin@oracle.com-20101113160336-atmtmfb3mzm4pz4i) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (pib:21)
[20 Nov 2010 17:18] Paul DuBois
Noted in 5.5.8 changelog.

If a STOP SLAVE statement was issued while the slave SQL thread was 
executing a statement that invoked the SLEEP() function, both
statements hung.
[16 Dec 2010 22:35] Bugs System
Pushed into mysql-5.5 5.5.9 (revid:jonathan.perkin@oracle.com-20101216101358-fyzr1epq95a3yett) (version source revid:jonathan.perkin@oracle.com-20101216101358-fyzr1epq95a3yett) (merge vers: 5.5.9) (pib:24)