| Bug #56096 | STOP SLAVE hangs if executed in parallel with user sleep | ||
|---|---|---|---|
| Submitted: | 18 Aug 2010 19:24 | Modified: | 20 Nov 2010 17:18 |
| Reporter: | Elena Stepanova | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server | Severity: | S3 (Non-critical) |
| Version: | 5.5.5-m3, 5.6.1-m4 | OS: | Any |
| Assigned to: | Davi Arnaut | CPU Architecture: | Any |
| Tags: | regression | ||
[29 Sep 2010 0:10]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/119325 3206 Davi Arnaut 2010-09-28 Bug#56096: STOP SLAVE hangs if executed in parallel with user sleep The problem is that although the slave thread is awaken (via a spurious event) when a slave stop is issued, the slave thread is not explicitly killed. This can lead to a infinite loop if the event which the slave thread was waiting for is not properly restarted (and comes to a end eventually). Since the SLEEP function sleeps on a timed event in order to be killable and to perform periodic checks until the requested time has elapsed, the spurious wake up was causing the requested sleep time to be reset every two seconds. The solution is to calculate the requested absolute time only once and to ensure that the thread only sleeps until this time is elapsed. In case of a spurious wake up, the sleep is restarted using the previously calculated absolute time. This restores the behavior present in previous releases. If a slave thread is executing a SLEEP function, a STOP SLAVE statement will wait until the time requested in the sleep function has elapsed. @ mysql-test/extra/rpl_tests/rpl_start_stop_slave.test Add test case result for Bug#56096. @ mysql-test/suite/rpl/r/rpl_stm_start_stop_slave.result Add test case for Bug#56096. @ sql/item_func.cc Reorganize interruptible_wait so that the absolute time can be preserved across calls to the function. This allow the sleep to be properly restarted in the presence of spurious wake ups generated by a STOP SLAVE.
[14 Oct 2010 1:54]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/120740 3244 Davi Arnaut 2010-10-13 Bug#56096: STOP SLAVE hangs if executed in parallel with user sleep The root of the problem is that to interrupt a slave SQL thread wait, the STOP SLAVE implementation uses thd->awake(THD::NOT_KILLED). This appears as a spurious wakeup (e.g. from a sleep on a condition variable) to the code that the slave SQL thread is executing at the time of the STOP. If the code is not written to be spurious-wakeup safe, unexpected behavior can occur. For the reported case, this problem led to an infinite loop around the interruptible_wait() function in item_func.cc (SLEEP() function implementation). The loop was not being properly restarted and, consequently, would not come to an end. Since the SLEEP function sleeps on a timed event in order to be killable and to perform periodic checks until the requested time has elapsed, the spurious wake up was causing the requested sleep time to be reset every two seconds. The solution is to calculate the requested absolute time only once and to ensure that the thread only sleeps until this time is elapsed. In case of a spurious wake up, the sleep is restarted using the previously calculated absolute time. This restores the behavior present in previous releases. If a slave thread is executing a SLEEP function, a STOP SLAVE statement will wait until the time requested in the sleep function has elapsed. @ mysql-test/extra/rpl_tests/rpl_start_stop_slave.test Add test case for Bug#56096. @ mysql-test/suite/rpl/r/rpl_stm_start_stop_slave.result Add test case result for Bug#56096. @ sql/item_func.cc Reorganize interruptible_wait into a class so that the absolute time can be preserved across calls to the wait function. This allows the sleep to be properly restarted in the presence of spurious wake ups, including those generated by a STOP SLAVE.
[14 Oct 2010 2:29]
Davi Arnaut
Queued to mysql-5.5-bugteam.
[13 Nov 2010 16:15]
Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:alexander.nozdrin@oracle.com-20101113152450-2zzcm50e7i4j35v7) (merge vers: 5.6.1-m4) (pib:21)
[13 Nov 2010 16:33]
Bugs System
Pushed into mysql-next-mr (revid:alexander.nozdrin@oracle.com-20101113160336-atmtmfb3mzm4pz4i) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (pib:21)
[20 Nov 2010 17:18]
Paul DuBois
Noted in 5.5.8 changelog. If a STOP SLAVE statement was issued while the slave SQL thread was executing a statement that invoked the SLEEP() function, both statements hung.
[16 Dec 2010 22:35]
Bugs System
Pushed into mysql-5.5 5.5.9 (revid:jonathan.perkin@oracle.com-20101216101358-fyzr1epq95a3yett) (version source revid:jonathan.perkin@oracle.com-20101216101358-fyzr1epq95a3yett) (merge vers: 5.5.9) (pib:24)

Description: If STOP SLAVE command is issued while slave SQL thread is executing a statement which involves SLEEP(), both statements hang, seemingly forever. Decreasing all timeouts I could think of did not help to resolve the problem. Thread 2 (Thread 1192302912 (LWP 2890)): #0 0x00002aae8a1a33b7 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x0000000000aa9ef4 in safe_cond_timedwait (cond=0x4710ed60, mp=0x12e4240, abstime=0x4710ecf0, file=0xbdd3c8 "mysql-5.6.99-m5/include/mysql/psi/mysql_thread.h", line=1089) at mysql-5.6.99-m5/mysys/thr_mutex.c:278 #2 0x0000000000795b20 in inline_mysql_cond_timedwait (that=0x4710ed60, mutex=0x12e4240, abstime=0x4710ecf0, src_file=0xbdccc8 "mysql-5.6.99-m5/sql/item_func.cc", src_line=3724) at mysql-5.6.99-m5/include/mysql/psi/mysql_thread.h:1089 #3 0x0000000000795d5c in interruptible_wait (thd=0x24294b0, cond=0x4710ed60, lock=0x12e4240, time=4) at mysql-5.6.99-m5/sql/item_func.cc:3724 #4 0x0000000000795ed7 in Item_func_sleep::val_int (this=0x241a868) at mysql-5.6.99-m5/sql/item_func.cc:4043 #5 0x000000000074f87c in Item::save_in_field (this=0x241a868, field=0x2429120, no_conversions=false) at mysql-5.6.99-m5/sql/item.cc:5467 #6 0x000000000058a35a in fill_record (thd=0x24294b0, ptr=0x2429118, values=@0x242b580, ignore_errors=true) at mysql-5.6.99-m5/sql/sql_base.cc:8389 #7 0x000000000058c539 in fill_record_n_invoke_before_triggers (thd=0x24294b0, ptr=0x2429110, values=@0x242b580, ignore_errors=true, triggers=0x0, event=TRG_EVENT_INSERT) at mysql-5.6.99-m5/sql/sql_base.cc:8432 #8 0x00000000005c5357 in select_insert::store_values (this=0x241a940, values=@0x242b580) at mysql-5.6.99-m5/sql/sql_insert.cc:3307 #9 0x00000000005c8099 in select_insert::send_data (this=0x241a940, values=@0x242b580) at mysql-5.6.99-m5/sql/sql_insert.cc:3243 #10 0x0000000000638744 in JOIN::exec (this=0x2431270) at mysql-5.6.99-m5/sql/sql_select.cc:2633 #11 0x0000000000634fbb in mysql_select (thd=0x24294b0, rref_pointer_array=0x242b660, tables=0x0, wild_num=0, fields=@0x242b580, conds=0x0, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=3490712320, result=0x241a940, unit=0x242ae60, select_lex=0x242b478) at mysql-5.6.99-m5/sql/sql_select.cc:3338 #12 0x000000000063a9d5 in handle_select (thd=0x24294b0, lex=0x242adb0, result=0x241a940, setup_tables_done_option=1073741824) at mysql-5.6.99-m5/sql/sql_select.cc:323 #13 0x00000000005e1bcd in mysql_execute_command (thd=0x24294b0) at mysql-5.6.99-m5/sql/sql_parse.cc:3205 #14 0x00000000005e6b51 in mysql_parse (thd=0x24294b0, rawbuf=0x23ac899 "INSERT INTO t SELECT SLEEP(4)", length=29, parser_state=0x47110780) at mysql-5.6.99-m5/sql/sql_parse.cc:5831 #15 0x00000000008723e8 in Query_log_event::do_apply_event (this=0x23f4ed0, rli=0x23daff8, query_arg=0x23ac899 "INSERT INTO t SELECT SLEEP(4)", q_len_arg=29) at mysql-5.6.99-m5/sql/log_event.cc:3340 #16 0x0000000000872b0f in Query_log_event::do_apply_event (this=0x23f4ed0, rli=0x23daff8) at mysql-5.6.99-m5/sql/log_event.cc:3121 #17 0x00000000008916c2 in Log_event::apply_event (this=0x23f4ed0, rli=0x23daff8) at mysql-5.6.99-m5/sql/log_event.h:1131 #18 0x00000000008a2d43 in apply_event_and_update_pos (ev=0x23f4ed0, thd=0x24294b0, rli=0x23daff8) at mysql-5.6.99-m5/sql/rpl_slave.cc:2602 #19 0x00000000008a3373 in exec_relay_log_event (thd=0x24294b0, rli=0x23daff8) at mysql-5.6.99-m5/sql/rpl_slave.cc:2779 #20 0x00000000008a4272 in handle_slave_sql (arg=0x23d9bc0) at mysql-5.6.99-m5/sql/rpl_slave.cc:3608 #21 0x00002aae8a19f143 in start_thread () from /lib64/libpthread.so.0 #22 0x00002aae8a80e8cd in clone () from /lib64/libc.so.6 #23 0x0000000000000000 in ?? () Thread 3 (Thread 1192036672 (LWP 2888)): #0 0x00002aae8a1a33b7 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x0000000000aa9ef4 in safe_cond_timedwait (cond=0x23dc820, mp=0x23dc778, abstime=0x470ce170, file=0xc03cc8 "mysql-5.6.99-m5/include/mysql/psi/mysql_thread.h", line=1089) at mysql-5.6.99-m5/mysys/thr_mutex.c:278 #2 0x00000000008aa308 in inline_mysql_cond_timedwait (that=0x23dc820, mutex=0x23dc778, abstime=0x470ce170, src_file=0xc010e8 "mysql-5.6.99-m5/sql/rpl_slave.cc", src_line=660) at mysql-5.6.99-m5/include/mysql/psi/mysql_thread.h:1089 #3 0x00000000008aa6ca in terminate_slave_thread (thd=0x24294b0, term_lock=0x23dc778, term_cond=0x23dc820, slave_running=0x23dcfb8, skip_lock=true) at mysql-5.6.99-m5/sql/rpl_slave.cc:660 #4 0x00000000008aaa63 in terminate_slave_threads (mi=0x23d9bc0, thread_mask=3, skip_lock=true) at mysql-5.6.99-m5/sql/rpl_slave.cc:538 #5 0x00000000008aaca5 in stop_slave (thd=0x2399b80, mi=0x23d9bc0, net_report=true) at mysql-5.6.99-m5/sql/rpl_slave.cc:5468 #6 0x00000000005e08b6 in mysql_execute_command (thd=0x2399b80) at mysql-5.6.99-m5/sql/sql_parse.cc:2819 #7 0x00000000005e6b51 in mysql_parse (thd=0x2399b80, rawbuf=0x244cb70 "STOP SLAVE", length=10, parser_state=0x470cfa10) at mysql-5.6.99-m5/sql/sql_parse.cc:5831 #8 0x00000000005e7733 in dispatch_command (command=COM_QUERY, thd=0x2399b80, packet=0x24238d1 "STOP SLAVE", packet_length=10) at mysql-5.6.99-m5/sql/sql_parse.cc:1128 #9 0x00000000005e8b9e in do_command (thd=0x2399b80) at mysql-5.6.99-m5/sql/sql_parse.cc:800 #10 0x00000000006b9c12 in do_handle_one_connection (thd_arg=0x2399b80) at mysql-5.6.99-m5/sql/sql_connect.cc:1191 #11 0x00000000006b9cdb in handle_one_connection (arg=0x2399b80) at mysql-5.6.99-m5/sql/sql_connect.cc:1130 #12 0x00002aae8a19f143 in start_thread () from /lib64/libpthread.so.0 #13 0x00002aae8a80e8cd in clone () from /lib64/libc.so.6 #14 0x0000000000000000 in ?? () Reproducible on 5.5, trunk, next-mr. Could not reproduce on 5.1 (hence tagging as 'regression'). Workaround is killing the sleeping thread. The problem might be not limited to SLEEP() only, but I tried some other, relatively long, statements and could not get the same issue with them. How to repeat: --source include/master-slave.inc --source include/have_innodb.inc --disable_warnings DROP TABLE IF EXISTS t; CREATE TABLE t ( i INT ) ENGINE = InnoDB; INSERT INTO t SELECT SLEEP(4); --connection slave let $show_statement= SHOW PROCESSLIST; let $field= State; let $condition= = 'User sleep'; --source include/wait_show_condition.inc --echo # Trying to stop slave STOP SLAVE; # Clean up in case we get here --connection master DROP TABLE t; --connection slave DROP TABLE t; --exit