Bug #53985 STOP SLAVE hangs due to incomplete transaction modifies non-transactional data
Submitted: 26 May 2010 8:58
Modified: 26 May 2010 13:19
Reporter: Yuan WANG
Status: Duplicate
Category: MySQL Server: Replication
Severity: S2 (Serious)
Version: 5.1.41
OS: Linux
Assigned to:
CPU Architecture: Any
Tags: replication

[26 May 2010 8:58] Yuan WANG
Description:
In our test environment we sometimes found that STOP SLAVE would not complete. Using gdb, we found that the "STOP SLAVE" thread had stopped the IO thread, but the SQL thread was blocked at the following position:

Thread 2 (Thread 0x45520950 (LWP 7248)):
#0  Relay_log_info::is_in_group (this=0x47e00000000) at rpl_rli.h:411
#1  <function called from gdb>
#2  0x00007f821eb71d29 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#3  0x0000000000d01e50 in safe_cond_wait (cond=0xca0e2a0, mp=0xca0dcf8, 
    file=0xe6afff "log.cc", line=4639) at thr_mutex.c:237
#4  0x00000000008001b4 in MYSQL_BIN_LOG::wait_for_update (this=0xca0dcf0, thd=0xca46158, 
    is_slave=true) at log.cc:4639
#5  0x00000000008eecef in next_event (rli=0xca0d860) at slave.cc:4215
#6  0x00000000008f35e8 in exec_relay_log_event (thd=0xca46158, rli=0xca0d860) at slave.cc:2242
#7  0x00000000008f435d in handle_slave_sql (arg=0xca0c470) at slave.cc:3023
#8  0x00007f821eb6dfc7 in start_thread () from /lib/libpthread.so.0
#9  0x00007f821d8d25ad in clone () from /lib/libc.so.6
#10 0x0000000000000000 in ?? ()

The SQL thread had executed all binlog events read by the IO thread, but it was still waiting for more events. After some debugging, we think the problem may be the following code in the function sql_slave_killed.

    if (rli->abort_slave && rli->is_in_group() &&
        thd->transaction.all.modified_non_trans_table)
      DBUG_RETURN(0);

We found that rli->abort_slave is 1, rli->is_in_group() is true, and thd->transaction.all.modified_non_trans_table is true. So the SQL thread was in the middle of executing a transaction, and this transaction had modified a non-transactional table (this is expected, since we use non-transactional tables). Because stopping in the middle of such a transaction is not safe, the SQL thread decided to continue, hoping to complete the transaction. However, because the IO thread had already been stopped, the SQL thread could not get any more events, so it hangs forever.
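
The effect of this check can be illustrated with a small standalone model. This is only a sketch, not the server code: the struct and function names are hypothetical, and the fall-through return value is an assumption about how the rest of sql_slave_killed behaves.

```cpp
// Simplified model of the check quoted above; NOT the real MySQL source.
// All type and function names in this block are hypothetical.

struct RelayLogInfoModel {
    bool abort_slave;  // set once STOP SLAVE asks the SQL thread to stop
    bool in_group;     // true while an event group (transaction) is open
    bool is_in_group() const { return in_group; }
};

struct TransactionModel {
    bool modified_non_trans_table;  // a non-transactional table was changed
};

// Returns true when the SQL thread may stop now, false when it keeps going.
bool sql_slave_killed_model(const RelayLogInfoModel& rli,
                            const TransactionModel& trx) {
    if (rli.abort_slave && rli.is_in_group() &&
        trx.modified_non_trans_table) {
        // Stopping mid-group is unsafe, so the stop request is ignored
        // and the thread goes back to waiting for more events.
        return false;
    }
    // Assumption: otherwise the stop request is honoured as-is.
    return rli.abort_slave;
}
```

With all three flags set, the model reports "not killed" on every call. If the IO thread is already stopped, no further event can arrive to close the group, so nothing ever changes the answer.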

How to repeat:
Run many mixed transactions that modify both transactional and non-transactional tables, and replicate them. However, it is hard to repeat; most of the time STOP SLAVE works fine.

Suggested fix:
Stop the SQL thread before stopping the IO thread?
[26 May 2010 11:34] Sveta Smirnova
Thank you for the report.

This is a duplicate of bug #45940, which was fixed in the 5.5 series. Please upgrade to version 5.5.4.
[26 May 2010 13:16] Yuan WANG
Thank you for the quick reply. However, MySQL 5.5 is not a GA release, so we cannot take the risk of upgrading. Could the fix be ported to MySQL 5.1?
[26 May 2010 13:19] Yuan WANG
Or when will MySQL 5.5 be declared GA? If it will be soon, we can wait.