Bug #55185 STOP SLAVE io_thread spins sql_thread indefinitely
Submitted: 12 Jul 2010 18:32 Modified: 13 Jul 2010 23:42
Reporter: Kavita T Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:5.1.41 OS:Linux (RHEL4U7 - 2.6.9-78.ELsmp)
Assigned to: CPU Architecture:Any
Tags: STOP SLAVE SQL_THREAD

[12 Jul 2010 18:32] Kavita T
Description:
daily LVM snapshot backup of mysql data_dir is scheduled using - 
1. slave stop io_thread
2. Following code that ensures sql_thread has caught-up with I/O thread.
However, Sql_thread stops in the middle of the transaction (noted a number of times) and the backup script hangs until a manual intervention of "stop slave, start slave" is initiated.

if ($args{sync_sql}) {
       # Wait until the slave SQL thread has caught up to the current
       # I/O thread position.
       while(1) {
           my $slave_status = $self->slave_status;
           unless ($slave_status) {
               croak $@;
           }
           if(($slave_status->{'Master_Log_File'} eq
               $slave_status->{'Relay_Master_Log_File'}) &&
              ($slave_status->{'Read_Master_Log_Pos'} ==
               $slave_status->{'Exec_Master_Log_Pos'})) {
               last;
           } else {
               print "Waiting for SQL thread to catch up to I/O thread.\n" if $args{verbose};
               sleep 5;
           }
       }                             

3. Show slave status :
              Master_Log_File: dbmaster1_live_media_sp2.000390
        Relay_Master_Log_File: dbmaster1_live_media_sp2.000390
          Read_Master_Log_Pos: 374263413
          Exec_Master_Log_Pos: 374263333

             Slave_IO_Running: No
            Slave_SQL_Running: Yes

4. Relay log status
# at 374263493
#100525  1:01:02 server id 1653179955  end_log_pos 374263413    Query   thread_id=151497650     exec_time=0    
error_code=0
SET TIMESTAMP=1274749262/*!*/;
BEGIN
/*!*/;
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;

Master binlog extract (dbmaster1_live_media_sp2.000390)
-------------
# at 374263333
#100525  1:01:02 server id 1653179955  end_log_pos 374263413    Query   thread_id=151497650     exec_time=0    
error_code=0
SET TIMESTAMP=1274749262/*!*/;
BEGIN
/*!*/;
# at 374263413
#100525  1:01:02 server id 1653179955  end_log_pos 374263843    Query   thread_id=151497650     exec_time=0    
error_code=0
SET TIMESTAMP=1274749262/*!*/;
REPLACE INTO `sched_cache` SET `group`='resolvedScheduleSet-full', `key`='full', `from_time`='2010-05-25 00:00:00',
`to_time`='2010-05-26 01:00:00', `data`='<?xml version=\"1.0\
" encoding=\"utf-8\"?>\n<ResolvedScheduleSet dbSnapShotTime=\"2010-05-25T01:01:02Z\"
fromTime=\"2010-05-25T00:00:00Z\"
toTime=\"2010-05-26T01:00:00Z\">\n</ResolvedScheduleSet>\n
'
/*!*/;
# at 374263843
#100525  1:01:02 server id 1653179955  end_log_pos 374263870    Xid = 846003619
COMMIT/*!*/;

How to repeat:
Appears to occur randomly, unable to replicate at will.
[12 Jul 2010 19:57] Sveta Smirnova
Thank you for the report.

What do you mean by "Sql_thread stops in the middle of the transaction"? I see in your application no check if transaction in progress. Maybe do you mean by this SQL thread can not reach IO thread forever?
[12 Jul 2010 20:16] Kavita T
Correct. sql_thread waits to catch-up with io_thread with the following messages in system log.

May 25 01:07:43 db1 mysql_backup: Slave IO thread stopped.
May 25 01:07:43 db1 mysql_backup: Waiting for SQL thread to catch up to I/O thread.
May 25 01:14:33 db1 mysql_backup: Waiting for SQL thread to catch up to I/O thread.
May 25 01:35:03 db1 mysql_backup: Waiting for SQL thread to catch up to I/O thread.
May 25 01:41:53 db1 mysql_backup: Waiting for SQL thread to catch up to I/O thread.
May 25 01:55:34 db1 mysql_backup: Waiting for SQL thread to catch up to I/O thread.
May 25 02:16:04 db1 mysql_backup: Waiting for SQL thread to catch up to I/O thread.
May 25 02:43:20 db1 mysql_backup: Waiting for SQL thread to catch up to I/O thread.
May 25 02:50:10 db1 mysql_backup: Waiting for SQL thread to catch up to I/O thread.
[12 Jul 2010 20:24] Sveta Smirnova
Thank you for the feedback.

This is not a bug in this case. SQL thread never catches IO thread up if IO thread stopped in the middle of SQL transaction.
[12 Jul 2010 20:30] Kavita T
Thanks Sveta, this seems to occur sporadically and while I understand sql_thread will never catch-up with io_thread if stopped in the middle of a transaction, I'm unsure how to avoid this error or prevent io_thread from stopping in the middle of a transaction. Any pointers appreciated.
[12 Jul 2010 20:43] Rick James
The times I have dug into it, I see a single command waiting:  BEGIN.

Meanwhile other slaves are happily chugging along.  (I mention this to imply that there is nothing wrong with replication in general, nor is there a dangling BEGIN.)
[13 Jul 2010 23:42] Kavita T
Could this be related to bug 53985 (http://bugs.mysql.com/bug.php?id=53985) ?