MySQL Bugs: #68568: Stop Slave statement will timeout not waiting the event group complete

Bug #68568	Stop Slave statement will timeout not waiting the event group complete
Submitted:	5 Mar 2013 3:17	Modified:	14 Nov 2013 21:10
Reporter:	Max Gao	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Replication	Severity:	S3 (Non-critical)
Version:	5.6.10	OS:	Linux (CentOS 5.8 X64)
Assigned to:		CPU Architecture:	Any
Tags:	event group, incomplete, slave, stop slave, timeout

Description:
i have a huge MYISAM table , and the application will update the table use follow statement everday :

LOAD DATA LOCAL INFILE 'db.last_log.save.txt' REPLACE INTO TABLE user_stats (log_date, uid ... over 20 fields ...)

this update will take over 3 minutes.

master config binlog_format=mixed

after the statement complete, the row image bin log is generated and transfer to slave.

the slave will take over 3 minutes to update the table, at this time, my backup script do a 'stop slave' and do mysqldump backup to some tables ( but not all ).

at this time, the slave thread should wait for the event group end according to the document:

Note
In MySQL 5.6, STOP SLAVE waits until the current replication event group affecting one or more non-transactional tables has finished executing (if there is any such replication group), or until the user issues a KILL QUERY or KILL CONNECTION statement. (Bug #319, Bug #38205)

but on my server , it ends exactly after 1 minute but not wait the event group finish.

error log below：

2013-03-02 15:12:22 28337 [Warning] Slave SQL: Request to stop slave SQL Thread received while applying a group that has non-transactional changes; waiting for completion of the group ... , Error_code: 0
2013-03-02 15:13:23 28337 [ERROR] Slave SQL: Fatal error: ... Slave SQL Thread stopped with incomplete event group having non-transactional changes. If the group consists solely of row-based events, you can try to restart the slave with --slave-exec-mode=IDEMPOTENT, which ignores duplicate key, key not found, and similar errors (see documentation for details). Error_code: 1593

i try at different server and different data file , and get the same error at exactly 1 minute.

after this error when i start slave , it will ends with HA_ERR_KEY_NOT_FOUND, and i need to skip these error using config:
slave-exec-mode=IDEMPOTENT

but i think that will cause some data mismacth between master and slave.

How to repeat:
create a MYISAM table with huge initial data over million rows.

use LOAD DATA LOCAL INFILE 'db.last_log.save.txt' REPLACE INTO TABLE
to update the table, the update file should enough huge to make this update take over 3-4 minutes to finish. i use a file over 300,000 lines.

after the slave get the relay log, and do 'stop slave', and wait 1 minute you will see the error in the slave error log file.

Suggested fix:
i doubt there is some timeout setting will finish the 'stop slave' statement ,and this will lead to Slave SQL Thread stopped with incomplete event group having non-transactional changes.

i found a timeout config net_write_timeout, default value is 60 seconds , i set it to 30 seconds to test, but not work, the statement ends at 1 minute also.

Thank you for the report.

I can not repeat described behavior.

Please send us output of SHOW SLAVE STATUS\G right after you are seeing the issue.

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: x
                  Master_User: slave
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000008
          Read_Master_Log_Pos: 222923842
               Relay_Log_File: inner-relay-bin.000002
                Relay_Log_Pos: 283
        Relay_Master_Log_File: mysql-bin.000008
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 111461981
              Relay_Log_Space: 111462317
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 205
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 2
                  Master_UUID: fd1de1d2-8495-11e2-9e18-001708517368
             Master_Info_File: /home/mysql2/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Reading event from the relay log
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 0
1 row in set (0.00 sec)

mysql> stop slave;
Query OK, 0 rows affected (1 min 7.19 sec)
（slave stop after 1 min，but the update should take over 2 mins）

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: 
                  Master_Host: x
                  Master_User: slave
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000008
          Read_Master_Log_Pos: 222923842
               Relay_Log_File: inner-relay-bin.000002
                Relay_Log_Pos: 283
        Relay_Master_Log_File: mysql-bin.000008
             Slave_IO_Running: No
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1593
                   Last_Error: Fatal error: ... Slave SQL Thread stopped with incomplete event group having non-transactional changes. If the group consists solely of row-based events, you can try to restart the slave with --slave-exec-mode=IDEMPOTENT, which ignores duplicate key, key not found, and similar errors (see documentation for details).
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 111461981
              Relay_Log_Space: 111462317
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 1593
               Last_SQL_Error: Fatal error: ... Slave SQL Thread stopped with incomplete event group having non-transactional changes. If the group consists solely of row-based events, you can try to restart the slave with --slave-exec-mode=IDEMPOTENT, which ignores duplicate key, key not found, and similar errors (see documentation for details).
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 2
                  Master_UUID: fd1de1d2-8495-11e2-9e18-001708517368
             Master_Info_File: /home/mysql2/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: 
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 130322 21:48:38
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 0
1 row in set (0.00 sec)

Verified in 5.5 and 5.6

Quick test can recreate on 5.6.12 with large load data into myisam ,bin log_format=row;

2013-11-08 22:36:05 4663 [Warning] Slave SQL: Request to stop slave SQL Thread received while applying a group that has non-transactional changes; waiting for completion of the group ... , Error_code: 0
2013-11-08 22:37:11 4663 [ERROR] Slave SQL: Fatal error: ... Slave SQL Thread stopped with incomplete event group having non-transactional changes. If the group consists solely of row-based events,

mysql> load data local infile '/usr/local/mysql/data/outfile2.txt' into table test.test FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';
Query OK, 104606784 rows affected, 65535 warnings (9 min 40.38 sec)

mysql> start slave;select sleep(5);show slave status\G;stop slave sql_thread;

               Last_SQL_Error: Fatal error: ... Slave SQL Thread stopped with incomplete event group having non-transactional changes. If the group consists solely of row-based events, you can try to restart the slave with --slave-exec-mode=IDEMPOTENT, which ignores duplicate key, key not found, and similar errors (see documentation for details).

2013-11-08 22:36:05 4663 [Warning] Slave SQL: Request to stop slave SQL Thread received while applying a group that has non-transactional changes; waiting for completion of the group ... , Error_code: 0
2013-11-08 22:37:11 4663 [ERROR] Slave SQL: Fatal error: ... Slave SQL Thread stopped with incomplete event group having non-transactional changes. If the group consists solely of row-based events,

^ notice in all cases seems to timeout right at minute

This also happens in MySQL 5.5.39

I see this issue on mysql-5.7.3-m13-linux-glibc2.5-x86_64 also.

PRD-DC2 root@data106:/usr/local/mysql$ service mysql restart
Shutting down MySQL
.2015-09-28T20:39:53.530188Z 0 [Note] Giving 3 client threads a chance to die gracefully
2015-09-28T20:39:53.530247Z 0 [Note] Shutting down slave threads
2015-09-28T20:39:53.841470Z 1 [Note] Slave I/O thread killed while reading event
2015-09-28T20:39:53.841514Z 1 [Note] Slave I/O thread exiting, read up to log 'mysql_binary_log.000194', position 590447037
..............................2015-09-28T20:40:23.746872Z 2 [Warning] Slave SQL: Request to stop slave SQL Thread received while applying a group that has non-transactional changes; waiting for completion of the group ... , Error_code: 0
..................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... * 
 * Failed to stop running server, so refusing to try to start.

I'm getting this on CentOS7/MySQL 5.5.52. 

Would love to see it get escalated, or at least if someone has found a decent work around. We've gotten it twice this month and each time had to manually resync the affected table.

Hi,

There is a little trick you can do in order to prevent this bug.

1) Run "Stop slave IO_THREAD"
2) Wait all transactions from relay logs to be executed by SQL_THREAD
3) Check the state of SQL_THREAD (show slave status):
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
4) Now you can run "Stop slave" safely.

Best regards

We see this several times a month during hourly backups of the slave.  Mikhail Golenkov's workaround does not work for us.  I can confirm the slave state is "Slave has read all relay log; waiting for more updates", and upon stopping the slave we still (sometimes) see "Fatal error: ... Slave SQL Thread stopped with incomplete event group having non-transactional changes...".  We need to stop the slave to run a backup, maybe someone can recommend another method of gracefully stopping or backing up a slave?

mysql  Ver 14.14 Distrib 5.7.27