| Bug #87102 | Changing MASTER_DELAY to 0 purges all binlogs and makes recovery impossible | ||
|---|---|---|---|
| Submitted: | 18 Jul 2017 4:23 | Modified: | 20 Sep 2017 6:22 |
| Reporter: | Mikhail Golenkov | Email Updates: | |
| Status: | Not a Bug | Impact on me: | |
| Category: | MySQL Server: Replication | Severity: | S2 (Serious) |
| Version: | 5.6.36 | OS: | CentOS (CentOS 6,7. x86_64) |
| Assigned to: | Venkatesh Duggirala | CPU Architecture: | Any |
| Tags: | replication delay recovery | ||
[23 Jul 2017 7:33]
MySQL Verification Team
Bug 26410977 - CHANGING MASTER_DELAY DISCARDS RELAY LOGS IF IO_THREAD IS STOPPED
[24 Jul 2017 11:41]
MySQL Verification Team
Hi, Thanks for your report. This is a duplicate of Bug #26410977 - CHANGING MASTER_DELAY DISCARDS RELAY LOGS IF IO_THREAD IS STOPPED A workaround is to use only STOP SLAVE SQL_THREAD; rather than STOP SLAVE; before executing CHANGE MASTER TO MASTER_DELAY=N; all best Bogdan
[24 Jul 2017 11:47]
Дмитрий Афонасов
Dear support team, mysql> stop slave SQL_THREAD; Query OK, 0 rows affected (0.05 sec) mysql> change master to master_delay=0; ERROR 1198 (HY000): This operation cannot be performed with a running slave; run STOP SLAVE first Thus, STOP SLAVE must be executed before executing CHANGE MASTER TO MASTER_DELAY=0. Best regards.
[24 Jul 2017 12:39]
MySQL Verification Team
Hi,
That workaround works on 5.7, on 5.6 there's no workaround I'm afraid
The bug is being processed, not much we can do now but wait for it to be fixed.
all best
Bogdan
slave1 [localhost] {msandbox} ((none)) > CHANGE MASTER TO MASTER_DELAY = 100000;
ERROR 3085 (HY000): This operation cannot be performed with a running slave sql thread; run STOP SLAVE SQL_THREAD FOR CHANNEL '' first.
slave1 [localhost] {msandbox} ((none)) > STOP SLAVE SQL_THREAD;
Query OK, 0 rows affected (0.00 sec)
slave1 [localhost] {msandbox} ((none)) > CHANGE MASTER TO MASTER_DELAY = 100000;
Query OK, 0 rows affected (0.00 sec)
slave1 [localhost] {msandbox} ((none)) > start slave;
Query OK, 0 rows affected (0.00 sec)
slave1 [localhost] {msandbox} ((none)) > CHANGE MASTER TO MASTER_DELAY = 0;
ERROR 3085 (HY000): This operation cannot be performed with a running slave sql thread; run STOP SLAVE SQL_THREAD FOR CHANNEL '' first.
slave1 [localhost] {msandbox} ((none)) > STOP SLAVE SQL_THREAD;
Query OK, 0 rows affected (0.00 sec)
slave1 [localhost] {msandbox} ((none)) > CHANGE MASTER TO MASTER_DELAY = 0;
Query OK, 0 rows affected (0.00 sec)
slave1 [localhost] {msandbox} ((none)) > select @@version;
+------------+
| @@version |
+------------+
| 5.7.19-log |
+------------+
1 row in set (0.00 sec)
[25 Jul 2017 2:39]
Mikhail Golenkov
Dear support team, Yes, It works on 5.7. Thank you. Unfortunately, one of our critical application works with 5.6 only. Hope, that the same workaround will be available on 5.6 soon. Best regards.
[25 Jul 2017 4:51]
MySQL Verification Team
:D thanks for confirmation. I rather think the bug will be fixed and the workaround will not be needed. all best Bogdan
[20 Sep 2017 6:22]
Venkatesh Duggirala
Post by Developer: ================== Hello Mikhail, Thank you for the bug report. There are two things that I would like to mention in this bug while I mark its status as "Not a bug" 1) In the description(and the title) you were mentioning "binlogs". Actually the term for the logs that were downloaded on Slave by receiver thread are called "Relay logs". I assume that you were actually talking about "Relay logs" are getting purged and not "binary logs". Please correct me if that is not the case. 2) Assuming that point-1 holds good, purging the relay logs during "Change master" command is known, documented legacy behaviour if you do not specify Relay_log_file/Relay_log_pos in the command. Please have a look at https://dev.mysql.com/doc/refman/5.6/en/change-master-to.html , this says "CHANGE MASTER TO" deletes all relay log files and starts a new one, unless you specify RELAY_LOG_FILE or RELAY_LOG_POS" Please let me know if you see any issues with the relay log files even when you specify one of these two values. Regards, Venkatesh.

Description: Executing "CHANGE MASTER TO MASTER_DELAY = 0" on a slave purges all binary logs that have been downloaded from master. When the master crashes, all transactions should be executed on a slave as fast as possible in order to prepare it to be a new master. However after executing "CHANGE MASTER TO MASTER_DELAY = 0" the slave deletes all binary logs with all data changes and makes restoring impossible. How to repeat: For example, we have standard master-slave replication with a delay = N seconds (6 hours in this case, but in general it can be any delay) Slave is working (IO_THREAD and SQL_THREAD are running), SQL_DELAY = 21600. Creating some content on master: create database mygolenk; create table mygolenk.user like mysql.user; insert into mygolenk.user select * from mysql.user; Then shutting down the master (simulating master crash and loosing all binary logs on master). Thus, our master is down and our aim is to recover database using slave (6 hour lagged) + downloaded binlogs on slave with all data that have been changed for the last 6 hours. List of binary logs on a slave: [root@testforlab-lin2 bin-logs]# ls -lh total 817M -rw-rw---- 1 mysql mysql 120 Jul 18 10:26 mysql-bin.000022 -rw-rw---- 1 mysql mysql 33 Jul 18 10:27 mysql-bin.index -rw-rw---- 1 mysql mysql 211M Jul 18 10:35 mysql-relay-bin.000003 -rw-rw---- 1 mysql mysql 606M Jul 18 11:30 mysql-relay-bin.000004 -rw-rw---- 1 mysql mysql 78 Jul 18 10:44 mysql-relay-bin.index With all changes for last 6 hours, including our create database mygolenk: [root@testforlab-lin2 bin-logs]# mysqlbinlog mysql-relay-bin.000004 | grep -B5 -i "create database" # at 631506131 #170718 11:28:26 server id 1 end_log_pos 631506074 CRC32 0x64d1c757 Query thread_id=197 exec_time=0 error_code=0 SET TIMESTAMP=1500341306/*!*/; /*!\C utf8 *//*!*/; SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=8/*!*/; create database mygolenk Relay-log.info file content: [root@testforlab-lin2 mysql]# cat relay-log.info /mysql/bin-logs/mysql-relay-bin.000004 423141238 mysql-bin.000005 423141075 21600 Master.info file content: [root@testforlab-lin2 mysql]# cat master.info mysql-bin.000005 634712142 Master crashed at mysql-bin.000005 pos 634712142 (last event in the binlog) On slave we have: mysql> show slave status\G *************************** 1. row *************************** Slave_IO_State: Reconnecting after a failed master event read Master_Log_File: mysql-bin.000005 Read_Master_Log_Pos: 634712142 Relay_Log_File: mysql-relay-bin.000004 Relay_Log_Pos: 429552454 Relay_Master_Log_File: mysql-bin.000005 Slave_IO_Running: Connecting Slave_SQL_Running: Yes Exec_Master_Log_Pos: 429552291 Relay_Log_Space: 855898791 Seconds_Behind_Master: 21376 Last_IO_Errno: 2003 Last_IO_Error: error reconnecting to master 'replication@10.147.39.20:3306' - retry-time: 60 retries: 21 Last_SQL_Errno: 0 Last_SQL_Error: SQL_Delay: 21600 SQL_Remaining_Delay: 224 Slave_SQL_Running_State: Waiting until MASTER_DELAY seconds after master executed event Trying to rollforward all transactions until the position when the master crashed: START SLAVE UNTIL MASTER_LOG_FILE='mysql-bin.000005', MASTER_LOG_POS=634712142; 1 warning in a result: mysql> show warnings; +-------+------+--------------------------+ | Level | Code | Message | +-------+------+--------------------------+ | Note | 1254 | Slave is already running | +-------+------+--------------------------+ Stopping SQL_THREAD (SQL_TREAD only to protect downloaded binary logs): mysql> stop slave SQL_THREAD; Query OK, 0 rows affected (0.00 sec) mysql> START SLAVE UNTIL MASTER_LOG_FILE='mysql-bin.000005', MASTER_LOG_POS=634712142; Query OK, 0 rows affected (0.00 sec) SQL_THREAD is running, but it still executing transactions according SQL_DELAY mysql> show slave status\G *************************** 1. row *************************** Slave_SQL_Running_State: Waiting until MASTER_DELAY seconds after master executed event We need to remove SQL_Delay (set it to 0) and roll forward all transactions from the binary logs. Trying change master delay without stopping IO_THREAD mysql> stop slave SQL_THREAD; Query OK, 0 rows affected (0.05 sec) mysql> change master to master_delay=0; ERROR 1198 (HY000): This operation cannot be performed with a running slave; run STOP SLAVE first It is necessary to stop both threads to set sql_delay. Stopping slave and changing SQL_Delay: mysql> stop slave; Query OK, 0 rows affected (0.00 sec) mysql> change master to master_delay=0; Query OK, 0 rows affected (0.37 sec) [root@testforlab-lin2 bin-logs]# ls -lh total 16K -rw-rw---- 1 mysql mysql 120 Jul 18 10:26 mysql-bin.000022 -rw-rw---- 1 mysql mysql 33 Jul 18 10:27 mysql-bin.index -rw-rw---- 1 mysql mysql 120 Jul 18 12:14 mysql-relay-bin.000001 -rw-rw---- 1 mysql mysql 39 Jul 18 12:14 mysql-relay-bin.index Finally, we set SQL_Delay = 0, but we lost all downloaded binary logs. mysql> start slave; Query OK, 0 rows affected (0.01 sec) mysql> show slave status\G *************************** 1. row *************************** Slave_IO_State: Connecting to master Master_Log_File: mysql-bin.000005 Read_Master_Log_Pos: 442374704 Relay_Log_File: mysql-relay-bin.000001 Relay_Log_Pos: 4 Relay_Master_Log_File: mysql-bin.000005 Slave_IO_Running: Connecting Slave_SQL_Running: Yes Exec_Master_Log_Pos: 442374704 Relay_Log_Space: 120 Seconds_Behind_Master: NULL Last_IO_Errno: 2003 Last_IO_Error: error connecting to master 'replication@10.147.39.20:3306' - retry-time: 60 retries: 1 Last_SQL_Errno: 0 Last_SQL_Error: SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it mysql> show databases; +--------------------+ | Database | +--------------------+ | information_schema | | mysql | | performance_schema | +--------------------+ 4 rows in set (0.00 sec) No mygolenk database. All data for last 6 hours have been lost Suggested fix: In case when the master is down and master's binlogs are not available anymore - changing master_delay leads to purging all downloaded binary logs and loosing all data changes for the period of SQL_Delay. Alternatively, it is possible to wait for 6 hours while SQL_THREAD executes all transactions. If the delay is 1 day - it will take 1 day to recover the database. Obviously it's not appropriate solution. Delayed replication is a good tool that can help DBAs to recover database after user's destructive commands like drop, delete or update (start slave until). However it can not help in case of master's physical crash. Workaround: after master's crash stop SQL_THREAD on a slave, check the file name and position of the last executed transaction (SP). Convert binary logs to a text file using mysqlbinlog tool with a start position (SP). Execute all transactions using mysql client (mysql -u -p < mysql-relay-bin.000005.text). Suggested fix: Change master to master_delay = 0 should not purge all relay logs on a slave.