| Bug #84375 | Changing MASTER_DELAY to 0 purges the relay log, causing corruption. | ||
|---|---|---|---|
| Submitted: | 30 Dec 2016 10:27 | Modified: | 2 Aug 2017 14:26 |
| Reporter: | Saverio Miroddi | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: Documentation | Severity: | S1 (Critical) |
| Version: | 5.7.17 | OS: | Any |
| Assigned to: | CPU Architecture: | Any | |
[30 Dec 2016 15:33]
MySQL Verification Team
Thank you for the bug report. Please first check with the latest MySQL released binary 5.7.17 and comment here if you get the same issue. Thanks.
[3 Jan 2017 8:01]
Saverio Miroddi
For reference, changed the version to 5.7.17 (Community Edition).
[3 Jan 2017 8:02]
Saverio Miroddi
Proposing the severity to Critical, and in certain setups, this bug will lead to catastrophic results.
[3 Jan 2017 8:05]
Saverio Miroddi
Reproduced on MySQL Community edition 5.7.17. It's very easily reproducible: 1. set a large enough delay 2. create a table on the master 3. make sure the update is replicated to the relay log 4. reset the delay 5. insert a record in the table on the master 6. slave will fail the replication. based on my test, the slave will reset the relay log on 4., which contains the table creation statement. 5. will therefore cause the slave to fail the replication, since it (the slave) won't replay the creation statement, but only the insertion.
[3 Jan 2017 10:54]
MySQL Verification Team
>>based on my test, the slave will reset the relay log on 4., which contains the table creation statement. 5. will therefore cause the slave to fail the replication, since it (the slave) won't replay the creation statement, but only the insertion. Prior to MySQL 5.7.4, CHANGE MASTER TO deletes all relay log files and starts a new one, unless you specify RELAY_LOG_FILE or RELAY_LOG_POS. In that case, relay log files are kept; the relay_log_purge global variable is set silently to 0. In MySQL 5.7.4 and later, relay logs are preserved when neither the slave SQL thread nor the slave I/O thread is stopped; if both threads are stopped, all relay log files are deleted unless you at least one of RELAY_LOG_FILE or RELAY_LOG_POS is specified. In your case, after step 4 - new relay log is generated(i.e slave server creates a new relay log file each time the I/O thread starts). So instead of STOP SLAVE; CHANGE MASTER TO MASTER_DELAY=<value>; START SLAVE; it is enough to do the following with this feature. STOP SLAVE SQL_THREAD; CHANGE MASTER TO MASTER_DELAY=<value>; START SLAVE SQL_THREAD; Please see more details on this here http://mysqlserverteam.com/mysql-5-7-4-change-master-without-stopping-slave-altogether/ For example: -- relay log is rotated WITH STOP SLAVE root@localhost [(none)]> \! ls -l slave/hod03-relay-bin* -rw-r----- 1 umshastr common 177 Jan 3 11:48 slave/hod03-relay-bin.000001 -rw-r----- 1 umshastr common 25 Jan 3 11:48 slave/hod03-relay-bin.index root@localhost [(none)]> root@localhost [(none)]> STOP SLAVE; Query OK, 0 rows affected, 1 warning (0.00 sec) root@localhost [(none)]> CHANGE MASTER TO MASTER_DELAY=900; Query OK, 0 rows affected (0.00 sec) root@localhost [(none)]> START SLAVE; Query OK, 0 rows affected (0.01 sec) root@localhost [(none)]> SHOW SLAVE STATUS\G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: localhost Master_User: repl Master_Port: 3306 Connect_Retry: 60 Master_Log_File: master-bin.000001 Read_Master_Log_Pos: 5551 Relay_Log_File: hod03-relay-bin.000002 Relay_Log_Pos: 5766 Relay_Master_Log_File: master-bin.000001 Slave_IO_Running: Yes . root@localhost [(none)]> \! ls -l slave/hod03-relay-bin* -rw-r----- 1 umshastr common 207 Jan 3 11:49 slave/hod03-relay-bin.000001 -rw-r----- 1 umshastr common 5766 Jan 3 11:49 slave/hod03-relay-bin.000002 -rw-r----- 1 umshastr common 50 Jan 3 11:49 slave/hod03-relay-bin.index -- relay log is not rotated WITH STOP SLAVE root@localhost [(none)]> \! ls -l slave/hod03-relay-bin* -rw-r----- 1 umshastr common 207 Jan 3 11:49 slave/hod03-relay-bin.000001 -rw-r----- 1 umshastr common 5766 Jan 3 11:49 slave/hod03-relay-bin.000002 -rw-r----- 1 umshastr common 50 Jan 3 11:49 slave/hod03-relay-bin.index root@localhost [(none)]> STOP SLAVE SQL_THREAD; Query OK, 0 rows affected (0.00 sec) root@localhost [(none)]> CHANGE MASTER TO MASTER_DELAY=0; Query OK, 0 rows affected (0.00 sec) root@localhost [(none)]> START SLAVE SQL_THREAD; Query OK, 0 rows affected (0.00 sec) root@localhost [(none)]> \! ls -l slave/hod03-relay-bin* -rw-r----- 1 umshastr common 207 Jan 3 11:49 slave/hod03-relay-bin.000001 -rw-r----- 1 umshastr common 5766 Jan 3 11:49 slave/hod03-relay-bin.000002 -rw-r----- 1 umshastr common 50 Jan 3 11:49 slave/hod03-relay-bin.index root@localhost [(none)]> Could you please check this at your end? Thanks, Umesh
[3 Jan 2017 11:19]
Saverio Miroddi
OK, this works, thanks - not an issue, then.
[3 Jan 2017 11:21]
Saverio Miroddi
I suggest, for clarity, to add a brief explanation to the "Delayed Replication" chapter (https://dev.mysql.com/doc/refman/5.7/en/replication-delayed.html).
[3 Jan 2017 12:37]
MySQL Verification Team
Thank you for the feedback!
[2 Mar 2017 10:42]
Daniƫl van Eeden
Looks like this is a duplicate of: Bug #84249 CHANGE MASTER TO MASTER_DELAY can lead a slave to skip transactions
[24 Jul 2017 19:18]
Shlomi Noach
Affected by same bug.
[2 Aug 2017 14:26]
Margaret Fisher
Posted by developer: Confirmed with development that the problem behavior is now removed by the fix for Changing master_delay after stop slave results in loss of events https://bugs.mysql.com/bug.php?id=81232 which was implemented in the 5.7.19 and 8.0.2 releases. The fix means that issuing CHANGE MASTER TO does not purge the relay log even if both threads are stopped. Added a "See also" reference to the changelog entry for that bug.

Description: When changing the replication delay from a non-zero value to 0, MySQL will cause corruption on the slave, apparently because it purges the relay log. I've checked the documentation (mainly the two pages about CHANGE MASTER, and Delayed replication), and there is no mention of resets in association with MASTER DELAY. How to repeat: Workflow: - setup a slave with a MASTER_DELAY=900 - execute STOP SLAVE - execute CHANGE MASTER TO MASTER_DELAY=0 - execute START SLAVE Slave status before: Read_Master_Log_Pos: 43779970 Relay_Log_File: mysql-relay-bin.000913 Relay_Log_Pos: 12797792 Slave_SQL_Running: Yes Last_Errno: 0 Last_Error: Exec_Master_Log_Pos: 12797579 Relay_Log_Space: 43780437 Seconds_Behind_Master: 899 Last_SQL_Errno: 0 Last_SQL_Error: SQL_Delay: 900 SQL_Remaining_Delay: 1 Slave_SQL_Running_State: Waiting until MASTER_DELAY seconds after master executed event Last_SQL_Error_Timestamp: Slave status after: Read_Master_Log_Pos: 44869872 Relay_Log_File: mysql-relay-bin.000002 Relay_Log_Pos: 869 Slave_SQL_Running: No Last_Errno: 1032 Last_Error: Could not execute Update_rows event on table mydb.mytable; Can't find record in 'mytable', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.003470, end_log_pos 44068724 Exec_Master_Log_Pos: 44065228 Relay_Log_Space: 805720 Seconds_Behind_Master: NULL Last_SQL_Errno: 1032 Last_SQL_Error: Could not execute Update_rows event on table mydb.mytable; Can't find record in 'mytable', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.003470, end_log_pos 44068724 SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Last_SQL_Error_Timestamp: 161230 09:59:33 Relay log files after: 207 mysql-relay-bin.000001 3587081 mysql-relay-bin.000002 78 mysql-relay-bin.index Suggested fix: There is a possibility that this is a documentation issue, although I don't think this is the case, because this behavior prevents (without trickery) changing a slave from delayed to non-delayed.