Bug #70669 | Slave can't continue replication after master's crash recovery | ||
---|---|---|---|
Submitted: | 20 Oct 2013 4:25 | Modified: | 27 Feb 2014 13:16 |
Reporter: | Yoshinori Matsunobu (OCA) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S2 (Serious) |
Version: | 5.6.14 | OS: | Any |
Assigned to: | | CPU Architecture: | Any |
[20 Oct 2013 4:25]
Yoshinori Matsunobu
[21 Oct 2013 8:32]
MySQL Verification Team
Hello Yoshinori, Thank you for the bug report. Verified as described. Thanks, Umesh
[27 Feb 2014 13:16]
Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release.

Fixed in 5.6+. Documented fix in the 5.6.17 and 5.7.4 changelogs as follows:

Binary log events could be sent to slaves before they were flushed to disk on the master, even when sync_binlog was set to 1. This could lead to either of the following two issues when the master was restarted following a crash of the operating system:

· Replication cannot continue because one or more slaves are requesting to replicate events that do not exist on the master.
· Data exists on one or more slaves, but not on the master.

Such problems are expected with less durable settings (sync_binlog not equal to 1), but they should not happen when sync_binlog is 1. To fix this issue, a lock (LOCK_log) is now held during synchronization and is released only after the binary log events are actually written to disk.

Closed.

If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at http://dev.mysql.com/doc/en/installing-source.html
[29 Mar 2014 8:25]
Laurynas Biveinis
5.6$ bzr log -r 5838
------------------------------------------------------------
revno: 5838
committer: Libing Song <libing.song@oracle.com>
branch nick: mysql-5.6
timestamp: Tue 2014-02-25 09:39:34 +0800
message:
  BUG#17632285 SLAVE CAN'T CONTINUE REPLICATION AFTER MASTER'S CRASH RECOVERY

  Binary events might be sent to slaves before they are flushed to disk on master, even sync_binlog is set to 1. It can cause two problems if the master restarts after an OS crash.

  * Replication cannot continue because the slaves are requesting to replication the events don't exist on master.
  * Data exists on slaves, but not exists on the master.

  The problems are expected on less durable settings( sync_binlog != 1), but it should not happen on durable setting(sync_binlog = 1).

  Since 5.6 binlog group commit implementation, binlog write and sync have been protected by separate mutexes. So dump threads can read the binary events simultaneously or even before it is synced to disk.

  To fixing the problem on durable setting, LOCK_log is hold in sync stage and it is released after the binary events are synced to disk.
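For readers who want to see the race in miniature, here is a toy Python model of the locking change described in the commit message. It is only a sketch and not the server's code: `BinlogSketch`, `try_dump`, `commit`, the `before_sync` hook, and `observe` are all hypothetical names made up for this illustration, a `threading.Lock` stands in for LOCK_log, and a counter stands in for the fsync. The hook marks the window between the write and the sync, which is where a dump thread could run.

```python
import threading


class BinlogSketch:
    """Toy model of a binary log with separate write and sync steps."""

    def __init__(self, hold_lock_during_sync):
        self.lock_log = threading.Lock()  # stands in for LOCK_log
        self.written = []                 # events written but maybe not durable
        self.synced = 0                   # number of events durable on disk
        self.hold_lock_during_sync = hold_lock_during_sync

    def try_dump(self):
        """Return what a dump thread could send now, or None if it is blocked."""
        if not self.lock_log.acquire(blocking=False):
            return None
        try:
            return list(self.written)
        finally:
            self.lock_log.release()

    def commit(self, event, before_sync=None):
        """Write an event, then sync it; the flag decides when the lock is released."""
        self.lock_log.acquire()
        self.written.append(event)
        if not self.hold_lock_during_sync:
            self.lock_log.release()       # pre-fix behaviour: released before sync
        if before_sync is not None:
            before_sync()                 # window in which a dump thread may run
        self.synced = len(self.written)   # stands in for fsync()
        if self.hold_lock_during_sync:
            self.lock_log.release()       # post-fix behaviour: released after sync


def observe(hold_lock_during_sync):
    """Record what a dump thread sees between write and sync, and after commit."""
    log = BinlogSketch(hold_lock_during_sync)
    seen = []
    log.commit(
        "event-1",
        before_sync=lambda: seen.append((log.try_dump(), log.synced)),
    )
    return seen[0], log.try_dump()
```

With `observe(False)` the simulated dump thread reads `event-1` while the durable count is still 0, which is the state that leaves a slave ahead of the master if the OS crashes at that moment. With `observe(True)` the same read is refused until the sync has completed. The model ignores group commit batching and everything else the real server does.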