Bug #40337 | Fsyncing master and relay log to disk after every event is too slow | ||
---|---|---|---|
Submitted: | 27 Oct 2008 3:05 | Modified: | 12 Nov 2009 15:02 |
Reporter: | Alfranio Tavares Correia Junior | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S5 (Performance) |
Version: | 5.0, 5.1, 6.0 | OS: | Any |
Assigned to: | Alfranio Tavares Correia Junior | CPU Architecture: | Any |
[27 Oct 2008 3:05]
Alfranio Tavares Correia Junior
[27 Oct 2008 3:17]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/57079 2704 Alfranio Correia 2008-10-27 Introduced a recovery mechanism that throws away relay-log.bin* retrieved from a master. See BUG#40337.
[27 Oct 2008 3:27]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/57080 2705 Alfranio Correia 2008-10-27 Changed result file for a test case affected by fix proposed for BUG#40337
[27 Oct 2008 4:14]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/57081 2706 Alfranio Correia 2008-10-27 Merged patch that was proposed by Sinisa and modified by He Zhenxing to fix bugs: BUG#35542 and BUG#31665. This is an update to BUG#40337.
[27 Oct 2008 15:10]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/57114 2704 Alfranio Correia 2008-10-27 Merged patch that was proposed by Sinisa and modified by He Zhenxing to fix bugs: BUG#35542 and BUG#31665. This is an update to BUG#40337. And introduced a new recovery process that disregards old relay-log.bin(s) and starts re-fetching from the master based on the information in relay-log.info which may be flushed and synced after each transaction commit.
[29 Oct 2008 16:37]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/57329 2705 Alfranio Correia 2008-10-29 Introduced a recovery mechanism that throws away relay-log.bin* retrieved from a master in case of a crash and updates the master.info based on the relay.info. This is done because such files may be corrupted after a crash. We are assuming that relay.info is fsynced per transaction. See BUG#40337.
[29 Oct 2008 17:53]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/57346 2704 Alfranio Correia 2008-10-29 Added option to sync master.info (--sync-master-info, integer) and relay log (--sync-relay-log, integer) to disk after every event and the relay info (--sync-relay-log-info, boolean) per transaction. This supersedes the patch proposed by Sinisa and modified by He Zhenxing to fix bugs: BUG#35542 and BUG#31665. This is an update to BUG#40337.
[29 Oct 2008 18:19]
Alfranio Tavares Correia Junior
How to use the patches: http://lists.mysql.com/commits/57329 and http://lists.mysql.com/commits/57346. To throw away possible corrupted relay-log-bin* and master.info due to a crash and update both information based on the relay-log.info, set the --relay-log-recovery = 1. This option only makes sense, when the relay.info is fsynced per transaction as described below. Side effect: a normal shutdown and startup also throws away logs. To avoid this, be careful to disable the parameter before starting up the slave. In a near future, we will improve the recovery and use the fsyncs provided by --sync-master-info and --sync-relay-log as described below to avoid throwing away logs that are fsynced to disk and therefore are not corrupted. To fsync the master.info and relay.log periodically, change the parameters --sync-master-info = (number of events before the fsync) and --sync-relay-log = (number of events before the fsync). To fsync the relay.info per transaction, we should enable the parameter --sync-relay-log-info = 1.
[5 Nov 2008 18:30]
David Lutz
I am not the first or only person to bring this up, but I want to mention that we should seriously consider dropping the sync-relay-log and sync-master-info options that are being added by this patch. My understanding is that the sync-relay-log-info option will be the only one needed for resiliency, and that sync-relay-log and sync-master-info will only be used to limit the amount of binlog data that may need to be refetched from the replication master during recovery. In most/all modern operating systems, there are already mechanisms to ensure that dirty file system pages will be flushed to disk within a reasonable time. For example, Solaris has fsflush, Linux has update/bdflush, and Windows has lazy write worker threads. These mechanisms generally ensure that a flush to disk will occur within 30 seconds to a few minutes of the buffered file update, so we can also expect that this would be a typical upper limit on the amount of time required to refetch binlog data from the master. It seems to me that if someone were going to use the sync-relay-log and sync-master-info options, the most likely settings would be those that approximate what we already get for free from the operating system. Setting them higher would add no value, and setting them lower would add very little value at a potentially high cost. For example, setting them to a very low value could have a huge performance impact.
[7 Nov 2008 19:22]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/58228 2704 Alfranio Correia 2008-11-07 Added option to sync master.info (--sync-master-info, integer) and relay log (--sync-relay-log, integer) to disk after every event and the relay info (--sync-relay-log-info, boolean) per transaction. This supersedes the patch proposed by Sinisa and modified by He Zhenxing to fix bugs: BUG#35542 and BUG#31665. This is an update to BUG#40337.
[7 Nov 2008 19:25]
Alfranio Tavares Correia Junior
I've removed some dead code in the previous patch and incremented the counters that keep track of the number of events in the same way in all places.
[10 Nov 2008 10:17]
Lars Thalmann
Also see BUG#26540.
[11 Nov 2008 11:27]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/58440 2704 Alfranio Correia 2008-11-11 Added option to sync master.info (--sync-master-info, integer) and relay log (--sync-relay-log, integer) to disk after every event and the relay info (--sync-relay-log-info, boolean) per transaction. This supersedes the patch proposed by Sinisa and modified by He Zhenxing to fix bugs: BUG#35542 and BUG#31665. This is an update to BUG#40337.
[13 Nov 2008 19:41]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/58696 2704 Alfranio Correia 2008-11-13 This is an update to BUG#40337 - Fsyncing master and relay log to disk after every event is too slow Added option to sync master.info (--sync-master-info, integer) and relay log (--sync-relay-log, integer) to disk after every event and the relay info (--sync-relay-log-info, boolean) per transaction. This supersedes the patch proposed by Sinisa and modified by He Zhenxing to fix bugs: BUG#35542 and BUG#31665.
[13 Nov 2008 20:19]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/58700 2704 Alfranio Correia 2008-11-13 BUG#40337 Fsyncing master and relay log to disk after every event is too slow. Introduced a recovery mechanism that throws away relay-log.bin* retrieved from a master in case of a crash and updates the master.info based on the relay.info. This is done because such files may be corrupted after a crash. It is advisalbe to use this feature, when the parameter --syn-relay-log-info = 1.
[8 Dec 2008 1:40]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/60843 2747 Alfranio Correia 2008-12-08 BUG#40337. Introduced four options to improve reliability in the slave: . (--sync-master-info, integer) which syncs the master.info after #th event; . (--sync-relay-log, integer) which syncs the relay-log.bin* after #th events. . (--sync-relay-log-info, integer) which syncs the relay.info after #th transactions. . (--relay-log-recovery, boolean) which enables a recovery mechanism that throws away relay-log.bin* after a crash. The recovery process is carried on because such files may be corrupted due to a crash. This obliges to re-fetch events from them master. It is advisable to use this feature, when the parameter --syn-relay-log-info = 1. This supersedes the patch proposed to fix bugs: BUG#35542 and BUG#31665.
[8 Dec 2008 1:45]
Alfranio Tavares Correia Junior
The previous patches were used to custom builds. The last patch (http://lists.mysql.com/commits/60843) however is to be pushed into 6.0.
[4 Feb 2009 15:12]
Jon Stephens
Added changelog entry, info for --sync-relay-log to 6.0.10 changelog. See Bug #31665 for details. Left status as In-Progress since Alfranio is still working on this (and this particular bug wasn't yet in Documenting).
[4 Feb 2009 16:11]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/65196 2810 Alfranio Correia 2009-02-04 BUG#40337 Fsyncing master and relay log to disk after every event is too slow The fix proposed in BUG#35542 and BUG#31665 introduces a performance issue when fsyncing the master.info, relay.info and relay-log.bin* after #th events. Although such solution has been proposed to reduce the probability of corrupted files due to a slave-crash, the performance penalty introduced by it has made the approach impractical for highly intensive workloads. In a nutshell, the option --syn-relay-log proposed in BUG#35542 and BUG#31665 simultaneously fsyncs master.info, relay-log.info and relay-log.bin* and this is the main source of performance issues. This patch introduces new options that give more control to the user on what should be fsynced and how often: 1) (--sync-master-info, integer) which syncs the master.info after #th event; 2) (--sync-relay-log, integer) which syncs the relay-log.bin* after #th events. 3) (--sync-relay-log-info, integer) which syncs the relay.info after #th transactions. To provide both performance and increased reliability, we recommend the following setup: 1) --sync-master-info = 0 eventually the operating system will fsync it; 2) --sync-relay-log = 0 eventually the operating system will fsync it; 3) --sync-relay-log-info = 1 fsyncs it after every transaction; Notice, however, that the previous setup does not reduce the probability of corrupted master.info and relay-log.bin*. To overcome the issue, this patch also introduces a recovery mechanism that right after restart throws away relay-log.bin* retrieved from a master and updates the master.info based on the relay.info: 4) (--relay-log-recovery, boolean) which enables a recovery mechanism that throws away relay-log.bin* after a crash. So, it is advisable to setup: 4) --syn-relay-log-info = 1.
[11 Feb 2009 10:23]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/65869 2811 Alfranio Correia 2009-02-11 BUG#40337 Fsyncing master and relay log to disk after every event is too slow Post-fix.
[12 Feb 2009 8:04]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/65989 2811 Alfranio Correia 2009-02-12 BUG#40337 Fsyncing master and relay log to disk after every event is too slow Post-fix.
[12 Feb 2009 9:14]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/66001 2811 Alfranio Correia 2009-02-12 BUG#40337 Fsyncing master and relay log to disk after every event is too slow Post-fix.
[13 Feb 2009 10:26]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/66173 2810 Alfranio Correia 2009-02-13 BUG#40337 Fsyncing master and relay log to disk after every event is too slow The fix proposed in BUG#35542 and BUG#31665 introduces a performance issue when fsyncing the master.info, relay.info and relay-log.bin* after #th events. Although such solution has been proposed to reduce the probability of corrupted files due to a slave-crash, the performance penalty introduced by it has made the approach impractical for highly intensive workloads. In a nutshell, the option --syn-relay-log proposed in BUG#35542 and BUG#31665 simultaneously fsyncs master.info, relay-log.info and relay-log.bin* and this is the main source of performance issues. This patch introduces new options that give more control to the user on what should be fsynced and how often: 1) (--sync-master-info, integer) which syncs the master.info after #th event; 2) (--sync-relay-log, integer) which syncs the relay-log.bin* after #th events. 3) (--sync-relay-log-info, integer) which syncs the relay.info after #th transactions. To provide both performance and increased reliability, we recommend the following setup: 1) --sync-master-info = 0 eventually the operating system will fsync it; 2) --sync-relay-log = 0 eventually the operating system will fsync it; 3) --sync-relay-log-info = 1 fsyncs it after every transaction; Notice, however, that the previous setup does not reduce the probability of corrupted master.info and relay-log.bin*. To overcome the issue, this patch also introduces a recovery mechanism that right after restart throws away relay-log.bin* retrieved from a master and updates the master.info based on the relay.info: 4) (--relay-log-recovery, boolean) which enables a recovery mechanism that throws away relay-log.bin* after a crash. So, it is advisable to setup: --syn-relay-log-info = 1.
[17 Feb 2009 23:07]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/66709 2810 Alfranio Correia 2009-02-17 BUG#40337 Fsyncing master and relay log to disk after every event is too slow The fix proposed in BUG#35542 and BUG#31665 introduces a performance issue when fsyncing the master.info, relay.info and relay-log.bin* after #th events. Although such solution has been proposed to reduce the probability of corrupted files due to a slave-crash, the performance penalty introduced by it has made the approach impractical for highly intensive workloads. In a nutshell, the option --syn-relay-log proposed in BUG#35542 and BUG#31665 simultaneously fsyncs master.info, relay-log.info and relay-log.bin* and this is the main source of performance issues. This patch introduces new options that give more control to the user on what should be fsynced and how often: 1) (--sync-master-info, integer) which syncs the master.info after #th event; 2) (--sync-relay-log, integer) which syncs the relay-log.bin* after #th events. 3) (--sync-relay-log-info, integer) which syncs the relay.info after #th transactions. To provide both performance and increased reliability, we recommend the following setup: 1) --sync-master-info = 0 eventually the operating system will fsync it; 2) --sync-relay-log = 0 eventually the operating system will fsync it; 3) --sync-relay-log-info = 1 fsyncs it after every transaction; Notice, that the previous setup does not reduce the probability of corrupted master.info and relay-log.bin*. To overcome the issue, this patch also introduces a recovery mechanism that right after restart throws away relay-log.bin* retrieved from a master and updates the master.info based on the relay.info: 4) (--relay-log-recovery, boolean) which enables a recovery mechanism that throws away relay-log.bin* after a crash. However, it can only recover the incorrect binlog file and position in master.info, if other informations (host, port password, etc) are corrupted or incorrect, then this recovery mechanism will fail to work.
[18 Feb 2009 19:14]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/66789 2816 Alfranio Correia 2009-02-18 Post-fix for BUG#40337.
[19 Feb 2009 8:54]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/66834 2816 Alfranio Correia 2009-02-19 Post-fix for BUG#40337.
[24 Mar 2009 17:20]
Bugs System
Pushed into 6.0.11-alpha (revid:alik@sun.com-20090324171507-s5aac9guj21l0jz6) (version source revid:alfranio.correia@sun.com-20090219085407-oqz225j1ksij42k1) (merge vers: 6.0.10-alpha) (pib:6)
[25 Mar 2009 21:15]
Jon Stephens
Documented bugfix in the 6.0.11 changelog as follows: The global server variables sync_master_info and sync_relay_log_info are introduced for use on replication slaves to control synchronization of, respectively, the master.info and relay.info files. In each case, setting the variable to a nonzero integer value N causes the slave to synchonize the corresponding file to disk after every N events. Setting its value to 0 allows the operation system to handle syncronization of the file instead. The actions of these variables, when enabled, are analogous to how the sync_binlog variable works with regard to binary logs on a replication master. These variables can also be set in my.cnf, or by using the server options --sync-master-info and --sync-relay-log-info respectively. An additional system variable relay_log_recovery is also now available. When enabled, this variable causes a replication slave to discard relay log files obtained from the replication master following a crash and fetch them again from the master. This variable can also be set in my.cnf, or by using the --relay-log-recovery server option. This fix improves and expands upon one made in MySQL 6.0.10 which introduced the sync_relay_log variable. For more information about all of the server system variables introduced by these fixes, see "Replication Slave Options and Variables". Also added documentation of the new server variables/options to the indicated section of the Manual.
[29 Sep 2009 14:41]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/85046 3119 Alfranio Correia 2009-09-29 BUG#40337 Fsyncing master and relay log to disk after every event is too slow NOTE: Backporting the patch to next-mr. The fix proposed in BUG#35542 and BUG#31665 introduces a performance issue when fsyncing the master.info, relay.info and relay-log.bin* after #th events. Although such solution has been proposed to reduce the probability of corrupted files due to a slave-crash, the performance penalty introduced by it has made the approach impractical for highly intensive workloads. In a nutshell, the option --syn-relay-log proposed in BUG#35542 and BUG#31665 simultaneously fsyncs master.info, relay-log.info and relay-log.bin* and this is the main source of performance issues. This patch introduces new options that give more control to the user on what should be fsynced and how often: 1) (--sync-master-info, integer) which syncs the master.info after #th event; 2) (--sync-relay-log, integer) which syncs the relay-log.bin* after #th events. 3) (--sync-relay-log-info, integer) which syncs the relay.info after #th transactions. To provide both performance and increased reliability, we recommend the following setup: 1) --sync-master-info = 0 eventually the operating system will fsync it; 2) --sync-relay-log = 0 eventually the operating system will fsync it; 3) --sync-relay-log-info = 1 fsyncs it after every transaction; Notice, that the previous setup does not reduce the probability of corrupted master.info and relay-log.bin*. To overcome the issue, this patch also introduces a recovery mechanism that right after restart throws away relay-log.bin* retrieved from a master and updates the master.info based on the relay.info: 4) (--relay-log-recovery, boolean) which enables a recovery mechanism that throws away relay-log.bin* after a crash. However, it can only recover the incorrect binlog file and position in master.info, if other informations (host, port password, etc) are corrupted or incorrect, then this recovery mechanism will fail to work.
[27 Oct 2009 9:48]
Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20091027094604-9p7kplu1vd2cvcju) (version source revid:zhenxing.he@sun.com-20091026140226-uhnqejkyqx1aeilc) (merge vers: 6.0.14-alpha) (pib:13)
[27 Oct 2009 19:12]
Jon Stephens
Already documented in 6.0.10/6.0.11. Closed.
[12 Nov 2009 8:18]
Bugs System
Pushed into 5.5.0-beta (revid:alik@sun.com-20091110093229-0bh5hix780cyeicl) (version source revid:alik@sun.com-20091027095744-rf45u3x3q5d1f5y0) (merge vers: 5.5.0-beta) (pib:13)
[12 Nov 2009 15:02]
Jon Stephens
Also documented bugfix in the 5.5.0 changelog; closed.