Bug #40337 Fsyncing master and relay log to disk after every event is too slow
Submitted: 27 Oct 2008 3:05 Modified: 12 Nov 2009 15:02
Reporter: Alfranio Junior Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S5 (Performance)
Version:5.0, 5.1, 6.0 OS:Any
Assigned to: Alfranio Junior CPU Architecture:Any

[27 Oct 2008 3:05] Alfranio Junior
Description:
BUG#35542 and BUG#31665 proposed to fsync the master and relay log to disk after every event in order to make slave more resilient to crashes.

This adds the configuration option: sync_relay_log. There are fsync calls on three files that are introduced by the --sync-relay-log option:

1) relay-log, fsync on every event
2) master.info, fsync on every event
3) relay-log.info, fsync on every transaction

== Problem

* There is a HUGE performance hit while replaying logs into the
slave.
* This is introduced by the extra fsyncs happening.
* Facts:
   - SQL Thread is always waiting for the IO Thread to queue event:
   "Has read all relay log; waiting for the slave I/O thread to
update it". Checking the source code, one finds that SQL thread and IO
thread share the log_lock mutex (if reading and writing to the same
log file). The SQL thread waits on the update_cond while IO thread
appends (appendv call) events to the relay_log.

  - IO Thread always reports queuing event:
  "Queueing master event to the relay log"

How to repeat:
== Performance Hit Assessment done by Luis Soares and reproduced by Alfranio
by means of the Query Generator. 

Runs
   1. vanilla
   2. patched - sync_relay_log=0
   2. fsync relay-log.info only
   3. fsync master.info and slave-relay-log.info
   4. fsync relay-log, master.info and relay-log.info

Obs

   1) relay-log, fsync on every event
   2) master.info, fsync on every event
   3) relay-log.info, fsync on every transaction
   4) a huge set of small transactions.
   5) some of the runs consisted in removing some fsyncs
      from the patch (2. and 3.)
   7) tests were conducted by: (i) inserting everything in
      the master; (ii) connecting the slave once inserts in master are
      done; (iii) timing slave catch-up.
   
In what follows, we have hdparm -W0 --direct on harddrive: 

Results

   1. vanilla:
         0m57.720s

   2. patched, sync_relay_log=0:
         1m3.772s

   3. fsync relay-log.info:
         0m58.427s

     3.1. fsync relay-log.info (each transaction is one insert only):
         it is much worse (>>) than 3. (~20x worse)

   4. fsync master.info + relay-log.info:
         16m12.957s

The smaller the transaction the worse the result. For instance, with one statement per transaction: 
   5. fsync relay-log + master.info + relay-log.info:
         90m50.770s

Suggested fix:
== Proposed Solution to Improve Performance

Some thoughts on how to do just one sync (relay-log.info):

   1. Flush the relay-log.info after each transaction whenever the
--sync-relay-log is not equal to zero (default value). 

   2. Flush the master.info and relay-log.bin as is done by the patch http://lists.mysql.com/commits/56245. However, instead of having a --sync-relay-log equals to one as suggested by the patch, we propose to keep this value >>>>>>> 1. Fsyncing the master-log.info and relay-log.bin does not increase resilience and only contributes to a faster recovery after a crash.

   3. If a crash happens look at the relay-log.info to see which transactions need to be retrieved from the master. Note that we cannot relay on the master.info. There are two possible approaches to this third step:

     3.1 Forget any relay-log.bin already retrieved and start from the
position given by the relay-log.info.

     3.2 Try to see which is the latest relay-log.bin that is not corrupted and update the necessary meta-information.

A patch for items 1, 2 and 3.1 is being developed.
[27 Oct 2008 3:17] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/57079

2704 Alfranio Correia	2008-10-27
      Introduced a recovery mechanism that throws away relay-log.bin* retrieved
      from a master. See BUG#40337.
[27 Oct 2008 3:27] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/57080

2705 Alfranio Correia	2008-10-27
      Changed result file for a test case affected by fix proposed for BUG#40337
[27 Oct 2008 4:14] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/57081

2706 Alfranio Correia	2008-10-27
      Merged patch that was proposed by Sinisa and modified by He Zhenxing to
      fix bugs: BUG#35542 and BUG#31665. This is an update to BUG#40337.
[27 Oct 2008 15:10] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/57114

2704 Alfranio Correia	2008-10-27
      Merged patch that was proposed by Sinisa and modified by He Zhenxing to fix bugs: BUG#35542 and BUG#31665. This is an update to BUG#40337.
      And introduced a new recovery process that disregards old relay-log.bin(s) and starts
      re-fetching from the master based on the information in relay-log.info which may be flushed and synced after each transaction commit.
[29 Oct 2008 16:37] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/57329

2705 Alfranio Correia	2008-10-29
      Introduced a recovery mechanism that throws away relay-log.bin* retrieved
      from a master in case of a crash and updates the master.info based on
      the relay.info. This is done because such files may be corrupted after
      a crash. We are assuming that relay.info is fsynced per transaction.
      See BUG#40337.
[29 Oct 2008 17:53] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/57346

2704 Alfranio Correia	2008-10-29
      Added option to sync master.info (--sync-master-info, integer) and relay log (--sync-relay-log, integer) to disk after every event and the relay info (--sync-relay-log-info, boolean) per transaction. This supersedes the patch proposed by Sinisa and modified by He Zhenxing 
      to fix bugs: BUG#35542 and BUG#31665. This is an update to BUG#40337.
[29 Oct 2008 18:19] Alfranio Junior
How to use the patches: http://lists.mysql.com/commits/57329 and http://lists.mysql.com/commits/57346.

To throw away possible corrupted relay-log-bin* and master.info due to a crash and update both information based on the relay-log.info, set the --relay-log-recovery = 1. This option only makes sense, when the relay.info is fsynced per transaction as described below.

Side effect: a normal shutdown and startup also throws away logs. To avoid this, be careful to disable the parameter before starting up the slave. In a near future, we will improve the recovery and use the fsyncs provided by --sync-master-info and --sync-relay-log as described below to avoid throwing away logs that are fsynced to disk and therefore are not corrupted.

To fsync the master.info and relay.log periodically, change the parameters --sync-master-info = (number of events before the fsync) and --sync-relay-log = (number of events before the fsync). To fsync the relay.info per transaction, we should enable the parameter --sync-relay-log-info = 1.
[5 Nov 2008 18:30] David Lutz
I am not the first or only person to bring this up, but I want to mention that we should seriously consider dropping the sync-relay-log and sync-master-info options that are being added by this patch.  My understanding is that the sync-relay-log-info option will be the only one needed for resiliency, and that sync-relay-log and sync-master-info will only be used to limit the amount of binlog data that may need to be refetched from the replication master during recovery.

In most/all modern operating systems, there are already mechanisms to ensure that dirty file system pages will be flushed to disk within a reasonable time.  For example, Solaris has fsflush, Linux has update/bdflush, and Windows has lazy write worker threads.  These mechanisms generally ensure that a flush to disk will occur within 30 seconds to a few minutes of the buffered file update, so we can also expect that this would be a typical upper limit on the amount of time required to refetch binlog data from the master.

It seems to me that if someone were going to use the sync-relay-log and sync-master-info options, the most likely settings would be those that approximate what we already get for free from the operating system.  Setting them higher would add no value, and setting them lower would add very little value at a potentially high cost.  For example, setting them to a very low value could have a huge performance impact.
[7 Nov 2008 19:22] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/58228

2704 Alfranio Correia	2008-11-07
      Added option to sync master.info (--sync-master-info, integer) 
      and relay log (--sync-relay-log, integer) to disk after 
      every event and the relay info (--sync-relay-log-info, boolean) per transaction. 
      This supersedes the patch proposed 
      by Sinisa and modified by He Zhenxing to fix bugs: BUG#35542 and BUG#31665. This is an update to BUG#40337.
[7 Nov 2008 19:25] Alfranio Junior
I've removed some dead code in the previous patch and incremented the counters that keep track of the number of events in the same way in all places.
[10 Nov 2008 10:17] Lars Thalmann
Also see BUG#26540.
[11 Nov 2008 11:27] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/58440

2704 Alfranio Correia	2008-11-11
      Added option to sync master.info (--sync-master-info, integer) and relay log (--sync-relay-log, integer) to disk after
      every event and the relay info (--sync-relay-log-info, boolean) per transaction.
      This supersedes the patch proposed by Sinisa and modified by He Zhenxing to fix bugs: BUG#35542 and BUG#31665. 
      This is an update to BUG#40337.
[13 Nov 2008 19:41] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/58696

2704 Alfranio Correia	2008-11-13
      This is an update to BUG#40337 - Fsyncing master and relay log to disk after every event is too slow
      
      Added option to sync master.info (--sync-master-info, integer) and relay log
      (--sync-relay-log, integer) to disk after every event and the relay info 
      (--sync-relay-log-info, boolean) per transaction.
      
      This supersedes the patch proposed by Sinisa and modified by He Zhenxing to fix
      bugs: BUG#35542 and BUG#31665.
[13 Nov 2008 20:19] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/58700

2704 Alfranio Correia	2008-11-13
      BUG#40337 Fsyncing master and relay log to disk after every
      event is too slow.
      
      Introduced a recovery mechanism that throws away relay-log.bin* retrieved
      from a master in case of a crash and updates the master.info based on 
      the relay.info. This is done because such files may be corrupted after 
      a crash. 
      
      It is advisalbe to use this feature, when the parameter --syn-relay-log-info = 1.
[8 Dec 2008 1:40] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/60843

2747 Alfranio Correia	2008-12-08
      BUG#40337. Introduced four options to improve reliability in the slave:
      
      . (--sync-master-info, integer) which syncs the master.info after #th event;
      . (--sync-relay-log, integer) which syncs the relay-log.bin* after #th events.
      . (--sync-relay-log-info, integer) which syncs the relay.info after #th transactions.
      . (--relay-log-recovery, boolean) which enables a recovery mechanism that
      throws away relay-log.bin* after a crash.
      
      The recovery process is carried on because such files may be corrupted due to
      a crash. This obliges to re-fetch events from them master. It is advisable
      to use this feature, when the parameter --syn-relay-log-info = 1.
      
      This supersedes the patch proposed to fix bugs: BUG#35542 and BUG#31665.
[8 Dec 2008 1:45] Alfranio Junior
The previous patches were used to custom builds. 
The last patch (http://lists.mysql.com/commits/60843) 
however is to be pushed into 6.0.
[4 Feb 2009 15:12] Jon Stephens
Added changelog entry, info for --sync-relay-log to 6.0.10 changelog.

See Bug #31665 for details.

Left status as In-Progress since Alfranio is still working on this (and this particular bug wasn't yet in Documenting).
[4 Feb 2009 16:11] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/65196

2810 Alfranio Correia	2009-02-04
      BUG#40337  Fsyncing master and relay log to disk after every event is too slow
      
      The fix proposed in BUG#35542 and BUG#31665 introduces a performance issue
      when fsyncing the master.info, relay.info and relay-log.bin* after #th events.
      Although such solution has been proposed to reduce the probability of corrupted
      files due to a slave-crash, the performance penalty introduced by it has
      made the approach impractical for highly intensive workloads.
      
      In a nutshell, the option --syn-relay-log proposed in BUG#35542 and BUG#31665
      simultaneously fsyncs master.info, relay-log.info and relay-log.bin* and
      this is the main source of performance issues.
      
      This patch introduces new options that give more control to the user on
      what should be fsynced and how often:
            1) (--sync-master-info, integer) which syncs the master.info after #th event;
            2) (--sync-relay-log, integer) which syncs the relay-log.bin* after #th events.
            3) (--sync-relay-log-info, integer) which syncs the relay.info after #th
            transactions.
      
      To provide both performance and increased reliability, we recommend the following
      setup:
            1) --sync-master-info = 0 eventually the operating system will fsync it;
            2) --sync-relay-log = 0 eventually the operating system will fsync it;
            3) --sync-relay-log-info = 1 fsyncs it after every transaction;
      
      Notice, however, that the previous setup does not reduce the probability of corrupted
      master.info and relay-log.bin*. To overcome the issue, this patch also introduces
      a recovery mechanism that right after restart throws away relay-log.bin* retrieved
      from a master and updates the master.info based on the relay.info:
            4) (--relay-log-recovery, boolean) which enables a recovery mechanism that
            throws away relay-log.bin* after a crash.
      
      So, it is advisable to setup:
            4) --syn-relay-log-info = 1.
[11 Feb 2009 10:23] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/65869

2811 Alfranio Correia	2009-02-11
      BUG#40337  Fsyncing master and relay log to disk after every event is too slow
      
      Post-fix.
[12 Feb 2009 8:04] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/65989

2811 Alfranio Correia	2009-02-12
      BUG#40337  Fsyncing master and relay log to disk after every event is too slow
            
      Post-fix.
[12 Feb 2009 9:14] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/66001

2811 Alfranio Correia	2009-02-12
      BUG#40337  Fsyncing master and relay log to disk after every event is too slow
      
      Post-fix.
[13 Feb 2009 10:26] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/66173

2810 Alfranio Correia	2009-02-13
      BUG#40337  Fsyncing master and relay log to disk after every event is too slow
            
      The fix proposed in BUG#35542 and BUG#31665 introduces a performance issue
      when fsyncing the master.info, relay.info and relay-log.bin* after #th events.
      Although such solution has been proposed to reduce the probability of corrupted
      files due to a slave-crash, the performance penalty introduced by it has
      made the approach impractical for highly intensive workloads.
            
      In a nutshell, the option --syn-relay-log proposed in BUG#35542 and BUG#31665
      simultaneously fsyncs master.info, relay-log.info and relay-log.bin* and
      this is the main source of performance issues.
            
      This patch introduces new options that give more control to the user on
      what should be fsynced and how often:
        1) (--sync-master-info, integer) which syncs the master.info after #th event;
        2) (--sync-relay-log, integer) which syncs the relay-log.bin* after #th
      events.
        3) (--sync-relay-log-info, integer) which syncs the relay.info after #th
        transactions.
            
      To provide both performance and increased reliability, we recommend the following
      setup:
        1) --sync-master-info = 0 eventually the operating system will fsync it;
        2) --sync-relay-log = 0 eventually the operating system will fsync it;
        3) --sync-relay-log-info = 1 fsyncs it after every transaction;
            
        Notice, however, that the previous setup does not reduce the probability of
      corrupted master.info and relay-log.bin*. To overcome the issue, this patch also 
      introduces a recovery mechanism that right after restart throws away relay-log.bin* 
      retrieved from a master and updates the master.info based on the relay.info:
        4) (--relay-log-recovery, boolean) which enables a recovery mechanism that
        throws away relay-log.bin* after a crash.
            
        So, it is advisable to setup: --syn-relay-log-info = 1.
[17 Feb 2009 23:07] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/66709

2810 Alfranio Correia	2009-02-17
      BUG#40337  Fsyncing master and relay log to disk after every event is too slow
      
      The fix proposed in BUG#35542 and BUG#31665 introduces a performance issue
      when fsyncing the master.info, relay.info and relay-log.bin* after #th events.
      Although such solution has been proposed to reduce the probability of corrupted
      files due to a slave-crash, the performance penalty introduced by it has
      made the approach impractical for highly intensive workloads.
      
      In a nutshell, the option --syn-relay-log proposed in BUG#35542 and BUG#31665
      simultaneously fsyncs master.info, relay-log.info and relay-log.bin* and
      this is the main source of performance issues.
      
      This patch introduces new options that give more control to the user on
      what should be fsynced and how often:
      
         1) (--sync-master-info, integer) which syncs the master.info after #th event;
         2) (--sync-relay-log, integer) which syncs the relay-log.bin* after #th
         events.
         3) (--sync-relay-log-info, integer) which syncs the relay.info after #th
         transactions.
      
         To provide both performance and increased reliability, we recommend the following
         setup:
      
         1) --sync-master-info = 0 eventually the operating system will fsync it;
         2) --sync-relay-log = 0 eventually the operating system will fsync it;
         3) --sync-relay-log-info = 1 fsyncs it after every transaction;
      
      Notice, that the previous setup does not reduce the probability of
      corrupted master.info and relay-log.bin*. To overcome the issue, this patch also
      introduces a recovery mechanism that right after restart throws away relay-log.bin*
      retrieved from a master and updates the master.info based on the relay.info:
      
      
         4) (--relay-log-recovery, boolean) which enables a recovery mechanism that
         throws away relay-log.bin* after a crash.
      
      However, it can only recover the incorrect binlog file and position in master.info,
      if other informations (host, port password, etc) are corrupted or incorrect,
      then this recovery mechanism will fail to work.
[18 Feb 2009 19:14] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/66789

2816 Alfranio Correia	2009-02-18
      Post-fix for BUG#40337.
[19 Feb 2009 8:54] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/66834

2816 Alfranio Correia	2009-02-19
      Post-fix for BUG#40337.
[24 Mar 2009 17:20] Bugs System
Pushed into 6.0.11-alpha (revid:alik@sun.com-20090324171507-s5aac9guj21l0jz6) (version source revid:alfranio.correia@sun.com-20090219085407-oqz225j1ksij42k1) (merge vers: 6.0.10-alpha) (pib:6)
[25 Mar 2009 21:15] Jon Stephens
Documented bugfix in the 6.0.11 changelog as follows:
      
        The global server variables sync_master_info and
        sync_relay_log_info are introduced for use on replication 
        slaves to control synchronization of, respectively, the
        master.info and relay.info files.
      
        In each case, setting the variable to a nonzero integer value
        N causes the slave to synchonize the corresponding file to 
        disk after every N events. Setting its value to 0
        allows the operation system to handle syncronization of the
        file instead.
      
        The actions of these variables, when enabled, are analogous to
        how the sync_binlog variable works with regard to binary logs 
        on a replication master.
      
        These variables can also be set in my.cnf, or by using the 
        server options --sync-master-info and --sync-relay-log-info
        respectively.
      
        An additional system variable relay_log_recovery is also now
        available. When enabled, this variable causes a replication
        slave to discard relay log files obtained from the replication
        master following a crash and fetch them again from the master.
      
        This variable can also be set in my.cnf, or by using the 
        --relay-log-recovery server option.
      
        This fix improves and expands upon one made in MySQL 6.0.10
        which introduced the sync_relay_log variable. For more 
        information about all of the server system variables introduced 
        by these fixes, see "Replication Slave Options and Variables".
      
Also added documentation of the new server variables/options to the indicated section of the Manual.
[29 Sep 2009 14:41] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/85046

3119 Alfranio Correia	2009-09-29
      BUG#40337 Fsyncing master and relay log to disk after every event is too slow
      
      NOTE: Backporting the patch to next-mr.
            
      The fix proposed in BUG#35542 and BUG#31665 introduces a performance issue
      when fsyncing the master.info, relay.info and relay-log.bin* after #th events.
      Although such solution has been proposed to reduce the probability of corrupted
      files due to a slave-crash, the performance penalty introduced by it has
      made the approach impractical for highly intensive workloads.
            
      In a nutshell, the option --syn-relay-log proposed in BUG#35542 and BUG#31665
      simultaneously fsyncs master.info, relay-log.info and relay-log.bin* and
      this is the main source of performance issues.
            
      This patch introduces new options that give more control to the user on
      what should be fsynced and how often:
            
         1) (--sync-master-info, integer) which syncs the master.info after #th event;
         2) (--sync-relay-log, integer) which syncs the relay-log.bin* after #th
         events.
         3) (--sync-relay-log-info, integer) which syncs the relay.info after #th
         transactions.
            
         To provide both performance and increased reliability, we recommend the following
         setup:
            
         1) --sync-master-info = 0 eventually the operating system will fsync it;
         2) --sync-relay-log = 0 eventually the operating system will fsync it;
         3) --sync-relay-log-info = 1 fsyncs it after every transaction;
            
      Notice, that the previous setup does not reduce the probability of
      corrupted master.info and relay-log.bin*. To overcome the issue, this patch also
      introduces a recovery mechanism that right after restart throws away relay-log.bin*
      retrieved from a master and updates the master.info based on the relay.info:
            
            
         4) (--relay-log-recovery, boolean) which enables a recovery mechanism that
         throws away relay-log.bin* after a crash.
            
      However, it can only recover the incorrect binlog file and position in master.info,
      if other informations (host, port password, etc) are corrupted or incorrect,
      then this recovery mechanism will fail to work.
[27 Oct 2009 9:48] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20091027094604-9p7kplu1vd2cvcju) (version source revid:zhenxing.he@sun.com-20091026140226-uhnqejkyqx1aeilc) (merge vers: 6.0.14-alpha) (pib:13)
[27 Oct 2009 19:12] Jon Stephens
Already documented in 6.0.10/6.0.11. Closed.
[12 Nov 2009 8:18] Bugs System
Pushed into 5.5.0-beta (revid:alik@sun.com-20091110093229-0bh5hix780cyeicl) (version source revid:alik@sun.com-20091027095744-rf45u3x3q5d1f5y0) (merge vers: 5.5.0-beta) (pib:13)
[12 Nov 2009 15:02] Jon Stephens
Also documented bugfix in the 5.5.0 changelog; closed.