MySQL Bugs: #40337: Fsyncing master and relay log to disk after every event is too slow

Bug #40337	Fsyncing master and relay log to disk after every event is too slow
Submitted:	27 Oct 2008 3:05	Modified:	12 Nov 2009 15:02
Reporter:	Alfranio Tavares Correia Junior	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Replication	Severity:	S5 (Performance)
Version:	5.0, 5.1, 6.0	OS:	Any
Assigned to:	Alfranio Tavares Correia Junior	CPU Architecture:	Any

Description:
BUG#35542 and BUG#31665 proposed to fsync the master and relay log to disk after every event in order to make slave more resilient to crashes.

This adds the configuration option: sync_relay_log. There are fsync calls on three files that are introduced by the --sync-relay-log option:

1) relay-log, fsync on every event
2) master.info, fsync on every event
3) relay-log.info, fsync on every transaction

== Problem

* There is a HUGE performance hit while replaying logs into the
slave.
* This is introduced by the extra fsyncs happening.
* Facts:
- SQL Thread is always waiting for the IO Thread to queue event:
"Has read all relay log; waiting for the slave I/O thread to
update it". Checking the source code, one finds that SQL thread and IO
thread share the log_lock mutex (if reading and writing to the same
log file). The SQL thread waits on the update_cond while IO thread
appends (appendv call) events to the relay_log.

- IO Thread always reports queuing event:
"Queueing master event to the relay log"

How to repeat:
== Performance Hit Assessment done by Luis Soares and reproduced by Alfranio
by means of the Query Generator.

Runs
1. vanilla
2. patched - sync_relay_log=0
2. fsync relay-log.info only
3. fsync master.info and slave-relay-log.info
4. fsync relay-log, master.info and relay-log.info

Obs

1) relay-log, fsync on every event
2) master.info, fsync on every event
3) relay-log.info, fsync on every transaction
4) a huge set of small transactions.
5) some of the runs consisted in removing some fsyncs
from the patch (2. and 3.)
7) tests were conducted by: (i) inserting everything in
the master; (ii) connecting the slave once inserts in master are
done; (iii) timing slave catch-up.

In what follows, we have hdparm -W0 --direct on harddrive:

Results

1. vanilla:
0m57.720s

2. patched, sync_relay_log=0:
1m3.772s

3. fsync relay-log.info:
0m58.427s

3.1. fsync relay-log.info (each transaction is one insert only):
it is much worse (>>) than 3. (~20x worse)

4. fsync master.info + relay-log.info:
16m12.957s

The smaller the transaction the worse the result. For instance, with one statement per transaction:
5. fsync relay-log + master.info + relay-log.info:
90m50.770s

Suggested fix:
== Proposed Solution to Improve Performance

Some thoughts on how to do just one sync (relay-log.info):

1. Flush the relay-log.info after each transaction whenever the
--sync-relay-log is not equal to zero (default value).

2. Flush the master.info and relay-log.bin as is done by the patch http://lists.mysql.com/commits/56245. However, instead of having a --sync-relay-log equals to one as suggested by the patch, we propose to keep this value >>>>>>> 1. Fsyncing the master-log.info and relay-log.bin does not increase resilience and only contributes to a faster recovery after a crash.

3. If a crash happens look at the relay-log.info to see which transactions need to be retrieved from the master. Note that we cannot relay on the master.info. There are two possible approaches to this third step:

3.1 Forget any relay-log.bin already retrieved and start from the
position given by the relay-log.info.

3.2 Try to see which is the latest relay-log.bin that is not corrupted and update the necessary meta-information.

A patch for items 1, 2 and 3.1 is being developed.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/57079

2704 Alfranio Correia	2008-10-27
      Introduced a recovery mechanism that throws away relay-log.bin* retrieved
      from a master. See BUG#40337.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/57080

2705 Alfranio Correia	2008-10-27
      Changed result file for a test case affected by fix proposed for BUG#40337

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/57081

2706 Alfranio Correia	2008-10-27
      Merged patch that was proposed by Sinisa and modified by He Zhenxing to
      fix bugs: BUG#35542 and BUG#31665. This is an update to BUG#40337.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/57114

2704 Alfranio Correia	2008-10-27
      Merged patch that was proposed by Sinisa and modified by He Zhenxing to fix bugs: BUG#35542 and BUG#31665. This is an update to BUG#40337.
      And introduced a new recovery process that disregards old relay-log.bin(s) and starts
      re-fetching from the master based on the information in relay-log.info which may be flushed and synced after each transaction commit.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/57329

2705 Alfranio Correia	2008-10-29
      Introduced a recovery mechanism that throws away relay-log.bin* retrieved
      from a master in case of a crash and updates the master.info based on
      the relay.info. This is done because such files may be corrupted after
      a crash. We are assuming that relay.info is fsynced per transaction.
      See BUG#40337.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/57346

2704 Alfranio Correia	2008-10-29
      Added option to sync master.info (--sync-master-info, integer) and relay log (--sync-relay-log, integer) to disk after every event and the relay info (--sync-relay-log-info, boolean) per transaction. This supersedes the patch proposed by Sinisa and modified by He Zhenxing 
      to fix bugs: BUG#35542 and BUG#31665. This is an update to BUG#40337.

How to use the patches: http://lists.mysql.com/commits/57329 and http://lists.mysql.com/commits/57346.

To throw away possible corrupted relay-log-bin* and master.info due to a crash and update both information based on the relay-log.info, set the --relay-log-recovery = 1. This option only makes sense, when the relay.info is fsynced per transaction as described below.

Side effect: a normal shutdown and startup also throws away logs. To avoid this, be careful to disable the parameter before starting up the slave. In a near future, we will improve the recovery and use the fsyncs provided by --sync-master-info and --sync-relay-log as described below to avoid throwing away logs that are fsynced to disk and therefore are not corrupted.

To fsync the master.info and relay.log periodically, change the parameters --sync-master-info = (number of events before the fsync) and --sync-relay-log = (number of events before the fsync). To fsync the relay.info per transaction, we should enable the parameter --sync-relay-log-info = 1.

I am not the first or only person to bring this up, but I want to mention that we should seriously consider dropping the sync-relay-log and sync-master-info options that are being added by this patch.  My understanding is that the sync-relay-log-info option will be the only one needed for resiliency, and that sync-relay-log and sync-master-info will only be used to limit the amount of binlog data that may need to be refetched from the replication master during recovery.

In most/all modern operating systems, there are already mechanisms to ensure that dirty file system pages will be flushed to disk within a reasonable time.  For example, Solaris has fsflush, Linux has update/bdflush, and Windows has lazy write worker threads.  These mechanisms generally ensure that a flush to disk will occur within 30 seconds to a few minutes of the buffered file update, so we can also expect that this would be a typical upper limit on the amount of time required to refetch binlog data from the master.

It seems to me that if someone were going to use the sync-relay-log and sync-master-info options, the most likely settings would be those that approximate what we already get for free from the operating system.  Setting them higher would add no value, and setting them lower would add very little value at a potentially high cost.  For example, setting them to a very low value could have a huge performance impact.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/58228

2704 Alfranio Correia	2008-11-07
      Added option to sync master.info (--sync-master-info, integer) 
      and relay log (--sync-relay-log, integer) to disk after 
      every event and the relay info (--sync-relay-log-info, boolean) per transaction. 
      This supersedes the patch proposed 
      by Sinisa and modified by He Zhenxing to fix bugs: BUG#35542 and BUG#31665. This is an update to BUG#40337.

I've removed some dead code in the previous patch and incremented the counters that keep track of the number of events in the same way in all places.

Also see BUG#26540.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/58440

2704 Alfranio Correia	2008-11-11
      Added option to sync master.info (--sync-master-info, integer) and relay log (--sync-relay-log, integer) to disk after
      every event and the relay info (--sync-relay-log-info, boolean) per transaction.
      This supersedes the patch proposed by Sinisa and modified by He Zhenxing to fix bugs: BUG#35542 and BUG#31665. 
      This is an update to BUG#40337.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/58696

2704 Alfranio Correia	2008-11-13
      This is an update to BUG#40337 - Fsyncing master and relay log to disk after every event is too slow
      
      Added option to sync master.info (--sync-master-info, integer) and relay log
      (--sync-relay-log, integer) to disk after every event and the relay info 
      (--sync-relay-log-info, boolean) per transaction.
      
      This supersedes the patch proposed by Sinisa and modified by He Zhenxing to fix
      bugs: BUG#35542 and BUG#31665.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/58700

2704 Alfranio Correia	2008-11-13
      BUG#40337 Fsyncing master and relay log to disk after every
      event is too slow.
      
      Introduced a recovery mechanism that throws away relay-log.bin* retrieved
      from a master in case of a crash and updates the master.info based on 
      the relay.info. This is done because such files may be corrupted after 
      a crash. 
      
      It is advisalbe to use this feature, when the parameter --syn-relay-log-info = 1.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/60843

2747 Alfranio Correia	2008-12-08
      BUG#40337. Introduced four options to improve reliability in the slave:
      
      . (--sync-master-info, integer) which syncs the master.info after #th event;
      . (--sync-relay-log, integer) which syncs the relay-log.bin* after #th events.
      . (--sync-relay-log-info, integer) which syncs the relay.info after #th transactions.
      . (--relay-log-recovery, boolean) which enables a recovery mechanism that
      throws away relay-log.bin* after a crash.
      
      The recovery process is carried on because such files may be corrupted due to
      a crash. This obliges to re-fetch events from them master. It is advisable
      to use this feature, when the parameter --syn-relay-log-info = 1.
      
      This supersedes the patch proposed to fix bugs: BUG#35542 and BUG#31665.

The previous patches were used to custom builds. 
The last patch (http://lists.mysql.com/commits/60843) 
however is to be pushed into 6.0.

Added changelog entry, info for --sync-relay-log to 6.0.10 changelog.

See Bug #31665 for details.

Left status as In-Progress since Alfranio is still working on this (and this particular bug wasn't yet in Documenting).

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/65196

2810 Alfranio Correia	2009-02-04
      BUG#40337  Fsyncing master and relay log to disk after every event is too slow
      
      The fix proposed in BUG#35542 and BUG#31665 introduces a performance issue
      when fsyncing the master.info, relay.info and relay-log.bin* after #th events.
      Although such solution has been proposed to reduce the probability of corrupted
      files due to a slave-crash, the performance penalty introduced by it has
      made the approach impractical for highly intensive workloads.
      
      In a nutshell, the option --syn-relay-log proposed in BUG#35542 and BUG#31665
      simultaneously fsyncs master.info, relay-log.info and relay-log.bin* and
      this is the main source of performance issues.
      
      This patch introduces new options that give more control to the user on
      what should be fsynced and how often:
            1) (--sync-master-info, integer) which syncs the master.info after #th event;
            2) (--sync-relay-log, integer) which syncs the relay-log.bin* after #th events.
            3) (--sync-relay-log-info, integer) which syncs the relay.info after #th
            transactions.
      
      To provide both performance and increased reliability, we recommend the following
      setup:
            1) --sync-master-info = 0 eventually the operating system will fsync it;
            2) --sync-relay-log = 0 eventually the operating system will fsync it;
            3) --sync-relay-log-info = 1 fsyncs it after every transaction;
      
      Notice, however, that the previous setup does not reduce the probability of corrupted
      master.info and relay-log.bin*. To overcome the issue, this patch also introduces
      a recovery mechanism that right after restart throws away relay-log.bin* retrieved
      from a master and updates the master.info based on the relay.info:
            4) (--relay-log-recovery, boolean) which enables a recovery mechanism that
            throws away relay-log.bin* after a crash.
      
      So, it is advisable to setup:
            4) --syn-relay-log-info = 1.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/65869

2811 Alfranio Correia	2009-02-11
      BUG#40337  Fsyncing master and relay log to disk after every event is too slow
      
      Post-fix.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/65989

2811 Alfranio Correia	2009-02-12
      BUG#40337  Fsyncing master and relay log to disk after every event is too slow
            
      Post-fix.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/66001

2811 Alfranio Correia	2009-02-12
      BUG#40337  Fsyncing master and relay log to disk after every event is too slow
      
      Post-fix.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/66173

2810 Alfranio Correia	2009-02-13
      BUG#40337  Fsyncing master and relay log to disk after every event is too slow
            
      The fix proposed in BUG#35542 and BUG#31665 introduces a performance issue
      when fsyncing the master.info, relay.info and relay-log.bin* after #th events.
      Although such solution has been proposed to reduce the probability of corrupted
      files due to a slave-crash, the performance penalty introduced by it has
      made the approach impractical for highly intensive workloads.
            
      In a nutshell, the option --syn-relay-log proposed in BUG#35542 and BUG#31665
      simultaneously fsyncs master.info, relay-log.info and relay-log.bin* and
      this is the main source of performance issues.
            
      This patch introduces new options that give more control to the user on
      what should be fsynced and how often:
        1) (--sync-master-info, integer) which syncs the master.info after #th event;
        2) (--sync-relay-log, integer) which syncs the relay-log.bin* after #th
      events.
        3) (--sync-relay-log-info, integer) which syncs the relay.info after #th
        transactions.
            
      To provide both performance and increased reliability, we recommend the following
      setup:
        1) --sync-master-info = 0 eventually the operating system will fsync it;
        2) --sync-relay-log = 0 eventually the operating system will fsync it;
        3) --sync-relay-log-info = 1 fsyncs it after every transaction;
            
        Notice, however, that the previous setup does not reduce the probability of
      corrupted master.info and relay-log.bin*. To overcome the issue, this patch also 
      introduces a recovery mechanism that right after restart throws away relay-log.bin* 
      retrieved from a master and updates the master.info based on the relay.info:
        4) (--relay-log-recovery, boolean) which enables a recovery mechanism that
        throws away relay-log.bin* after a crash.
            
        So, it is advisable to setup: --syn-relay-log-info = 1.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/66709

2810 Alfranio Correia	2009-02-17
      BUG#40337  Fsyncing master and relay log to disk after every event is too slow
      
      The fix proposed in BUG#35542 and BUG#31665 introduces a performance issue
      when fsyncing the master.info, relay.info and relay-log.bin* after #th events.
      Although such solution has been proposed to reduce the probability of corrupted
      files due to a slave-crash, the performance penalty introduced by it has
      made the approach impractical for highly intensive workloads.
      
      In a nutshell, the option --syn-relay-log proposed in BUG#35542 and BUG#31665
      simultaneously fsyncs master.info, relay-log.info and relay-log.bin* and
      this is the main source of performance issues.
      
      This patch introduces new options that give more control to the user on
      what should be fsynced and how often:
      
         1) (--sync-master-info, integer) which syncs the master.info after #th event;
         2) (--sync-relay-log, integer) which syncs the relay-log.bin* after #th
         events.
         3) (--sync-relay-log-info, integer) which syncs the relay.info after #th
         transactions.
      
         To provide both performance and increased reliability, we recommend the following
         setup:
      
         1) --sync-master-info = 0 eventually the operating system will fsync it;
         2) --sync-relay-log = 0 eventually the operating system will fsync it;
         3) --sync-relay-log-info = 1 fsyncs it after every transaction;
      
      Notice, that the previous setup does not reduce the probability of
      corrupted master.info and relay-log.bin*. To overcome the issue, this patch also
      introduces a recovery mechanism that right after restart throws away relay-log.bin*
      retrieved from a master and updates the master.info based on the relay.info:
      
      
         4) (--relay-log-recovery, boolean) which enables a recovery mechanism that
         throws away relay-log.bin* after a crash.
      
      However, it can only recover the incorrect binlog file and position in master.info,
      if other informations (host, port password, etc) are corrupted or incorrect,
      then this recovery mechanism will fail to work.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/66789

2816 Alfranio Correia	2009-02-18
      Post-fix for BUG#40337.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/66834

2816 Alfranio Correia	2009-02-19
      Post-fix for BUG#40337.

Pushed into 6.0.11-alpha (revid:alik@sun.com-20090324171507-s5aac9guj21l0jz6) (version source revid:alfranio.correia@sun.com-20090219085407-oqz225j1ksij42k1) (merge vers: 6.0.10-alpha) (pib:6)

Documented bugfix in the 6.0.11 changelog as follows:

The global server variables sync_master_info and
sync_relay_log_info are introduced for use on replication
slaves to control synchronization of, respectively, the
master.info and relay.info files.

In each case, setting the variable to a nonzero integer value
N causes the slave to synchonize the corresponding file to
disk after every N events. Setting its value to 0
allows the operation system to handle syncronization of the
file instead.

The actions of these variables, when enabled, are analogous to
how the sync_binlog variable works with regard to binary logs
on a replication master.

These variables can also be set in my.cnf, or by using the
server options --sync-master-info and --sync-relay-log-info
respectively.

An additional system variable relay_log_recovery is also now
available. When enabled, this variable causes a replication
slave to discard relay log files obtained from the replication
master following a crash and fetch them again from the master.

This variable can also be set in my.cnf, or by using the
--relay-log-recovery server option.

This fix improves and expands upon one made in MySQL 6.0.10
which introduced the sync_relay_log variable. For more
information about all of the server system variables introduced
by these fixes, see "Replication Slave Options and Variables".

Also added documentation of the new server variables/options to the indicated section of the Manual.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/85046

3119 Alfranio Correia	2009-09-29
      BUG#40337 Fsyncing master and relay log to disk after every event is too slow
      
      NOTE: Backporting the patch to next-mr.
            
      The fix proposed in BUG#35542 and BUG#31665 introduces a performance issue
      when fsyncing the master.info, relay.info and relay-log.bin* after #th events.
      Although such solution has been proposed to reduce the probability of corrupted
      files due to a slave-crash, the performance penalty introduced by it has
      made the approach impractical for highly intensive workloads.
            
      In a nutshell, the option --syn-relay-log proposed in BUG#35542 and BUG#31665
      simultaneously fsyncs master.info, relay-log.info and relay-log.bin* and
      this is the main source of performance issues.
            
      This patch introduces new options that give more control to the user on
      what should be fsynced and how often:
            
         1) (--sync-master-info, integer) which syncs the master.info after #th event;
         2) (--sync-relay-log, integer) which syncs the relay-log.bin* after #th
         events.
         3) (--sync-relay-log-info, integer) which syncs the relay.info after #th
         transactions.
            
         To provide both performance and increased reliability, we recommend the following
         setup:
            
         1) --sync-master-info = 0 eventually the operating system will fsync it;
         2) --sync-relay-log = 0 eventually the operating system will fsync it;
         3) --sync-relay-log-info = 1 fsyncs it after every transaction;
            
      Notice, that the previous setup does not reduce the probability of
      corrupted master.info and relay-log.bin*. To overcome the issue, this patch also
      introduces a recovery mechanism that right after restart throws away relay-log.bin*
      retrieved from a master and updates the master.info based on the relay.info:
            
            
         4) (--relay-log-recovery, boolean) which enables a recovery mechanism that
         throws away relay-log.bin* after a crash.
            
      However, it can only recover the incorrect binlog file and position in master.info,
      if other informations (host, port password, etc) are corrupted or incorrect,
      then this recovery mechanism will fail to work.

Pushed into 6.0.14-alpha (revid:alik@sun.com-20091027094604-9p7kplu1vd2cvcju) (version source revid:zhenxing.he@sun.com-20091026140226-uhnqejkyqx1aeilc) (merge vers: 6.0.14-alpha) (pib:13)

Already documented in 6.0.10/6.0.11. Closed.

Pushed into 5.5.0-beta (revid:alik@sun.com-20091110093229-0bh5hix780cyeicl) (version source revid:alik@sun.com-20091027095744-rf45u3x3q5d1f5y0) (merge vers: 5.5.0-beta) (pib:13)

Also documented bugfix in the 5.5.0 changelog; closed.