Bug #74321 Execute relay-log-recovery only when needed.
Submitted: 10 Oct 2014 13:34 Modified: 10 Oct 2014 13:42
Reporter: Jean-François Gagné
Status: Open
Category:MySQL Server: Replication Severity:S4 (Feature request)
Version:5.6.21 OS:Any
Assigned to:
Tags: Crash-safe replication, Delayed Slave, relay-log-recovery, replication

[10 Oct 2014 13:34] Jean-François Gagné
Description:
When relay-log-recovery=1 is set in the MySQL configuration file, relay-log recovery is executed every time MySQL starts, even if MySQL was stopped cleanly.  This means that, on startup, all unexecuted relay-logs on the slave are discarded, and a lagging slave must re-download the corresponding binary logs from the master.  This is bad for many reasons:

1. If the master is unavailable when a slave is restarted, the slave cannot make any progress.

2. If the master has already purged the needed binary logs, the slave will not be able to resume replication.

3. If the volume of unexecuted relay-logs is significant, the master's network interface can be saturated, which might prevent or degrade other access to the master.

4. Again, if the volume of unexecuted relay-logs is significant and the slave is on a remote site, the WAN might get saturated, which would prevent or degrade other uses of the WAN.

Reasons #3 and #4 are worse on delayed slaves: a delayed slave with 10 GB of unexecuted relay-logs will saturate a 1 Gb/s NIC for more than one minute.  The WAN will also be affected, depending on its capacity.
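The "more than one minute" figure follows from simple arithmetic. A minimal Python sketch, assuming decimal units (1 GB = 10^9 bytes, 1 Gb/s = 10^9 bits/s) and ignoring protocol overhead and any other traffic on the link:

```python
def transfer_seconds(volume_bytes: float, link_bits_per_second: float) -> float:
    """Lower bound on the time needed to move volume_bytes over a link,
    assuming the whole link is available and ignoring protocol overhead."""
    return volume_bytes * 8 / link_bits_per_second


GB = 10**9  # decimal gigabyte (assumption)
GBPS = 10**9  # 1 Gb/s NIC, in bits per second

# 10 GB of unexecuted relay-logs re-downloaded over a 1 Gb/s NIC.
print(transfer_seconds(10 * GB, 1 * GBPS))  # 80.0
```

With binary units (10 GiB) the figure is about 86 seconds; in practice it is longer still, since TCP and replication-protocol overhead add to the bytes on the wire.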

Thanks in advance for improving this.

How to repeat:
Set up master/slave replication.
Stop the SQL_THREAD on the slave and run many transactions on the master (this accumulates unexecuted events in the relay-logs of the slave).
Once you have at least 100 MB of unexecuted events, restart mysqld on the slave.
Observe network usage on master and slave while the slave is resuming replication.

If the steps above are run with relay-log-recovery=0, network usage on master and slave will not be significantly impacted.

If the steps above are run with relay-log-recovery=1, network usage on master and slave will be significantly impacted.
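The two runs above, and the change requested in this report, can be summarized as a toy model of how many bytes the slave must fetch again from the master after a restart. This is a minimal Python sketch, not MySQL code: the `clean_shutdown` flag and both function names are hypothetical, and the re-fetched volume is approximated by the size of the unexecuted relay-logs.

```python
from dataclasses import dataclass


@dataclass
class RestartState:
    relay_log_recovery: bool  # value of relay-log-recovery in the configuration file
    clean_shutdown: bool  # hypothetical: whether mysqld was stopped cleanly
    unexecuted_relay_bytes: int  # relay-log bytes not yet applied by the SQL thread


def refetched_bytes_current(s: RestartState) -> int:
    """Behaviour described in this report (5.6.21): with relay-log-recovery=1,
    unexecuted relay-logs are discarded on every startup, clean or not."""
    return s.unexecuted_relay_bytes if s.relay_log_recovery else 0


def refetched_bytes_requested(s: RestartState) -> int:
    """Hypothetical behaviour requested here: run recovery only when needed,
    i.e. after an unclean stop."""
    if s.relay_log_recovery and not s.clean_shutdown:
        return s.unexecuted_relay_bytes
    return 0


# The repro above: 100 MB of unexecuted events, clean restart, recovery enabled.
state = RestartState(relay_log_recovery=True, clean_shutdown=True,
                     unexecuted_relay_bytes=100 * 10**6)
print(refetched_bytes_current(state))    # 100000000
print(refetched_bytes_requested(state))  # 0
```

The model only captures network cost; it deliberately ignores why the option exists, which is protecting the slave from relay-logs left corrupt or incomplete by a crash.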
[10 Oct 2014 13:42] Jean-François Gagné
Related to Bug #74089 and Bug #74323.
[10 Oct 2014 14:44] Shane Bester
I had filed this internally last year:
Bug 17848777 - RELAY_LOG_RECOVERY SHOULD HAVE OPTION TO REREAD ONLY CORRUPT/MISSING LOGS