Bug #59598 network can cause seconds_behind_master to fluctuate between 0 and large number
Submitted: 19 Jan 2011 1:46
Modified: 19 Jan 2011 4:11
Reporter: Ben Krug
Status: Verified
Category: MySQL Server: Replication
Severity: S4 (Feature request)
Version: 5.5, 5.6
OS: Any
Assigned to: Luis Soares
CPU Architecture: Any

[19 Jan 2011 1:46] Ben Krug
Description:
In SHOW SLAVE STATUS, Seconds_behind_master fluctuates between 0 and a large number when there is a network issue. This happens because the IO thread cannot download events as fast as the SQL thread applies them: whenever the SQL thread has caught up on the relay logs, Seconds_behind_master is set to 0.

The underlying problem is that the SQL thread cannot distinguish between two cases:

1. The IO thread cannot download the logs fast enough
2. There is nothing for the IO thread to download
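To illustrate why the two cases look identical, here is a simplified Python model. It is not MySQL's actual code: `SlaveState` and its fields are hypothetical. It assumes the lag is the slave's current time minus the master timestamp of the event being executed, corrected for the clock difference between the two hosts, and that the value is reset to 0 whenever the relay log is drained, as described above.

```python
from dataclasses import dataclass


@dataclass
class SlaveState:
    """Hypothetical view of the slave from the SQL thread's side."""
    relay_log_pending: int          # events downloaded by the IO thread, not yet applied
    last_master_timestamp: float    # master timestamp of the event the SQL thread last executed
    clock_diff_with_master: float   # slave clock minus master clock, in seconds


def seconds_behind_master(state: SlaveState, now: float) -> int:
    # Caught up on the relay log: report 0, whatever the reason.
    if state.relay_log_pending == 0:
        return 0
    return int(now - state.last_master_timestamp - state.clock_diff_with_master)


now = 1600.0
# Case 1: the network is stalled and the master has events the IO thread has not fetched.
# Case 2: the master simply has nothing new.
# The SQL thread sees exactly the same state in both cases.
stalled = SlaveState(relay_log_pending=0, last_master_timestamp=1000.0, clock_diff_with_master=0.0)
idle = SlaveState(relay_log_pending=0, last_master_timestamp=1000.0, clock_diff_with_master=0.0)
print(seconds_behind_master(stalled, now), seconds_behind_master(idle, now))  # 0 0

# The connection recovers and the IO thread downloads a burst of old events;
# the SQL thread starts applying one written on the master at t=1010.
recovered = SlaveState(relay_log_pending=50, last_master_timestamp=1010.0, clock_diff_with_master=0.0)
print(seconds_behind_master(recovered, now))  # 590
```

The jump from 0 straight to 590 is the fluctuation reported here: the stalled period is invisible until events start arriving again.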

This is a known issue and is expected behavior in 5.1.

However, 5.5 should be able to fix this when heartbeats are enabled, because the slave then knows there should always be something for the IO thread to download (at least a heartbeat). The code even contains a comment saying that heartbeats can fix the issue:

http://bazaar.launchpad.net/~mysql/mysql-server/mysql-5.5/view/head:/sql/slave.cc#L4577

It would be good to get this done in 5.5, essentially by not setting the value to 0 when heartbeats are enabled and the last event received was not a heartbeat. This looks achievable by adding an if() before the value is set to 0, so it may be a quick fix.
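The proposed if() could look roughly like the following Python sketch. It is not MySQL code: `SlaveState` and its fields, including `heartbeat_enabled` and `last_event_was_heartbeat`, are hypothetical stand-ins for whatever slave.cc actually tracks, and the lag is simplified to the slave's current time minus the master timestamp of the last executed event, corrected for clock skew.

```python
from dataclasses import dataclass


@dataclass
class SlaveState:
    """Hypothetical, simplified slave state; field names are illustrative only."""
    relay_log_pending: int           # events downloaded but not yet applied
    last_master_timestamp: float     # master timestamp of the last event the SQL thread executed
    clock_diff_with_master: float    # slave clock minus master clock, in seconds
    heartbeat_enabled: bool          # e.g. a nonzero heartbeat period configured on the slave
    last_event_was_heartbeat: bool   # was the last event the IO thread received a heartbeat?


def seconds_behind_master(state: SlaveState, now: float) -> int:
    lag = int(now - state.last_master_timestamp - state.clock_diff_with_master)
    if state.relay_log_pending > 0:
        return lag
    # Relay log drained. Without heartbeats the slave cannot tell "nothing to
    # download" from "cannot download", so keep the old behavior and report 0.
    if not state.heartbeat_enabled:
        return 0
    # The proposed if(): only report 0 when the last event received was a
    # heartbeat, i.e. the master itself signalled it had nothing newer to send.
    if state.last_event_was_heartbeat:
        return 0
    # Otherwise the IO thread may be behind (e.g. a network stall), so keep
    # reporting the real, growing lag instead of dropping to 0.
    return lag


now = 1600.0
stalled = SlaveState(0, 1000.0, 0.0, heartbeat_enabled=True, last_event_was_heartbeat=False)
idle = SlaveState(0, 1000.0, 0.0, heartbeat_enabled=True, last_event_was_heartbeat=True)
no_heartbeat = SlaveState(0, 1000.0, 0.0, heartbeat_enabled=False, last_event_was_heartbeat=False)
print(seconds_behind_master(stalled, now))       # 600
print(seconds_behind_master(idle, now))          # 0
print(seconds_behind_master(no_heartbeat, now))  # 0
```

Under this rule, a stall that begins immediately after a heartbeat still reads 0 until the connection recovers; a stricter variant might also compare the time since the last received event against the heartbeat period.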

How to repeat:
Set up replication, then sporadically break the connection between master and slave.

Suggested fix:
It looks as if an if() could be added to the code so that, using the replication heartbeat, Seconds_behind_master is not set to 0 when the heartbeat is not current.
[30 Jan 2012 21:35] Ben Krug
There has been some progress on this general issue.  See
http://bugs.mysql.com/bug.php?id=52166 for details.
[25 Sep 2017 4:53] Rick James
I think https://dba.stackexchange.com/a/186785/1876 discusses a possible explanation for this bug, which still occurs even in 5.7.