Bug #80084 Seconds_behind_master distorted because of verbatim master's FDE in relaylog
Submitted: 20 Jan 2016 19:49 Modified: 30 May 2016 9:53
Reporter: Kenny Gryp Email Updates:
Status: Open Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:5.7.9, 5.6.22 OS:Any
Assigned to: CPU Architecture:Any

[20 Jan 2016 19:49] Kenny Gryp
Description:
Reopening http://bugs.mysql.com/bug.php?id=66921

Master's FDE is written in relay log, with its verbatim timestamp, which is usually start of master's binary log.
That means that in relay log one event may be hours or days or minutes older than the remaining context.

Those timestamps are used to show "Seconds behind master", which is the only usable metric telling about slave health, and when SQL thread is reading master's FDE (when it is reading rotated binlog), Seconds_behind_master will be calculated against start of binlog position and not the binlog position that matches relay log position, which is obviously a bug.

In the relay log I see this:

This still happens, check the timestamps in in the commented lines:

/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#160105 14:23:18 server id 27436  end_log_pos 120 CRC32 0xca565278 	Start: binlog v 4, server v 5.6.22-log created 160105 14:23:18
BINLOG '
xsOLVg8sawAAdAAAAHgAAABAAAQANS42LjIyLWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAXAAEGggAAAAICAgCAAAACgoKGRkAAXhS
Vso=
'/*!*/;
# at 120
#160115 15:51:40 server id 29436  end_log_pos 134218413 CRC32 0x49967c04 	Rotate to mysql-bin.024718  pos: 4
# at 167
#160105 14:23:18 server id 27436  end_log_pos 216 CRC32 0xdd391511 	Rotate to mysql-relay.049495  pos: 4
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

How to repeat:

In order to reproduce the distorted Seconds_behind_master, 
- set max_relay_log_size low so a lot of relay logs are created
- create a write heavy workload
- fetch SHOW SLAVE STATUS at least once per second
- watch seconds_behind_master, at slave rotation slave lag will be wrong and increasing over time..

Suggested fix:
SQL thread either should not change 'seconds behind master' timestamp if it encounters FDE.
Alternatively, master's FDE written to relay log can be rewritten to reflect the context better (either use just relaylog timestamp, or surrounding binlog events, somehow ;-)
[20 Jan 2016 19:59] Kenny Gryp
Example output of per second metrics over a day.

Attachment: Asynchronous_Slave_Lag.png (image/png, text), 75.39 KiB.

[20 Jan 2016 20:01] Kenny Gryp
As you can see in the attached image which displays SHOW SLAVE STATUS every second (this is 5.7.9). Having single second points in time where there is 3.500.00 replication lag is not normal. All this is because of the relay log timestamp.
[30 May 2016 9:53] Kenny Gryp
Hello,

Would you please have a look at this bug? It's been open for quite a long time and it's not been verified.

I have various companies I work with that have this issue when we're looking at seconds_behind_master every second. It kind of breaks trending.

Thanks!
[19 Sep 2018 16:55] Luis Soares
MySQL 8.0 introduces a better infrastructure in performance schema to track
how far behind in time a slave is w.r.t. its immediate master, as well as
to the original master where the transaction executed in the first place.

https://mysqlhighavailability.com/new-monitoring-replication-features-and-more/