MySQL Bugs: #72376: Seconds_behind_master distorted because of previous

Bug #72376	Seconds_behind_master distorted because of previous_gtid event
Submitted:	17 Apr 2014 23:18	Modified:	18 Nov 2014 16:39
Reporter:	Domas Mituzas	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Utilities: Binlog Events	Severity:	S3 (Non-critical)
Version:	5.6.*	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	Seconds_Behind_Master, spike

Description:
this is a sister bug for #66921 - same problem, different source of timestamp on the event

PREVIOUS_GTIDS_LOG_EVENT has timestamp of when replication thread has started, hence seconds_behind_master calculation is showing lag spikes when the server is not lagging

unfortunately unlike in #66921 I won't be able to print the actual event output from mysqlbinlog as mysqlbinlog skips it and one has to use other means of binary log inspection, hehe. 

How to repeat:
copypasting from 66921:

How to repeat:
Replicate, observe Seconds_behind_master show incorrect values at relay log rotations. 

Suggested fix:
.

Relay log after the executing FLUSH LOGS on master.

It seems even the rotate events in the relay log have old timestamps. Except the fake rotate from master, other rotate events are used in calculating the last master timestamp, so a spike in sbm is possible

mysqld-relay-bin-3307.000234

# at 120
#140417 17:06:24 server id 127966511  end_log_pos 187   Previous-GTIDs
# 79169a35-ba7d-11e3-82ee-0002c9521040:911507535-916628573
# at 187
#140417 17:28:53 server id 127969309  end_log_pos 425738623     Rotate to binary-logs-3307.001766  pos: 4
# at 237
#140417 17:06:24 server id 127966511  end_log_pos 292   Rotate to mysqld-relay-bin-3307.000235  pos: 4
DELIMITER ;

mysqld-relay-bin-3307.000235
# at 353
#140417 17:06:24 server id 0  end_log_pos 403   Rotate to binary-logs-3307.001766  pos: 267

Thank you for the report.

Verified as described.

To repeat:

1. Start MTR as:  ./mtr --start --suite=rpl rpl_alter --mysqld=--gtid_mode=ON --mysqld=--log-bin --mysqld=--log-slave-updates --mysqld=--enforce-gtid-consistency --mysqld=--slave-parallel-workers=2 --mysqld=--expire_logs_days=14 --mysqld=--sync_binlog=1 --mysqld=--binlog_format=row --mysqld=--read_only=1 --mysqld=--master_info_repository=TABLE --mysqld=--relay_log_info_repository=TABLE --mysqld=--binlog_checksum=1 --mysqld=--master_verify_checksum=1 --mysqld=--slave_sql_verify_checksum=1 --mysqld=--binlog_rows_query_log_events=1 --mysqld=--relay_log_purge=0 &

2. Setup slave

3. Stop slave

4. Run sysbench:

sysbench --test=/home/sveta/src/sysbench/sysbench/tests/db/oltp.lua --oltp-table-size=10000 --oltp-test-mode=simple --oltp-tables-count=100 --max-requests=0 --max-time=1800 --mysql-port=13000 --mysql-host=127.0.0.1 --mysql-user=root  --report-interval=5 --oltp-skip-trx --oltp-read-write --num-threads=64 --mysql-db=test prepare

sysbench --test=/home/sveta/src/sysbench/sysbench/tests/db/oltp.lua --oltp-table-size=10000 --oltp-test-mode=simple --oltp-tables-count=100 --max-requests=0 --max-time=1800 --mysql-port=13000 --mysql-host=127.0.0.1 --mysql-user=root  --report-interval=5 --oltp-skip-trx --oltp-read-write --num-threads=64 --mysql-db=test run

5. Wait some time

6. Start slave

7. Watch how Seconds_Behind_Master is increasing

8. Connect to master and issue FLUSH LOGS

9. Examine relay logs content:

$ ../client/mysqlbinlog var/mysqld.2/data/slave-relay-bin.000012 | grep end_log_pos | head -n 40
#140419  0:50:43 server id 2  end_log_pos 120 CRC32 0x680c930b 	Start: binlog v 4, server v 5.6.18-log created 140419  0:50:43
#140419  0:50:43 server id 2  end_log_pos 191 CRC32 0x3acfdfc9 	Previous-GTIDs
#700101  2:00:00 server id 1  end_log_pos 0 CRC32 0x246e1b8a 	Rotate to master-bin.000005  pos: 298728736
#140418 22:51:51 server id 1  end_log_pos 0 CRC32 0xe80e49aa 	Start: binlog v 4, server v 5.6.18-log created 140418 22:51:51
#140419  0:48:44 server id 1  end_log_pos 298728784 CRC32 0x4d595c62 	GTID [commit=yes]
#140419  0:48:44 server id 1  end_log_pos 298728856 CRC32 0xdcb21b1f 	Query	thread_id=81	exec_time=0	error_code=0
#140419  0:48:44 server id 1  end_log_pos 298728977 CRC32 0xf8dd2cd4 	Table_map: `test`.`sbtest93` mapped to number 255
...
$ ../client/mysqlbinlog var/mysqld.2/data/slave-relay-bin.000013 | grep end_log_pos | head -n 40
#140419  0:50:43 server id 2  end_log_pos 120 CRC32 0x680c930b 	Start: binlog v 4, server v 5.6.18-log created 140419  0:50:43
#140419  0:50:43 server id 2  end_log_pos 191 CRC32 0x70c053e3 	Previous-GTIDs
#140419  0:50:54 server id 1  end_log_pos 301244614 CRC32 0x20eb8a77 	Rotate to master-bin.000006  pos: 4
#140419  0:50:43 server id 2  end_log_pos 292 CRC32 0xa3d1cbd9 	Rotate to slave-relay-bin.000014  pos: 4
$ ../client/mysqlbinlog var/mysqld.2/data/slave-relay-bin.000014 | grep end_log_pos | head -n 40
#140419  0:50:43 server id 2  end_log_pos 120 CRC32 0x680c930b 	Start: binlog v 4, server v 5.6.18-log created 140419  0:50:43
#140419  0:50:43 server id 2  end_log_pos 191 CRC32 0x70c053e3 	Previous-GTIDs
#140419  0:50:43 server id 1  end_log_pos 0 CRC32 0xf5e6f37b 	Rotate to master-bin.000006  pos: 4
#140419  0:50:54 server id 1  end_log_pos 120 CRC32 0xf439e0a9 	Start: binlog v 4, server v 5.6.18-log created 140419  0:50:54
#140419  0:50:43 server id 0  end_log_pos 403 CRC32 0x8475f572 	Rotate to master-bin.000006  pos: 191
#140419  0:50:55 server id 1  end_log_pos 239 CRC32 0x0c448854 	GTID [commit=yes]
#140419  0:50:52 server id 1  end_log_pos 311 CRC32 0x99bc0e9f 	Query	thread_id=78	exec_time=2	error_code=0
#140419  0:50:52 server id 1  end_log_pos 430 CRC32 0xd2f06d14 	Table_map: `test`.`sbtest3` mapped to number 75
...

Similar behavior in seconds behind master statistic with multi threaded slave.

The timestamp of rotate_log_event which is very big is used here.

int Rotate_log_event::do_update_pos(Relay_log_info *rli) {
if (rli->is_parallel_exec())
rli->reset_notified_checkpoint(0, when.tv_sec + (time_t) exec_time,
true/*need_data_lock=true*/);

}

Thanks for your feedback. The following was added to the 5.6.23 and 5.7.6 changelog with commit 4672:
When using SHOW SLAVE STATUS to monitor replication performance, Seconds_Behind_Master sometimes displayed unexpected lag behind the master. This was caused by Previous_gtids_log_events being written to the slave's relay log with a timestamp behind the master, and then being used to calculate the Seconds_Behind_Master. This fix ensures that events generated on the slave that are added to the relay log and not used when calculating Seconds_Behind_Master.

Posted by developer:
 
Thanks for your feedback. The following was added to the 5.6.23 and 5.7.6 changelog with commit 4672:
When using SHOW SLAVE STATUS to monitor replication performance, Seconds_Behind_Master sometimes displayed unexpected lag behind the master. This was caused by Previous_gtids_log_events being written to the slave's relay log with a timestamp behind the master, and then being used to calculate the Seconds_Behind_Master. This fix ensures that events generated on the slave that are added to the relay log and not used when calculating Seconds_Behind_Master.

$ git show -s 4ded569
commit 4ded5690d954ded1d8bfcbb2807a6893ddd208ef
Author: Venkatesh Duggirala <venkatesh.duggirala@oracle.com>
Date:   Thu Nov 13 21:46:26 2014 +0530

    Bug#18622657 SECONDS_BEHIND_MASTER DISTORTED BECAUSE OF
    PREVIOUS_GTID EVENT
    
    Problem: Previous gtid log event in relay log should not be contributed to
    seconds behind master calculation.
    
    Analysis: A log event contains a timestamp in it's header which represents
    the time at which it got generated on Master. In relaylog, we have events which got
    transferred from Master and also contains which were generated on Slave, which
    we call it as " a relay log event" for eg: when relaylog is rotated,
    Slave generates and writes Previous_gtid_log_event in to the relay log.
    These events (which got generated on Slave) contains the
    timestamp of thread's starting time. When we are calculating Second Behind
    Master(SBM), the events which are generated on Slave should be ignored and should
    not contribute to SBM calculation. We already have a logic in the server
    if (!ev->is_relay_log_event())
    {
      last_master_timestamp=
    }
    
    But the previous gtid log events generated on Slave are not marked as "relay log events"
    hence last_master_timestamp is getting set with thread's start time and SBM
    will have wrong value.
    
    And also in some situations, we generate Rotate log events
    on slave to maintain the SQL positions in sync with Master
    these log events are generated with server_id 0.
    These events also should not contribute to SBM calculation.
    
    Fix: Mark previous_gtid_log_event as relay log event.
    And don't set last_master_timestamp when server_id= 0.

The fix doesn't handle ROTATE_EVENT. As i mentioned in the comment earlier, some ROTATE events have old timestamps.

It seems even the rotate events in the relay log have old timestamps. Except the fake rotate from master, other rotate events are used in calculating the last master timestamp, so a spike in sbm is possible

mysqld-relay-bin-3307.000234

# at 120
#140417 17:06:24 server id 127966511  end_log_pos 187   Previous-GTIDs
# 79169a35-ba7d-11e3-82ee-0002c9521040:911507535-916628573
# at 187
#140417 17:28:53 server id 127969309  end_log_pos 425738623     Rotate to binary-logs-3307.001766  pos: 4
# at 237
#140417 17:06:24 server id 127966511  end_log_pos 292   Rotate to mysqld-relay-bin-3307.000235  pos: 4
DELIMITER ;

mysqld-relay-bin-3307.000235
# at 353
#140417 17:06:24 server id 0  end_log_pos 403   Rotate to binary-logs-3307.001766  pos: 267

Correct fix would be

@@ -7219,8 +7219,7 @@ int Rotate_log_event::do_update_pos(Relay_log_info *rli)
                         (ulong) rli->get_group_master_log_pos()));
     mysql_mutex_unlock(&rli->data_lock);
     if (rli->is_parallel_exec())
-      rli->reset_notified_checkpoint(0,
-                                     server_id ? when.tv_sec + (time_t) exec_time : 0,
+      rli->reset_notified_checkpoint(0, 0,
                                      true/*need_data_lock=true*/);
 
     /*
diff --git a/sql/rpl_slave.cc b/sql/rpl_slave.cc
index bde4b90..fe8b5a6 100644
--- a/sql/rpl_slave.cc
+++ b/sql/rpl_slave.cc
@@ -4322,8 +4322,10 @@ static int exec_relay_log_event(THD* thd, Relay_log_info* rli)
     if (!(rli->is_parallel_exec() ||
           ev->is_artificial_event() || ev->is_relay_log_event() ||
           ev->when.tv_sec == 0 || ev->get_type_code() == FORMAT_DESCRIPTION_EVENT ||
-          ev->server_id == 0))
+          ev->server_id == 0 || ev->get_type_code() == ROTATE_EVENT))
     {
+      DBUG_ASSERT(ev->get_type_code() != ROTATE_EVENT && ev->get_type_code() !=
+                  PREVIOUS_GTIDS_LOG_EVENT);
       rli->last_master_timestamp= ev->when.tv_sec + (time_t) ev->exec_time;
       DBUG_ASSERT(rli->last_master_timestamp >= 0);

Followup:

Bug 23532304 - SECONDS_BEHIND_MASTER REPORTS IO_THREAD TIME WHEN USING MTS & ROTATING RELAY-LOG

Fixed in next versions,  5.6.36, 5.7.18, 8.0.1