Bug #72376 | Seconds_behind_master distorted because of previous_gtid event | ||
---|---|---|---|
Submitted: | 17 Apr 2014 23:18 | Modified: | 18 Nov 2014 16:39 |
Reporter: | Domas Mituzas | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Utilities: Binlog Events | Severity: | S3 (Non-critical) |
Version: | 5.6.* | OS: | Any |
Assigned to: | CPU Architecture: | Any | |
Tags: | Seconds_Behind_Master, spike |
[17 Apr 2014 23:18]
Domas Mituzas
[18 Apr 2014 0:44]
Santosh Praneeth Banda
Relay log after the executing FLUSH LOGS on master. It seems even the rotate events in the relay log have old timestamps. Except the fake rotate from master, other rotate events are used in calculating the last master timestamp, so a spike in sbm is possible mysqld-relay-bin-3307.000234 # at 120 #140417 17:06:24 server id 127966511 end_log_pos 187 Previous-GTIDs # 79169a35-ba7d-11e3-82ee-0002c9521040:911507535-916628573 # at 187 #140417 17:28:53 server id 127969309 end_log_pos 425738623 Rotate to binary-logs-3307.001766 pos: 4 # at 237 #140417 17:06:24 server id 127966511 end_log_pos 292 Rotate to mysqld-relay-bin-3307.000235 pos: 4 DELIMITER ; mysqld-relay-bin-3307.000235 # at 353 #140417 17:06:24 server id 0 end_log_pos 403 Rotate to binary-logs-3307.001766 pos: 267
[18 Apr 2014 21:57]
Sveta Smirnova
Thank you for the report. Verified as described. To repeat: 1. Start MTR as: ./mtr --start --suite=rpl rpl_alter --mysqld=--gtid_mode=ON --mysqld=--log-bin --mysqld=--log-slave-updates --mysqld=--enforce-gtid-consistency --mysqld=--slave-parallel-workers=2 --mysqld=--expire_logs_days=14 --mysqld=--sync_binlog=1 --mysqld=--binlog_format=row --mysqld=--read_only=1 --mysqld=--master_info_repository=TABLE --mysqld=--relay_log_info_repository=TABLE --mysqld=--binlog_checksum=1 --mysqld=--master_verify_checksum=1 --mysqld=--slave_sql_verify_checksum=1 --mysqld=--binlog_rows_query_log_events=1 --mysqld=--relay_log_purge=0 & 2. Setup slave 3. Stop slave 4. Run sysbench: sysbench --test=/home/sveta/src/sysbench/sysbench/tests/db/oltp.lua --oltp-table-size=10000 --oltp-test-mode=simple --oltp-tables-count=100 --max-requests=0 --max-time=1800 --mysql-port=13000 --mysql-host=127.0.0.1 --mysql-user=root --report-interval=5 --oltp-skip-trx --oltp-read-write --num-threads=64 --mysql-db=test prepare sysbench --test=/home/sveta/src/sysbench/sysbench/tests/db/oltp.lua --oltp-table-size=10000 --oltp-test-mode=simple --oltp-tables-count=100 --max-requests=0 --max-time=1800 --mysql-port=13000 --mysql-host=127.0.0.1 --mysql-user=root --report-interval=5 --oltp-skip-trx --oltp-read-write --num-threads=64 --mysql-db=test run 5. Wait some time 6. Start slave 7. Watch how Seconds_Behind_Master is increasing 8. Connect to master and issue FLUSH LOGS 9. Examine relay logs content: $ ../client/mysqlbinlog var/mysqld.2/data/slave-relay-bin.000012 | grep end_log_pos | head -n 40 #140419 0:50:43 server id 2 end_log_pos 120 CRC32 0x680c930b Start: binlog v 4, server v 5.6.18-log created 140419 0:50:43 #140419 0:50:43 server id 2 end_log_pos 191 CRC32 0x3acfdfc9 Previous-GTIDs #700101 2:00:00 server id 1 end_log_pos 0 CRC32 0x246e1b8a Rotate to master-bin.000005 pos: 298728736 #140418 22:51:51 server id 1 end_log_pos 0 CRC32 0xe80e49aa Start: binlog v 4, server v 5.6.18-log created 140418 22:51:51 #140419 0:48:44 server id 1 end_log_pos 298728784 CRC32 0x4d595c62 GTID [commit=yes] #140419 0:48:44 server id 1 end_log_pos 298728856 CRC32 0xdcb21b1f Query thread_id=81 exec_time=0 error_code=0 #140419 0:48:44 server id 1 end_log_pos 298728977 CRC32 0xf8dd2cd4 Table_map: `test`.`sbtest93` mapped to number 255 ... $ ../client/mysqlbinlog var/mysqld.2/data/slave-relay-bin.000013 | grep end_log_pos | head -n 40 #140419 0:50:43 server id 2 end_log_pos 120 CRC32 0x680c930b Start: binlog v 4, server v 5.6.18-log created 140419 0:50:43 #140419 0:50:43 server id 2 end_log_pos 191 CRC32 0x70c053e3 Previous-GTIDs #140419 0:50:54 server id 1 end_log_pos 301244614 CRC32 0x20eb8a77 Rotate to master-bin.000006 pos: 4 #140419 0:50:43 server id 2 end_log_pos 292 CRC32 0xa3d1cbd9 Rotate to slave-relay-bin.000014 pos: 4 $ ../client/mysqlbinlog var/mysqld.2/data/slave-relay-bin.000014 | grep end_log_pos | head -n 40 #140419 0:50:43 server id 2 end_log_pos 120 CRC32 0x680c930b Start: binlog v 4, server v 5.6.18-log created 140419 0:50:43 #140419 0:50:43 server id 2 end_log_pos 191 CRC32 0x70c053e3 Previous-GTIDs #140419 0:50:43 server id 1 end_log_pos 0 CRC32 0xf5e6f37b Rotate to master-bin.000006 pos: 4 #140419 0:50:54 server id 1 end_log_pos 120 CRC32 0xf439e0a9 Start: binlog v 4, server v 5.6.18-log created 140419 0:50:54 #140419 0:50:43 server id 0 end_log_pos 403 CRC32 0x8475f572 Rotate to master-bin.000006 pos: 191 #140419 0:50:55 server id 1 end_log_pos 239 CRC32 0x0c448854 GTID [commit=yes] #140419 0:50:52 server id 1 end_log_pos 311 CRC32 0x99bc0e9f Query thread_id=78 exec_time=2 error_code=0 #140419 0:50:52 server id 1 end_log_pos 430 CRC32 0xd2f06d14 Table_map: `test`.`sbtest3` mapped to number 75 ...
[2 Jul 2014 23:14]
Santosh Praneeth Banda
Similar behavior in seconds behind master statistic with multi threaded slave. The timestamp of rotate_log_event which is very big is used here. int Rotate_log_event::do_update_pos(Relay_log_info *rli) { if (rli->is_parallel_exec()) rli->reset_notified_checkpoint(0, when.tv_sec + (time_t) exec_time, true/*need_data_lock=true*/); }
[18 Nov 2014 16:39]
David Moss
Thanks for your feedback. The following was added to the 5.6.23 and 5.7.6 changelog with commit 4672: When using SHOW SLAVE STATUS to monitor replication performance, Seconds_Behind_Master sometimes displayed unexpected lag behind the master. This was caused by Previous_gtids_log_events being written to the slave's relay log with a timestamp behind the master, and then being used to calculate the Seconds_Behind_Master. This fix ensures that events generated on the slave that are added to the relay log and not used when calculating Seconds_Behind_Master.
[20 Nov 2014 15:46]
David Moss
Posted by developer: Thanks for your feedback. The following was added to the 5.6.23 and 5.7.6 changelog with commit 4672: When using SHOW SLAVE STATUS to monitor replication performance, Seconds_Behind_Master sometimes displayed unexpected lag behind the master. This was caused by Previous_gtids_log_events being written to the slave's relay log with a timestamp behind the master, and then being used to calculate the Seconds_Behind_Master. This fix ensures that events generated on the slave that are added to the relay log and not used when calculating Seconds_Behind_Master.
[12 Feb 2015 12:24]
Laurynas Biveinis
$ git show -s 4ded569 commit 4ded5690d954ded1d8bfcbb2807a6893ddd208ef Author: Venkatesh Duggirala <venkatesh.duggirala@oracle.com> Date: Thu Nov 13 21:46:26 2014 +0530 Bug#18622657 SECONDS_BEHIND_MASTER DISTORTED BECAUSE OF PREVIOUS_GTID EVENT Problem: Previous gtid log event in relay log should not be contributed to seconds behind master calculation. Analysis: A log event contains a timestamp in it's header which represents the time at which it got generated on Master. In relaylog, we have events which got transferred from Master and also contains which were generated on Slave, which we call it as " a relay log event" for eg: when relaylog is rotated, Slave generates and writes Previous_gtid_log_event in to the relay log. These events (which got generated on Slave) contains the timestamp of thread's starting time. When we are calculating Second Behind Master(SBM), the events which are generated on Slave should be ignored and should not contribute to SBM calculation. We already have a logic in the server if (!ev->is_relay_log_event()) { last_master_timestamp= } But the previous gtid log events generated on Slave are not marked as "relay log events" hence last_master_timestamp is getting set with thread's start time and SBM will have wrong value. And also in some situations, we generate Rotate log events on slave to maintain the SQL positions in sync with Master these log events are generated with server_id 0. These events also should not contribute to SBM calculation. Fix: Mark previous_gtid_log_event as relay log event. And don't set last_master_timestamp when server_id= 0.
[17 Apr 2015 20:48]
Santosh Praneeth Banda
The fix doesn't handle ROTATE_EVENT. As i mentioned in the comment earlier, some ROTATE events have old timestamps. It seems even the rotate events in the relay log have old timestamps. Except the fake rotate from master, other rotate events are used in calculating the last master timestamp, so a spike in sbm is possible mysqld-relay-bin-3307.000234 # at 120 #140417 17:06:24 server id 127966511 end_log_pos 187 Previous-GTIDs # 79169a35-ba7d-11e3-82ee-0002c9521040:911507535-916628573 # at 187 #140417 17:28:53 server id 127969309 end_log_pos 425738623 Rotate to binary-logs-3307.001766 pos: 4 # at 237 #140417 17:06:24 server id 127966511 end_log_pos 292 Rotate to mysqld-relay-bin-3307.000235 pos: 4 DELIMITER ; mysqld-relay-bin-3307.000235 # at 353 #140417 17:06:24 server id 0 end_log_pos 403 Rotate to binary-logs-3307.001766 pos: 267 Correct fix would be @@ -7219,8 +7219,7 @@ int Rotate_log_event::do_update_pos(Relay_log_info *rli) (ulong) rli->get_group_master_log_pos())); mysql_mutex_unlock(&rli->data_lock); if (rli->is_parallel_exec()) - rli->reset_notified_checkpoint(0, - server_id ? when.tv_sec + (time_t) exec_time : 0, + rli->reset_notified_checkpoint(0, 0, true/*need_data_lock=true*/); /* diff --git a/sql/rpl_slave.cc b/sql/rpl_slave.cc index bde4b90..fe8b5a6 100644 --- a/sql/rpl_slave.cc +++ b/sql/rpl_slave.cc @@ -4322,8 +4322,10 @@ static int exec_relay_log_event(THD* thd, Relay_log_info* rli) if (!(rli->is_parallel_exec() || ev->is_artificial_event() || ev->is_relay_log_event() || ev->when.tv_sec == 0 || ev->get_type_code() == FORMAT_DESCRIPTION_EVENT || - ev->server_id == 0)) + ev->server_id == 0 || ev->get_type_code() == ROTATE_EVENT)) { + DBUG_ASSERT(ev->get_type_code() != ROTATE_EVENT && ev->get_type_code() != + PREVIOUS_GTIDS_LOG_EVENT); rli->last_master_timestamp= ev->when.tv_sec + (time_t) ev->exec_time; DBUG_ASSERT(rli->last_master_timestamp >= 0);
[28 Feb 2017 10:10]
MySQL Verification Team
Followup: Bug 23532304 - SECONDS_BEHIND_MASTER REPORTS IO_THREAD TIME WHEN USING MTS & ROTATING RELAY-LOG Fixed in next versions, 5.6.36, 5.7.18, 8.0.1