Bug #52166 | Seconds_Behind_Master spikes after long idle period | ||
---|---|---|---|
Submitted: | 18 Mar 2010 7:35 | Modified: | 15 Nov 2010 14:14 |
Reporter: | Mikiya Okuno | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S2 (Serious) |
Version: | 5.1,5.5 | OS: | Any |
Assigned to: | Luis Soares | CPU Architecture: | Any |
[18 Mar 2010 7:35]
Mikiya Okuno
[18 Mar 2010 8:44]
Andrei Elkin
Confirming logics of next_event() indeed allows SSS to display a Seconds_behind as time(0) - the-last-processed-event-last_master_timestamp for duration of processing a long query. So if in between of the reported INSERT INTO t VALUES(SLEEP(10)) current event and an event right before it there was 100 seconds, the 100 seconds of Seconds_behind would be displayed for 10 seconds of processing of the INSERT. Resetting last_master_timestamp at start of reading from relay log was a reasonable technique in past when master could not send heartbeats. With sending a HB event master ensures that it does not have any real event so that only when slave receives the HB it can rightfully reset last_master_timestamp.
[19 Mar 2010 4:36]
MySQL Verification Team
a patch for mysql 5.1
Attachment: bug52166.patch (application/octet-stream, text), 689 bytes.
[15 Apr 2010 16:40]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/105760 3183 Luis Soares 2010-04-15 BUG#52166: Seconds_Behind_Master spikes after long idle period Introduction: seconds behind master is calculated by checking the time difference between the current time on the slave (time(0)) and the creation timestamp, at the master, of the last executed event by the SQL thread. The last event timestamp is saved by the SQL thread in rli->last_master_timestamp. Different timezones for master and slave are taken into account when showing this delta. The rli->last_master_timestamp is updated when the SQL thread finishes the event execution at the slave. Problem: when the SQL thread has no events to replay in the relay log, it waits. Right before waiting, it saves the last event (lets call it e1) timestamp (rli->last_master_timestamp) set at the master (in save_last_master_timestamp), and sets rli->last_master_timestamp to zero. This makes 'SHOW SLAVE STATUS' show zero while the SQL thread is waiting. However, when the SQL thread awakes, most likely because of the fact that a new event was enqueued in the relay log, it restores the save_last_master_timestamp to rli->last_master_timestamp. Then it goes on to process the new event (lets name it e2). Given this and the fact that the SQL thread only updates the rli->last_master_timestamp at the end of execution, there is a time window that the user can notice a sudden spike on Seconds_Behind_Master, especially if e2 arrives a long time after e1 finished. The spike is actually enforced - value increases - during e2 execution time, which means that if e2 execution latency increases, then the likelihood that the user sees this also increases. Solution: We fix this by not saving and restoring the last_master_timestamp when the SQL thread goes to sleep. Instead, we set the rli->last_master_timestamp to zero, when the SQL thread starts waiting, and keep it to zero, when the SQL thread awakes. A complementary change in this patch, is the one that sets rli->last_master_timestamp before and not after event execution. Both of these changes make the spike go away, and keep showing zero, until the event starts being processed, as opposed to showing zero until the event finishes its execution. @ mysql-test/suite/manual/t/rpl_replication_delay.test Removed this unmaintained test case. @ mysql-test/suite/rpl/t/seconds_behind_master.test Added some new tests. Collected remains from the removed rpl_replication_delay, refactored and added in this file as well. @ sql/log_event.cc Reworked the fault injection to meet the new test case needs. Refactored the stmt_done interface (does not need 'when' anymore. @ sql/log_event_old.cc Accomodating changes in the stmt_done interface. @ sql/rpl_rli.cc Updated stmt_done implementation. @ sql/rpl_rli.h Changed stmt_done interface, removing the not needed event_creation_time. @ sql/slave.cc Two changes: (i) removed the save and restore of rli->last_master_timestamp, when the SQL thread goes to sleeps. (ii) Updating the rli->last_master_timestamp at the beginning of the event execution.
[30 Apr 2010 11:13]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/107068 3184 Luis Soares 2010-04-30 BUG#52166: Seconds_Behind_Master spikes after long idle period Addressing review comments: - Added comment to the assertion introduced in log_event.cc - Taking into consideration exec_time when setting rli->last_master_timestamp - Improved test case comemnts - Better time boundaries on the time based assertions - Improved robustness of the test case - Added source include/show_rpl_debug_info.inc when test dies on purpose (failed assertion). - Minor test simplifications. @ sql/log_event.cc Sets exec_time(0) in Log_event::Log_event constructor. Events that later will not set it, when deserializing, will have it set to zero.
[4 May 2010 10:36]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/107286 3185 Luis Soares 2010-05-04 BUG#52166: Seconds_Behind_Master spikes after long idle period Addressing review comments: - renamed the test case: seconds_behind_master -> rpl_seconds_behind_master; - more improvements on the comments; - removed sleep from the last insert which is no longer needed. Also, changed remark about the test requiring statement based replication.
[30 May 2010 23:21]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/109564 3185 Luis Soares 2010-05-31 [merge] BUG#52166: Seconds_Behind_Master spikes after long idle period Automerged original patch and incremental patches from bug report into latest mysql-5.1-rep+2.
[30 May 2010 23:23]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/109565 3141 Luis Soares 2010-05-31 [merge] BUG#52166: Seconds_Behind_Master spikes after long idle period Automerge from mysql-5.1-rep+2 into mysql-5.1-rep+3.
[30 May 2010 23:26]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/109566 3142 Luis Soares 2010-05-31 [merge] BUG#52166: Seconds_Behind_Master spikes after long idle period Automerge from mysql-5.1-rep+2 into mysql-5.1-rep+3.
[30 May 2010 23:28]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/109567 2981 Luis Soares 2010-05-31 [merge] BUG#52166: Seconds_Behind_Master spikes after long idle period Manual merge from mysql-5.1-rep+2 into mysql-next-mr-rpl-merge. Conflicts ========= Text conflict in sql/slave.cc
[3 Jun 2010 23:32]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/110178 3187 Luis Soares 2010-06-04 BUG#52166: Seconds_Behind_Master spikes after long idle period Post-push fix. Fixing valgrind warnings: clock_diff_with_master was not initialized (and this was even the case before patch for BUG#51266). We fix this by setting clock_diff_with_master to zero on Master_info contstructor.
[26 Jun 2010 7:40]
Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100626073921-t0e6q2c9nkhuemnj) (version source revid:alik@sun.com-20100626073921-t0e6q2c9nkhuemnj) (pib:16)
[4 Aug 2010 8:08]
Bugs System
Pushed into mysql-trunk 5.6.1-m4 (revid:alik@ibmvm-20100804080001-bny5271e65xo34ig) (version source revid:alik@sun.com-20100626073921-t0e6q2c9nkhuemnj) (merge vers: 5.6.99-m4) (pib:18)
[4 Aug 2010 8:24]
Bugs System
Pushed into mysql-trunk 5.6.1-m4 (revid:alik@ibmvm-20100804081533-c1d3rbipo9e8rt1s) (version source revid:alik@sun.com-20100626073921-t0e6q2c9nkhuemnj) (merge vers: 5.6.99-m4) (pib:18)
[6 Aug 2010 14:27]
Jon Stephens
Does not appear in any 5.6 release, thus no 5.6 changelog entry required. Set NM status, waiting for merges to 5.1 and 5.5.
[16 Aug 2010 6:39]
Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100816062819-bluwgdq8q4xysmlg) (version source revid:alik@sun.com-20100816062612-enatdwnv809iw3s9) (pib:20)
[3 Sep 2010 14:29]
Jon Stephens
Discussed with Luís on IRC -- no new merges expected, closing.
[13 Nov 2010 16:26]
Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (merge vers: 5.6.99-m4) (pib:21)
[15 Nov 2010 12:55]
Jon Stephens
Set back to Closed. See previous comments.
[14 Jan 2011 23:48]
James Day
There are no plans to fix this in 5.1. If practical and if this bothers you, we suggest that you consider upgrading to MySQL 5.5. An alternative option with 5.1 is to have some process that does periodic updates to a row in a table that's used just to give the timestamp of the last change on the master. Those updates will cause replication to notice the change and you can check the row you're updating also if you want to. Events are one possible way to arrange this.