Bug #5116 Seconds_Behind_Master incorrectly reported
Submitted: 20 Aug 2004 0:23 Modified: 21 Sep 2004 2:52
Reporter: Devananda v Email Updates:
Status: No Feedback Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:4.1.3-beta OS:Linux (Mandrake 9.2)
Assigned to: Assigned Account CPU Architecture:Any

[20 Aug 2004 0:23] Devananda v
Description:
We have 7 pairs of servers set up with bidirectional replication (InnoDB), such that, for example, my1 replicates to my1r and my1r replicates to my1. my1r does not handle any client connections, but is completely ready to if we need or want to fail over, which we often do. What I am seeing is Seconds_Behind_Master reports as ~0 on both my1 and my1r most of the time, but sometimes my1 will report it as increasingly large, even though replication has not failed (ie, the Exec_Master_Log_Pos is correct).

Here is an example of this.

*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: db04.sjo.neopets.com
Master_User: repuser
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: db04-bin.000026
Read_Master_Log_Pos: 29800242
Relay_Log_File: my4r-relay-bin.000015
Relay_Log_Pos: 681720672
Relay_Master_Log_File: db04-bin.000026
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB: 
Replicate_Ignore_DB: 
Replicate_Do_Table: 
Replicate_Ignore_Table: 
Replicate_Wild_Do_Table: 
Replicate_Wild_Ignore_Table: 
Last_Errno: 0
Last_Error: 
Skip_Counter: 0
Exec_Master_Log_Pos: 29800242
Relay_Log_Space: 681720672
Until_Condition: None
Until_Log_File: 
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File: 
Master_SSL_CA_Path: 
Master_SSL_Cert: 
Master_SSL_Cipher: 
Master_SSL_Key: 
Seconds_Behind_Master: 0

**************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: rac04.neopets.com
Master_User: repuser
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: my4r-bin.000010
Read_Master_Log_Pos: 79
Relay_Log_File: db04-relay-bin.000001
Relay_Log_Pos: 9676
Relay_Master_Log_File: my4r-bin.000010
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB: 
Replicate_Ignore_DB: 
Replicate_Do_Table: 
Replicate_Ignore_Table: 
Replicate_Wild_Do_Table: 
Replicate_Wild_Ignore_Table: 
Last_Errno: 0
Last_Error: 
Skip_Counter: 0
Exec_Master_Log_Pos: 79
Relay_Log_Space: 9676
Until_Condition: None
Until_Log_File: 
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File: 
Master_SSL_CA_Path: 
Master_SSL_Cert: 
Master_SSL_Cipher: 
Master_SSL_Key: 
Seconds_Behind_Master: 59344

Here, rac04 is reporting that it is 59344 seconds behind db04. One possibility that was explained to me is that since there have not been any replicated events from db04 to rac04, it believes itself to be behind. I do not find this plausible, since on other servers this is not the case - no events are replicated and Seconds_Behind_Master remains 0. Also, I have not found a consistent way to reproduce this error; it seems to happen randomly to any of our servers.

How to repeat:
Unknown, sorry, but it is happening frequently.
[20 Aug 2004 23:52] Matthew Lord
Hi,

You may be having system time problems for some reason.  Can you enable the logging of
warnings with the msyqld startup option --log-warnings and see if you see this in
your .err file:

Warning: "SELECT UNIX_TIMESTAMP()" failed on master,
do not trust column Seconds_Behind_Master of SHOW SLAVE STATUS"

Thanks!
[21 Aug 2004 2:37] Matthew Lord
After looking at this again I'm not sure that this is a bug but maybe a feature request.    The 
Seconds_Behind_Master is a comparison of the the current system time and the timestamp for 
the last statement executed by replication's SQL thread.

Based on the example you presented the seconds behind master came out to about 14 hours.  Is 
it possible that no updates had happened on the master for that long?

Best Regards
[21 Aug 2004 2:44] Matthew Lord
We could add a feature request that adds a check to try and determine if the slave is fully caught 
up and display 0 instead of

"The number of seconds that have elapsed since the timestamp of the last master's event 
executed by the slave SQL thread. "

That is from http://dev.mysql.com/doc/mysql/en/SHOW_SLAVE_STATUS.html
[21 Aug 2004 2:52] Devananda v
>After looking at this again I'm not sure that this is a bug but maybe a
>feature request.    The Seconds_Behind_Master is a comparison of the the current system time
>and the timestamp for the last statement executed by replication's SQL thread.

>Based on the example you presented the seconds behind master came out
>to about 14 hours.  Is it possible that no updates had happened on the master for that long?

It is entirely possible, but then I have two questions regarding this. 

We have currently 7 pairs of servers each replicating A -> B and B -> A, where only A recieves client connections - therefor B is continuously processing A's binlog whereas A is indefinitely 'waiting for master (B) to send event'. Why would I see Seconds_Behind_Master increasing on only 3 of the 7 B's, when they are all (theoretically) following this model?

Also, a hypothetical question: if I set up 2 servers, C and D, and began replication from C to D only, then connected to C and ran a couple transactions and verfied that D had replicated them (at this point seconds_behind_master should be ~0) and then *left them alone for a while* and did nothing on either C or D until an hour later, would D report that it was 3600 seconds_behind_master ? If this is the case (and how mysql is intended to behave), then this shows me that we have some other sort of problem, and 4 of our pairs (the ones that are reporting seconds_behind_master=0) are somehow misbehaving and I need to investigate that instead, and then I must apologize for thinking it was your bug, and not ours :)

Thanks for your feedback!
Devananda
Neopets, Inc
[14 Feb 2005 22:54] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[1 Jun 2006 12:25] Martijn van den Burg
This issue is popping up here, as well.
Setup: replication A -> B, using 5.0.18-standard (Solaris 8). Seconds_Behind_Master sometimes reaches very high values for only a short period of time.

Example: at Thu Jun  1 13:45:49 2006, the value of 'Seconds_Behind_Master' is 2449, for just one second. No queries were being executed at that time, and the latest query occurred just a few minutes before.

The system time on the servers is identical and they are located closely together, on the same network segment.

Here are the excerpts from the master and slave binlogs from around that time:

MASTER

#060601 13:42:21 server id 32  end_log_pos 125  Query   thread_id=50809 exec_time=0     error_code=0
SET TIMESTAMP=1149162141;
update LOGGER set VISITORS = (VISITORS + 1) where LOGGER_ID = 1;
# at 2145566
#060601 13:42:21 server id 32  end_log_pos 2145593      Xid = 1185303
COMMIT;
# at 2145593
#060601 13:52:59 server id 32  end_log_pos 28   Intvar
SET INSERT_ID=17;
# at 2145621
#060601 13:52:59 server id 32  end_log_pos 259  Query   thread_id=54329 exec_time=0     error_code=0
SET TIMESTAMP=1149162779;
insert into TIME_STATUS (MACHINE_ID, START_CODE, START_TIME, STATUS_TYPE_ID, MACHINE_STATUS_ID, MODE) values (327,'TM-1100', from_unixtime(1149162777),26,35,'N'
);
# at 2145852
#060601 13:52:59 server id 32  end_log_pos 2145879      Xid = 1186847
COMMIT;
# at 2145879

SLAVE

#060601 13:42:21 server id 32  end_log_pos 125  Query   thread_id=50809 exec_time=0     error_code=0
use msm;
SET TIMESTAMP=1149162141;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=1, @@session.unique_checks=1;
SET @@session.sql_mode=0;
SET @@session.character_set_client=8,@@session.collation_connection=8,@@session.collation_server=8;
update LOGGER set VISITORS = (VISITORS + 1) where LOGGER_ID = 1;
# at 16748
#060601 13:42:21 server id 32  end_log_pos 16775        Xid = 1446
COMMIT;
# at 16775
#060601 13:52:59 server id 32  end_log_pos 28   Intvar
SET INSERT_ID=17;
# at 16803
#060601 13:52:59 server id 32  end_log_pos 259  Query   thread_id=54329 exec_time=0     error_code=0
SET TIMESTAMP=1149162779;
SET @@session.time_zone='SYSTEM';
insert into TIME_STATUS (MACHINE_ID, START_CODE, START_TIME, STATUS_TYPE_ID, MACHINE_STATUS_ID, MODE) values (327,'TM-1100', from_unixtime(1149162777),26,35,'N'
);
# at 17034
[1 Jun 2006 12:44] Martijn van den Burg
I would like to add the following: Seconds_Behind_Master returns to 0 after the previously mentioned value of 2449, but 33 minutes and 42 seconds later it suddenly becomes 4472 (and then drops back to 0):

Thu Jun  1 13:45:49 2006  Seconds_Behind_Master : 2449

- 0 in between -

Thu Jun  1 14:19:31 2006  Seconds_Behind_Master : 4472