Bug #98115 | seconds_behind_master is not correct | ||
---|---|---|---|
Submitted: | 3 Jan 2020 2:23 | Modified: | 28 Jan 2020 15:34 |
Reporter: | hailiang zhao | Email Updates: | |
Status: | Can't repeat | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S2 (Serious) |
Version: | 5.7.25 | OS: | CentOS (6.5) |
Assigned to: | MySQL Verification Team | CPU Architecture: | x86 |
Tags: | MTS, Seconds_Behind_Master |
[3 Jan 2020 2:23]
hailiang zhao
[3 Jan 2020 7:57]
hailiang zhao
additional informatioin: use GTID ,MTS(8 thread) we also use gdb to debug, setting break point for "apply_event_and_update_pos" function, and when A insert a row, The results of (*ptr_ev)->common_header->type_code are all ROTATE_EVENT Does ROTATE_EVENT cause master's "seconds_behind_master" to be not correct?
[9 Jan 2020 2:30]
Darren Zhao
How to imitate this question? sysbench /usr/local/sysbench-master/src/lua/oltp_read_write.lua --mysql-host=localhost --mysql-port=3306 --mysql-user=root --mysql-password='root' --mysql-socket=/tmp/mysql3306.sock --mysql-db=sbtest --db-driver=mysql --tables=10 --table-size=10000000 prepare and then you can observer A's seconds_behind_master value from time to time.
[9 Jan 2020 2:34]
Darren Zhao
the following is results: Seconds_Behind_Master: 0 Seconds_Behind_Master: 0 Seconds_Behind_Master: 0 Seconds_Behind_Master: 0 Seconds_Behind_Master: 0 Seconds_Behind_Master: 0 Seconds_Behind_Master: 0 Seconds_Behind_Master: 0 Seconds_Behind_Master: 70 Seconds_Behind_Master: 0 Seconds_Behind_Master: 0 Seconds_Behind_Master: 0 Seconds_Behind_Master: 0 Seconds_Behind_Master: 0
[16 Jan 2020 1:03]
MySQL Verification Team
Hi, I'm not sure this is a bug. I see the SBM value jumps from zero up (I don't see as high values as you do, so nothing as high as 70 seconds) but that's normal
[16 Jan 2020 1:04]
MySQL Verification Team
Are you sure DATETIME on both servers is properly set and that you did not have some NTP events happening during your test?
[16 Jan 2020 1:28]
Darren Zhao
@ Bogdan Kecman Are you sure DATETIME on both servers is properly set and that you did not have some NTP events happening during your test? I'm sure NTP events are normal and DATETIME on both two servers is properly set. we have tested for many times,and we found when binlog is rotating , SBM may come up.
[16 Jan 2020 1:35]
MySQL Verification Team
Hi, > I'm sure NTP events are normal and DATETIME on both two servers is properly set. perfect, as I remember a bug we chased for hours till we figured out NTP had time jumps that were messing with timestamp comparison. > we have tested for many times,and we found when binlog is rotating , SBM may come up. It is expected that SBM jumps when binlog is rotating as you get slowdown in execution of statements but are you sure the jump is "higher than expected"? How are you fetching the data, you are executing "show slave status" and parsing the output or? As you can also extract the statement that way and see if that's really something that happened that many seconds ago? When I'm running this test on my setup I see the SBM jump to 5-6 seconds but it's a "real value" not a bug.
[16 Jan 2020 1:48]
Darren Zhao
1、How are you fetching the data, you are executing "show slave status" and parsing the output or? YES, we use script to fetch the data: while true do mysql -A -uroot -pxx -e"show slave status\G" | grep 'Seconds_Behind_Master' >> delay.log echo `date` >> delay.log done 2、A is master, this is a circular replication, and A hasn't any events from B,we think A should't occur delay, but some time SBM is very big data: Slave_IO_State: Waiting for master to send event Master_Host: 172.28.65.36 Master_User: replication Master_Port: 3306 Connect_Retry: 10 Master_Log_File: mysql-bin.000168 Read_Master_Log_Pos: 888959738 Relay_Log_File: mysql-relay-bin.000344 Relay_Log_Pos: 367 Relay_Master_Log_File: mysql-bin.000168 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 888959104 Relay_Log_Space: 621 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 7082 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 77161 Master_UUID: e223a4d5-222e-11ea-8b0e-08f1eaf15364 Master_Info_File: mysql.slave_master_info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Reading event from the relay log Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: Executed_Gtid_Set: cdf5562d-222e-11ea-a850-08f1eaf16af2:1-2959959331 Auto_Position: 1 Replicate_Rewrite_DB: Channel_Name: Master_TLS_Version:
[16 Jan 2020 17:25]
MySQL Verification Team
Hi, This does not seem normal, I have never seen this. Let me try to reproduce this again, this is not what I reproduced.
[28 Jan 2020 15:34]
MySQL Verification Team
Hi, I cannot reproduce what you are reporting no matter what I try. Can you reproduce this with 5.7.29 too ? thanks
[15 Jul 2020 13:44]
Buchan Milne
Seconds_Behind_Master change after enabling MTS
Attachment: MySQL_MTS_Seconds_Behind_Master.png (image/png, text), 30.81 KiB.
[15 Jul 2020 13:47]
Buchan Milne
We are impacted by this issue We have a similar setup, with circular replication. We recently upgraded to 5.7, then enabled GTID online, then enabled MTS. We aren't currently using semi-sync. Before enabling MTS, we didn't ever see the Seconds_Behind_Master on the writable master being above 0. Within 15m of enabling MTS, Seconds_Behind_Master began varying between 0 and up to over 600 We have performed this migration following the same process in 4 installations, and one is in progress (GTID has been enabled, but MTS hasn't). All 4 installations on MTS exhibit this issue, and the installation on GTID but not MTS does not currently exhibit the issue. I have attached a graph indicating the Seconds_Behind_Master p0 and p100 for one of the installations we recently upgraded, as long as the catchup-rate, indicating the improvement we got once MTS had been enabled on the circular master as well: https://bugs.mysql.com/file.php?id=29691&bug_id=98115 Percona has also noted this issue: https://www.percona.com/blog/2017/01/27/wrong-seconds_behind_master-with-slave_parallel_wo... Our automation tooling that performs (among other things) failovers checks the Seconds_Behind_Master on the writable before performing a failover to the circular master, and is now often failing due to this issue. Is it possible to fix this, or is there another alternative to ensuring that a writable master is safely replicating from a writable before performing a failover? (We are currently running Percona-Server-5.7.27).
[1 Oct 2021 22:42]
Artsiom Lipauka
I have same problem with 5.7.17. seconds_behind_master abnormal values periodically (~600000 and etc)