Bug #87832 Relay_Log_Space is inaccurate and leaks
Submitted: 22 Sep 2017 5:20 Modified: 12 Nov 2018 15:48
Reporter: Manuel Ung
Status: Closed
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:5.6 5.7 OS:Any
Assigned to: CPU Architecture:Any

[22 Sep 2017 5:20] Manuel Ung
Description:
The Relay_Log_Space variable as shown in SHOW SLAVE STATUS is sometimes much higher than the actual disk space used by relay logs. This is because we are not writing to Relay_log_info::log_space_total in a synchronized manner.

What can happen is the following:

log_space_total is 1000

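IO thread reads log_space_total to be 1000 and adds 100 to get 1100
SQL thread purges, subtracts 1000 and writes 0 to log_space_total.
IO thread writes 1100 to log_space_total

The correct value would be 100, but the SQL thread's subtraction is lost and log_space_total ends up at 1100.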
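
The interleaving above can be replayed deterministically on a single thread. This is only an illustrative sketch: RelayInfo and replay_lost_update are hypothetical names, not server code, and the plain 64-bit field merely stands in for Relay_log_info::log_space_total.

```cpp
#include <cstdint>

// Hypothetical stand-in for Relay_log_info; the field is deliberately
// a plain, unsynchronized integer.
struct RelayInfo {
  std::uint64_t log_space_total;
};

// Replays the three steps in the order described, on one thread,
// so the result is deterministic.
std::uint64_t replay_lost_update() {
  RelayInfo rli{1000};

  // IO thread: reads 1000 and computes 1100, but has not written it back yet.
  std::uint64_t io_local = rli.log_space_total + 100;

  // SQL thread: purges 1000 bytes and writes its result first.
  rli.log_space_total = rli.log_space_total - 1000;

  // IO thread: writes its stale result, discarding the SQL thread's update.
  rli.log_space_total = io_local;

  return rli.log_space_total;
}
```

Calling replay_lost_update() yields 1100, although only 100 bytes of relay log would remain on disk in this scenario.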

How to repeat:
To confirm that the two codepaths can be executed concurrently, add a sleep in MYSQL_BIN_LOG::purge_logs_in_list and confirm that the IO thread can still modify log_space_total without blocking.

Suggested fix:
Add locks or use atomic variables.
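
A minimal sketch of the atomic-variable option, assuming the counter can be modeled as a std::atomic<std::uint64_t>. The function name, the 100-byte sizes and the two lambdas standing in for the IO and SQL threads are all made up for illustration; this is not the patch that was applied to the server.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Two threads update one counter concurrently. Each fetch_add/fetch_sub is a
// single indivisible read-modify-write, so no update can be lost.
std::uint64_t run_atomic_counter(int iterations) {
  const std::uint64_t n = static_cast<std::uint64_t>(iterations);

  // Start with enough space accounted for that the purger never goes below zero.
  std::atomic<std::uint64_t> log_space_total{100 * n};

  // Stand-in for the IO thread: accounts for 100 bytes written per iteration.
  std::thread io([&] {
    for (int i = 0; i < iterations; ++i)
      log_space_total.fetch_add(100, std::memory_order_relaxed);
  });

  // Stand-in for the SQL thread: accounts for 100 bytes purged per iteration.
  std::thread sql([&] {
    for (int i = 0; i < iterations; ++i)
      log_space_total.fetch_sub(100, std::memory_order_relaxed);
  });

  io.join();
  sql.join();
  return log_space_total.load();
}
```

Whatever the scheduling, the result is the initial value plus all additions minus all subtractions, here 100 * iterations. Relaxed ordering is enough in this sketch because only the counter itself is shared.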
[28 Sep 2017 10:45] Manuel Ung
Another effect of this bug is that the counter underflows when some additions get lost, giving some huge number.
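
The underflow can be illustrated as follows, assuming the counter is an unsigned 64-bit integer (an assumption of this sketch; the function name is hypothetical). If an addition was lost, a later purge subtracts more than the counter holds and the value wraps around.

```cpp
#include <cstdint>

// Shows what happens when a purge is subtracted from a counter that
// never received the matching addition.
std::uint64_t purge_after_lost_addition() {
  // 100 bytes are on disk, but the IO thread's addition was lost in the race,
  // so the counter still reads 0.
  std::uint64_t log_space_total = 0;

  // The SQL thread purges the 100-byte file. Unsigned arithmetic wraps
  // modulo 2^64 instead of going negative.
  log_space_total -= 100;

  return log_space_total;
}
```

The returned value is 2^64 - 100, which is the kind of huge number that would then show up as Relay_Log_Space.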
[19 Oct 2017 17:09] MySQL Verification Team
Hi,
Yes, I can confirm/verify this behavior. However, I don't agree with the proposed solution, as it would have a negative performance impact on the system. Anyhow, I'm verifying it and I'll let the replication team decide how they want to handle this.

best regards
Bogdan
[1 Jun 2018 22:13] Manuel Ung
Any updates on this? We just use std::atomic and that performs pretty well.
[12 Nov 2018 15:48] Margaret Fisher
Posted by developer:
 
Changelog entry added for MySQL 8.0.14, 5.7.25 and 5.6.43:

The value returned by a SHOW SLAVE STATUS statement for the total combined size of all existing relay log files (Relay_Log_Space) could become much larger than the actual disk space used by the relay log files. The I/O thread did not lock the variable while it updated the value, so the SQL thread could automatically delete a relay log file and write a reduced value before the I/O thread finished updating the value. The I/O thread then wrote its original size calculation, ignoring the SQL thread's update and so adding back the space for the deleted file. The Relay_Log_Space value is now locked during updates to prevent concurrent updates and ensure an accurate calculation.
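
The locking approach described in the changelog entry can be sketched roughly as below. This is not the server's actual code: RelayLogSpace, its methods and run_locked_counter are hypothetical names, and std::mutex is used only as a generic stand-in for whatever lock the server takes.

```cpp
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical wrapper: every read-modify-write of the total happens
// while holding the same mutex.
class RelayLogSpace {
 public:
  explicit RelayLogSpace(std::uint64_t initial) : total_(initial) {}

  void add(std::uint64_t bytes) {
    std::lock_guard<std::mutex> guard(lock_);
    total_ += bytes;
  }

  void subtract(std::uint64_t bytes) {
    std::lock_guard<std::mutex> guard(lock_);
    total_ -= bytes;
  }

  std::uint64_t get() const {
    std::lock_guard<std::mutex> guard(lock_);
    return total_;
  }

 private:
  mutable std::mutex lock_;
  std::uint64_t total_;
};

// Runs a writer and a purger concurrently against the locked counter.
std::uint64_t run_locked_counter(int iterations) {
  const std::uint64_t n = static_cast<std::uint64_t>(iterations);
  RelayLogSpace space(100 * n);  // enough initial space that purges never underflow

  std::thread io([&] { for (int i = 0; i < iterations; ++i) space.add(100); });
  std::thread sql([&] { for (int i = 0; i < iterations; ++i) space.subtract(100); });

  io.join();
  sql.join();
  return space.get();
}
```

With the lock held around each update, the purger's write can no longer be overwritten by a stale value from the writer. The trade-off is that one thread may have to wait while the other holds the lock.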
[10 Jun 2021 6:12] chao gao
This bug fix causes the I/O thread's semi-sync ACK response to the master to block for about 200 ms when the SQL thread purges a relay log.