MySQL Bugs: #87832: Relay_Log_Space is inaccurate and leaks

Bug #87832	Relay_Log_Space is inaccurate and leaks
Submitted:	22 Sep 2017 5:20	Modified:	12 Nov 2018 15:48
Reporter:	Manuel Ung
Status:	Closed
Category:	MySQL Server: Replication	Severity:	S3 (Non-critical)
Version:	5.6 5.7	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
The Relay_Log_Space value shown by SHOW SLAVE STATUS is sometimes much higher than the actual disk space used by relay logs. This is because writes to Relay_log_info::log_space_total are not synchronized between the IO thread and the SQL thread.

What can happen is the following:

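log_space_total is 1000.

The IO thread reads log_space_total as 1000 and adds 100 to get 1100, but has not written it back yet.
The SQL thread purges a relay log, subtracts 900, and writes 100 to log_space_total.
The IO thread writes 1100 to log_space_total, overwriting the SQL thread's update. The counter now reads 1100 although the relay logs occupy only 200.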
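
The interleaving above can be replayed deterministically on a single thread. This is only a minimal sketch: `RelayLogSpace`, `replay_lost_update`, and `expected_total` are hypothetical names, not MySQL source, and a purge size of 900 is assumed so that the SQL thread writes 100 as in the steps above.

```cpp
#include <cstdint>

// Hypothetical stand-in for Relay_log_info::log_space_total.
struct RelayLogSpace {
    std::uint64_t log_space_total;
};

// Replays the unsynchronized read-modify-write sequence step by step.
// Returns the value left in the counter after the lost update.
std::uint64_t replay_lost_update(std::uint64_t initial,
                                 std::uint64_t appended,
                                 std::uint64_t purged) {
    RelayLogSpace rli{initial};

    // IO thread: read the counter and compute the new value locally.
    std::uint64_t io_local = rli.log_space_total + appended;

    // SQL thread: purge a relay log and write the reduced value.
    rli.log_space_total = rli.log_space_total - purged;

    // IO thread: write back its stale result, discarding the purge.
    rli.log_space_total = io_local;

    return rli.log_space_total;
}

// What a correctly synchronized counter would hold instead.
std::uint64_t expected_total(std::uint64_t initial,
                             std::uint64_t appended,
                             std::uint64_t purged) {
    return initial + appended - purged;
}
```

Calling `replay_lost_update(1000, 100, 900)` leaves 1100 in the counter, while `expected_total(1000, 100, 900)` gives 200; the difference is exactly the size of the purged file, which is the leak described in the title.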

How to repeat:
To confirm that the two code paths can be executed concurrently, add a sleep in MYSQL_BIN_LOG::purge_logs_in_list and confirm that the IO thread can still modify log_space_total without blocking.

Suggested fix:
Add locks or use atomic variables.

Another effect of this bug is that the counter can underflow when some additions get lost, yielding a huge number.
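
As a rough illustration of both points, the sketch below first shows how an unsigned 64-bit counter wraps when a purge subtracts a size whose matching addition was lost, and then shows the atomic-variable variant of the suggested fix. The function names are hypothetical and the unsigned 64-bit counter type is an assumption; this is not the patch that was applied to MySQL.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Underflow: the +100 for a new relay log was lost, so the counter still
// reads 0 when the purge subtracts the file's real size.
std::uint64_t underflow_after_lost_addition() {
    std::uint64_t counter = 0;
    counter -= 100;  // unsigned arithmetic wraps around to 2^64 - 100
    return counter;
}

// Atomic variant: each update is a single indivisible read-modify-write,
// so neither thread can overwrite the other's change.
std::uint64_t atomic_total(std::uint64_t iterations) {
    // Start with enough space accounted for that purging never goes below 0.
    std::atomic<std::uint64_t> total{iterations * 40};

    // Stand-in for the IO thread appending 100 bytes per event.
    std::thread io([&total, iterations] {
        for (std::uint64_t i = 0; i < iterations; ++i) total.fetch_add(100);
    });
    // Stand-in for the SQL thread purging 40 bytes at a time.
    std::thread sql([&total, iterations] {
        for (std::uint64_t i = 0; i < iterations; ++i) total.fetch_sub(40);
    });

    io.join();
    sql.join();
    return total.load();
}
```

With atomic updates the result of `atomic_total(n)` is always `n * 100`, whatever the interleaving. With a plain `std::uint64_t` and the same two threads, the code would be a data race and the result could drift.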

Hi,
Yes, I can confirm and verify this behavior. However, I don't agree with the proposed solution, as it would have a negative performance impact on the system. In any case, I'm marking this as verified and will let the replication team decide how they want to handle it.

best regards
Bogdan

Any updates on this? We just use std::atomic, and that performs pretty well.

Posted by developer:
 
Changelog entry added for MySQL 8.0.14, 5.7.25 and 5.6.43:

The value returned by a SHOW SLAVE STATUS statement for the total combined size of all existing relay log files (Relay_Log_Space) could become much larger than the actual disk space used by the relay log files. The I/O thread did not lock the variable while it updated the value, so the SQL thread could automatically delete a relay log file and write a reduced value before the I/O thread finished updating the value. The I/O thread then wrote its original size calculation, ignoring the SQL thread's update and so adding back the space for the deleted file. The Relay_Log_Space value is now locked during updates to prevent concurrent updates and ensure an accurate calculation.
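
The locking approach described in the changelog entry might look roughly like the following. This is a sketch under assumptions: `LockedLogSpace` and `run_locked` are hypothetical names, and the actual MySQL patch may take a different lock at a different scope.

```cpp
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical counter whose every update happens under a mutex.
class LockedLogSpace {
public:
    explicit LockedLogSpace(std::uint64_t initial) : total_(initial) {}

    // Called by the IO thread after appending to a relay log.
    void add(std::uint64_t bytes) {
        std::lock_guard<std::mutex> guard(mu_);
        total_ += bytes;
    }

    // Called by the SQL thread after purging a relay log.
    void subtract(std::uint64_t bytes) {
        std::lock_guard<std::mutex> guard(mu_);
        total_ -= bytes;
    }

    std::uint64_t get() const {
        std::lock_guard<std::mutex> guard(mu_);
        return total_;
    }

private:
    mutable std::mutex mu_;
    std::uint64_t total_;
};

// Runs an adder and a subtractor concurrently and returns the final total.
std::uint64_t run_locked(std::uint64_t iterations) {
    // Pre-existing relay log space, so that subtraction never goes below 0.
    LockedLogSpace space(iterations * 50);

    std::thread io([&space, iterations] {
        for (std::uint64_t i = 0; i < iterations; ++i) space.add(100);
    });
    std::thread sql([&space, iterations] {
        for (std::uint64_t i = 0; i < iterations; ++i) space.subtract(50);
    });

    io.join();
    sql.join();
    return space.get();
}
```

Here the lock is held only for the arithmetic itself. Holding it across slower work, such as file deletion, would make the other thread wait for that whole duration.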

This bug fix causes the IO thread's semi-sync ACK response to the master to block for about 200 ms whenever the SQL thread purges a relay log.