Bug #111149 Lock leak in heartbeat_queue_event()
Submitted: 25 May 2023 9:50 Modified: 8 Aug 2023 0:10
Reporter: genze wu (OCA)
Status: Closed
Category: MySQL Server: Replication Severity: S3 (Non-critical)
Version: 8.0 OS: Any
Assigned to: CPU Architecture: Any

[25 May 2023 9:50] genze wu
Description:
In sql/rpl_replica.cc:7485, the function heartbeat_queue_event() processes the heartbeat event while holding mi->data_lock. On some error paths, however, mi->data_lock is not unlocked before the function returns, for example at lines 7526 and 7536.

After such an erroneous heartbeat event is processed, mi->data_lock is leaked (it is never released). Every replica command that needs the lock, such as RESET REPLICA, then hangs.
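The pattern described above can be sketched in a few lines. This is a minimal illustration only, not the server's code: MasterInfoSketch, queue_heartbeat_leaky() and lock_is_available() are hypothetical names, and std::mutex stands in for whatever mutex type the server actually uses.

```cpp
#include <mutex>
#include <thread>

// Hypothetical stand-in for the object that owns data_lock.
struct MasterInfoSketch {
  std::mutex data_lock;
};

// Mirrors the reported pattern: take the lock, then return early on an
// error without releasing it. Returns true on error.
bool queue_heartbeat_leaky(MasterInfoSketch &mi, bool event_is_bad) {
  mi.data_lock.lock();
  if (event_is_bad) {
    return true;  // BUG: data_lock is still held when we return
  }
  mi.data_lock.unlock();
  return false;
}

// Probes from a separate thread whether data_lock can be acquired.
// A second thread is used because calling try_lock() on a std::mutex
// that the calling thread already owns is undefined behavior.
bool lock_is_available(MasterInfoSketch &mi) {
  bool acquired = false;
  std::thread probe([&] {
    acquired = mi.data_lock.try_lock();
    if (acquired) mi.data_lock.unlock();
  });
  probe.join();
  return acquired;
}
```

After queue_heartbeat_leaky(mi, true) returns, lock_is_available(mi) reports false. A thread that used a blocking lock() instead of try_lock() would wait forever, which corresponds to the hang seen with RESET REPLICA.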

How to repeat:
Any of these error conditions can cause the problem; here is one example.

Suppose you have a big transaction (binlog larger than 4GB). After it is written to the binlog, the binlog rotates to the next file. If MASTER_HEARTBEAT_PERIOD is set to a very small value such as 0.001, there is a chance that a heartbeat event is sent before the binlog rotates.

In this situation, end_pos overflows and becomes smaller than the log position already recorded on the replica, which triggers the problem.
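The overflow can be illustrated with plain integer arithmetic. The sketch below assumes that the position carried by the heartbeat event is stored in 32 bits while the replica tracks a 64-bit position; that width is an assumption made for illustration, and wrapped_end_pos() and heartbeat_looks_stale() are hypothetical helpers, not server functions.

```cpp
#include <cstdint>

// Hypothetical helper: the value a 32-bit end_pos field would carry for a
// true 64-bit binlog offset (assumes truncation to the low 32 bits).
std::uint32_t wrapped_end_pos(std::uint64_t true_offset) {
  return static_cast<std::uint32_t>(true_offset);
}

// Hypothetical check: does the heartbeat position appear to be behind the
// position the replica has already recorded?
bool heartbeat_looks_stale(std::uint64_t replica_pos,
                           std::uint64_t true_offset) {
  return wrapped_end_pos(true_offset) < replica_pos;
}
```

For a true offset of 4294967296 + 1000 bytes (just past 4GB), wrapped_end_pos() yields 1000. That is far below a replica position of 4294967296 + 1000, so the heartbeat looks as if it points backwards even though both sides refer to the same place in the file.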

Suggested fix:
Unlock mi->data_lock on every error path before returning.
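One way to guarantee the unlock on every return path is a scope guard. The following is a sketch under stated assumptions, not the patch that was applied: MasterInfoSketch and queue_heartbeat_guarded() are hypothetical, and std::mutex with std::lock_guard stands in for the server's own locking primitives.

```cpp
#include <mutex>

// Hypothetical stand-in for the object that owns data_lock.
struct MasterInfoSketch {
  std::mutex data_lock;
};

// Returns true on error. The lock_guard releases data_lock when it goes
// out of scope, so the early error return cannot leak the lock.
bool queue_heartbeat_guarded(MasterInfoSketch &mi, bool event_is_bad) {
  std::lock_guard<std::mutex> guard(mi.data_lock);
  if (event_is_bad) {
    return true;  // guard's destructor unlocks data_lock here
  }
  // ... normal heartbeat processing would go here ...
  return false;
}
```

An explicit unlock before each error return, as suggested above, achieves the same result. The guard only removes the need to remember it on every new error path.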
[25 May 2023 11:11] genze wu
Affects 8.0.
[25 May 2023 19:24] MySQL Verification Team
Thank you for your comments
[8 Aug 2023 0:10] Jon Stephens
Documented fix as follows in the MySQL 8.2.0 changelog:

    For large transactions (greater than 4GB) and small values
    of MASTER_HEARTBEAT_PERIOD, it was possible for the heartbeat
    event to be sent before binary log rotation could complete,
    causing RESET REPLICA and similar statements on the replica to
    hang.

Closed.
[3 Nov 2023 20:32] Jean-François Gagné
This bug, reported in 8.0, is marked as fixed in 8.2. Will it also be fixed in 8.0?