Bug #117056 relay log recovery may lead to contention with explicitly called ‘start slave’
Submitted: 27 Dec 2024 10:12
Modified: 21 Jan 12:14
Reporter: Fan Lyu
Status: Verified
Category: MySQL Server: Replication
Severity: S3 (Non-critical)
Version:
OS: Any
Assigned to: MySQL Verification Team
CPU Architecture: Any

[27 Dec 2024 10:12] Fan Lyu
Description:
The call stack for relay log recovery is as follows:

load_mi_and_rli_from_repositories
|-fill_mts_gaps_and_recover
  |-start_slave_thread
  |-recover_relay_log

start_slave_thread is an asynchronous call. As the comment on fill_mts_gaps_and_recover says, it is an implicit execution of START SLAVE.
  
After recover_relay_log logs ER_RPL_RECOVERY_FILE_MASTER_POS_INFO, a 'ready for connection' message is printed immediately, which means clients can already connect to the server.

If the transaction being recovered by fill_mts_gaps_and_recover is large (e.g. a large insert) and we then EXPLICITLY call START SLAVE, there will probably be record lock contention when the inserts conflict.
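
The contention described above can be sketched as a toy model. This is only an illustration under assumptions: it does not use MySQL code, a plain threading.Lock stands in for a single record lock, and the names simulate_contention and implicit_applier are hypothetical. It shows a second actor failing to take the lock while a long-running implicit applier still holds it.

```python
import threading


def simulate_contention() -> tuple[bool, bool]:
    """Toy model: returns (lock acquired during recovery, lock acquired after)."""
    row_lock = threading.Lock()    # assumption: stands in for one record lock
    holding = threading.Event()    # set once the implicit applier holds the lock
    finish = threading.Event()     # set when the "large insert" may complete

    def implicit_applier() -> None:
        # Hypothetical stand-in for the applier started during recovery.
        with row_lock:
            holding.set()
            finish.wait()          # the large transaction is still running

    worker = threading.Thread(target=implicit_applier)
    worker.start()
    holding.wait()                 # recovery is now in progress

    # Stand-in for the explicitly started applier touching the same row.
    during = row_lock.acquire(blocking=False)
    if during:
        row_lock.release()

    finish.set()                   # let the large transaction complete
    worker.join()

    after = row_lock.acquire(blocking=False)
    if after:
        row_lock.release()
    return during, after


if __name__ == "__main__":
    print(simulate_contention())
```

In this model the explicit attempt cannot take the lock until the implicit applier has finished, which is the kind of waiting the report describes.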

The question is: is contention between the SQL thread started IMPLICITLY by load_mi_and_rli_from_repositories and the SQL thread started EXPLICITLY by the user normal behaviour?

In my opinion, other recoveries, such as XA or InnoDB recovery, always happen before 'ready for connection' and are treated as synchronous, but fill_mts_gaps_and_recover calls start_slave_thread asynchronously.
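
The difference between the two startup orders can be sketched with a small model. It is a simplification under assumptions, not MySQL's startup code: the function boot, the applier thread, and the log strings are all hypothetical, and an Event stands in for the time a large transaction takes to apply.

```python
import threading


def boot(synchronous: bool) -> list[str]:
    """Toy model of server startup; returns the ordered event log."""
    log: list[str] = []
    gate = threading.Event()       # assumption: models a long-running recovery

    def applier() -> None:
        gate.wait()                # the large transaction is being applied
        log.append("recovery finished")

    worker = threading.Thread(target=applier)
    worker.start()                 # like an async thread start: returns at once

    if synchronous:
        gate.set()
        worker.join()              # wait for recovery before accepting clients
        log.append("ready for connections")
    else:
        log.append("ready for connections")  # clients may connect now
        gate.set()
        worker.join()
    return log


if __name__ == "__main__":
    print(boot(synchronous=False))
    print(boot(synchronous=True))
```

With synchronous=False the "ready" entry is logged while recovery is still pending, which matches the ordering observed in the report. With synchronous=True the recovery entry always comes first.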

How to repeat:
We have a replica server that is killed while replaying a large insert. Reboot the server and look for the ER_RPL_RECOVERY_FILE_MASTER_POS_INFO and 'ready for connection' log messages.

Then run START SLAVE explicitly.
[10 Jan 14:20] MySQL Verification Team
Thank you for the report
[16 Jan 10:49] MySQL Verification Team
What version of MySQL did you use when you encountered this issue?
[21 Jan 12:14] Fan Lyu
Hello MySQL Team, I am using and reading the code of 8.0.32.