MySQL Bugs: #117056: relay log recovery may lead to contention with explicitly called ‘start slave’

Bug #117056	relay log recovery may lead to contention with explicitly called ‘start slave’
Submitted:	27 Dec 2024 10:12	Modified:	24 Feb 10:38
Reporter:	Fan Lyu	Email Updates:
Status:	No Feedback	Impact on me:	None
Category:	MySQL Server: Replication	Severity:	S3 (Non-critical)
Version:		OS:	Any
Assigned to:	MySQL Verification Team	CPU Architecture:	Any

Description:
The call stack for relay log recovery is as follows:

load_mi_and_rli_from_repositories
|-fill_mts_gaps_and_recover
  |-start_slave_thread
  |-recover_relay_log

start_slave_thread is an asynchronous call. As the comment on fill_mts_gaps_and_recover says, it is an implicit execution of START SLAVE.
  
Immediately after recover_relay_log logs ER_RPL_RECOVERY_FILE_MASTER_POS_INFO, the log shows 'ready for connection', which means clients can already connect to the server.

If the transaction being recovered by fill_mts_gaps_and_recover is large, e.g. a large INSERT, and we then call START SLAVE explicitly, record lock contention is likely when the inserts conflict.

The question is: is contention between the SQL thread started implicitly by load_mi_and_rli_from_repositories and the SQL thread started explicitly by the user normal behaviour?

In my opinion, other recoveries, such as XA or InnoDB recovery, always happen before 'ready for connection' and are treated as synchronous, but fill_mts_gaps_and_recover calls start_slave_thread asynchronously.
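
To make the ordering concern concrete, here is a minimal toy model in Python. It is not MySQL code and does not show that the server actually behaves this way; it only illustrates the scenario as described above. All names (simulate, row_lock, implicit_applier) are hypothetical. A threading.Lock stands in for an InnoDB record lock, and the wait_for_recovery flag switches between the asynchronous ordering described in this report and a synchronous ordering like the other recoveries.

```python
import threading


def simulate(wait_for_recovery):
    """Toy model (assumption, not MySQL internals): return the ordered event log."""
    events = []
    row_lock = threading.Lock()      # stands in for a record lock on the inserted rows
    lock_taken = threading.Event()   # set once the implicit applier holds the lock
    release = threading.Event()      # lets the "big transaction" finish

    def implicit_applier():
        # Models the SQL thread started implicitly during recovery.
        with row_lock:
            events.append("implicit: applying")
            lock_taken.set()
            release.wait()
            events.append("implicit: done")

    worker = threading.Thread(target=implicit_applier)
    worker.start()
    lock_taken.wait()

    if wait_for_recovery:
        # Synchronous recovery: finish before accepting clients.
        release.set()
        worker.join()

    events.append("server: ready for connections")

    # Models the user issuing an explicit start after connecting.
    if row_lock.acquire(blocking=False):
        events.append("explicit: applying")
        row_lock.release()
    else:
        events.append("explicit: blocked on row lock")
        release.set()
        worker.join()
        with row_lock:
            events.append("explicit: applying")
    return events


if __name__ == "__main__":
    print(simulate(wait_for_recovery=False))
    print(simulate(wait_for_recovery=True))
```

In the asynchronous run the explicit applier finds the lock held, while in the synchronous run the implicit work has finished before the server reports ready, so no contention is visible to the client.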

How to repeat:
We have a replica server that is killed while replaying a large insert. Reboot the server and wait for the ER_RPL_RECOVERY_FILE_MASTER_POS_INFO and 'ready for connection' messages in the log.

Then run START SLAVE explicitly.

Thank you for the report

What version of MySQL did you use when you encountered this issue?

Hello MySQL Team, I am using and reading the code of 8.0.32.

Can you reproduce this with 8.0.40? We are having trouble reproducing it on that version.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".