Bug #84225 Setting opt_mts_checkpoint_period to values less than 5 causes MTS hang
Submitted: 16 Dec 2016 8:55
Modified: 22 Dec 2016 5:44
Reporter: Fangxin Flou (OCA)
Status: Closed
Category: MySQL Server: Replication
Severity: S3 (Non-critical)
Version: 5.7.17
OS: Any
Assigned to:
CPU Architecture: Any
Tags: binlog, MTS

[16 Dec 2016 8:55] Fangxin Flou
Description:
I set the option opt_mts_checkpoint_period (the slave_checkpoint_period system variable) to a value less than 5 and started MTS recovery. The SQL thread then hangs while reading relay log events.

If I switch to non-MTS replication, the problem does not occur.

How to repeat:
stop slave;
set global slave_checkpoint_period = 2;
set global slave_parallel_workers=12;
start slave;

The slave SQL thread hangs in:

mts_checkpoint_routine -> my_sleep -> select
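
Note on the trace: the final frames, my_sleep -> select, are consistent with a thread that is napping between retries rather than blocked on relay log I/O, since a common POSIX idiom for a sub-second sleep is to call select() with no file descriptors and only a timeout. The following is a minimal C++ sketch of that idiom, not the server's code; the name sleep_micros is hypothetical.

```cpp
#include <sys/select.h>
#include <sys/time.h>

// Hypothetical helper (not taken from the MySQL source): sleep for
// `micros` microseconds by calling select() with no file descriptors,
// so that only the timeout is in effect.
// Returns 0 when the timeout expires, or -1 on error (errno is set,
// for example to EINTR if a signal interrupts the call).
int sleep_micros(unsigned long micros) {
  struct timeval tv;
  tv.tv_sec = static_cast<time_t>(micros / 1000000UL);
  tv.tv_usec = static_cast<suseconds_t>(micros % 1000000UL);
  return select(0, nullptr, nullptr, nullptr, &tv);
}
```

A thread that calls such a helper in a loop whose exit condition never becomes true would typically appear in a stack trace at select(), which matches the symptom shown here.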

Suggested fix:
I changed the following definition so that it reads:

const ulong mts_coordinator_basic_nap= 1;

and it works fine now.
[20 Dec 2016 9:41] MySQL Verification Team
Hello Fangxin,

Thank you for the report.
I'm not seeing the reported issue on 5.7.17 with moderate load on the master. Could you please provide the configuration files from both master and slave, and any other related information that would help me trigger this issue at my end? You may want to mark the configuration details as private after posting them here.

Thanks,
Umesh
[22 Dec 2016 5:44] Fangxin Flou
My mistake. I was calling mts_checkpoint_routine from a thread in a daemon plugin (for real performance metrics) to get the latest applied binlog position, and that caused the problem.

It seems that mts_checkpoint_routine can only be called by the slave SQL thread or worker threads.
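
The constraint described above can be pictured with a small thread-ownership model. This is only a sketch under assumptions, not MySQL code: the class Coordinator and its members checkpoint() and applied_position() are hypothetical names. It shows a state-changing routine that refuses callers other than its owning thread, and a read-only accessor that a monitoring thread, such as one in a daemon plugin, could use instead.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Hypothetical model, not MySQL code: one thread owns the checkpoint
// state and publishes the last applied position for others to read.
class Coordinator {
 public:
  // By default the constructing thread becomes the owner.
  explicit Coordinator(std::thread::id owner = std::this_thread::get_id())
      : owner_(owner), applied_pos_(0) {}

  // State-changing routine: only the owning thread may run it.
  // Returns false, without touching any state, for any other caller.
  bool checkpoint(std::uint64_t new_pos) {
    if (std::this_thread::get_id() != owner_) return false;
    applied_pos_.store(new_pos, std::memory_order_release);
    return true;
  }

  // Read-only snapshot: safe to call from any thread.
  std::uint64_t applied_position() const {
    return applied_pos_.load(std::memory_order_acquire);
  }

 private:
  const std::thread::id owner_;
  std::atomic<std::uint64_t> applied_pos_;
};
```

In this model a metrics thread would call applied_position() and never checkpoint(), so it cannot interfere with the owner's progress.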

After I removed the function call from my daemon plugin, it no longer hangs.

Thanks.
[22 Dec 2016 5:46] MySQL Verification Team
Thank you for confirming.

Thanks,
Umesh