Bug #92964 Slave performance degrades over time
Submitted: 26 Oct 11:57 Modified: 12 Dec 18:28
Reporter: Grzegorz Rojek
Status: Verified
Category:MySQL Server: Replication Severity:S5 (Performance)
Version:5.7.23 OS:Debian (8.0, 9.0)
Assigned to: CPU Architecture:x86
Tags: GTID, multi threaded slave, replication

[26 Oct 11:57] Grzegorz Rojek
Description:
When a MySQL slave runs GTID-based asynchronous multi-threaded replication, performance degrades significantly over time if session_track_gtids is set to OWN_GTID.

The thread that coordinates queries to execute on the slave consumes more and more CPU time until it reaches 100% utilization of one CPU core, which drastically degrades slave performance.

Over a longer period, this leads to slave lag until the slave thread is restarted.

Tested on:
MySQL 5.7.22, 5.7.23, 5.7.24

How to repeat:
Enable GTID-based replication.

MySQL slave:
 > stop slave;

MySQL master:
/* Generate a large number of small transactions on the master for at least 30 minutes to build up a large backlog. */
/* Do not stop generating events. */

MySQL slave:
 > set global session_track_gtids = OWN_GTID; 
 > set global slave_parallel_workers = 4; /* any number of threads */
 > start slave;

Observe on the slave server:
  CPU utilization
  rate of trx per second
 

After some time (15 minutes should be enough), stop and start replication on the slave:
MySQL slave:
 > stop slave; start slave;

Observe on the slave server:
  CPU utilization
  rate of trx per second

Right after the restart, the trx rate should be much higher and CPU usage lower.
Performance then degrades over time, just as it did after the first start slave.

After another period of ~15 minutes (when the change in performance is visible):
MySQL slave:
 > set global session_track_gtids = OFF; 
 > stop slave; start slave;

The rate of committed trx is now stable, as fast as right after restarting the slave thread.
CPU usage is stable as well.
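
The "rate of trx per second" in the steps above can be estimated from two samples of the slave's executed GTID set (for example the Executed_Gtid_Set field of SHOW SLAVE STATUS), taken a known number of seconds apart. The following is only a rough sketch of that arithmetic: the helper names count_gtids and trx_per_second are hypothetical, and it assumes the plain 5.7-style textual format uuid:interval[:interval]... with sets separated by commas. How the samples are collected is left out.

```python
def count_gtids(gtid_set: str) -> int:
    """Count transactions in a GTID set string such as 'uuid:1-5:7,uuid2:1-3'."""
    total = 0
    # The client may wrap the set with newlines after commas; drop them first.
    for part in gtid_set.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        # The first field is the server UUID, the remaining fields are intervals.
        _uuid, *intervals = part.split(":")
        for interval in intervals:
            start, _, end = interval.partition("-")
            # A bare number such as '7' is a single-transaction interval.
            total += (int(end) if end else int(start)) - int(start) + 1
    return total


def trx_per_second(sample_a, sample_b) -> float:
    """Each sample is (unix_time_seconds, executed_gtid_set_string)."""
    (t0, set0), (t1, set1) = sample_a, sample_b
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    return (count_gtids(set1) - count_gtids(set0)) / (t1 - t0)
```

Sampling this every few seconds while the backlog is being applied should make the slowdown, and the recovery after stop slave; start slave;, visible as a change in the computed rate.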

Suggested fix:
Session-tracked GTIDs are probably not merged by the coordinator, so as the number of executed GTIDs grows, checking whether a transaction should be executed takes longer and longer.
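
To illustrate the hypothesis only (this is not MySQL's actual code; every name below is made up for the example), the sketch compares a GTID set stored as one entry per transaction with the same set stored as merged intervals. With unmerged entries, a linear lookup has to walk an ever-growing list, which would match the steadily increasing CPU time of the coordinator thread.

```python
def contains(intervals, gno):
    """Linear scan over inclusive (start, end) intervals; returns (found, steps)."""
    steps = 0
    for start, end in intervals:
        steps += 1
        if start <= gno <= end:
            return True, steps
    return False, steps


def add_unmerged(intervals, gno):
    """Store every transaction number as its own interval (no merging)."""
    intervals.append((gno, gno))


def add_merged(intervals, gno):
    """Extend the last interval when gno is contiguous with it, else append."""
    if intervals and intervals[-1][1] + 1 == gno:
        intervals[-1] = (intervals[-1][0], gno)
    else:
        intervals.append((gno, gno))


unmerged, merged = [], []
for gno in range(1, 10001):
    add_unmerged(unmerged, gno)
    add_merged(merged, gno)

# Checking a transaction that has not been applied yet scans every stored entry.
_, steps_unmerged = contains(unmerged, 10001)
_, steps_merged = contains(merged, 10001)
print(len(unmerged), steps_unmerged)  # 10000 10000
print(len(merged), steps_merged)      # 1 1
```

In this toy model the per-lookup cost grows linearly with the number of applied transactions unless contiguous entries are merged. Restarting the slave thread would reset such a structure, which is consistent with the behaviour reported above.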
[12 Dec 18:28] Bogdan Kecman
Thanks for the report, verified as described

Bogdan
[12 Dec 18:28] Bogdan Kecman
changed severity to performance