Bug #92964 Slave performance degrades over time
Submitted: 26 Oct 2018 11:57 Modified: 18 Sep 19:38
Reporter: Grzegorz Rojek
Status: Closed
Category:MySQL Server: Replication Severity:S5 (Performance)
Version:5.7.23 OS:Debian (8.0, 9.0)
Assigned to: CPU Architecture:x86
Tags: GTID, multi threaded slave, replication

[26 Oct 2018 11:57] Grzegorz Rojek
When a MySQL slave runs GTID-based asynchronous multi-threaded replication, its performance degrades significantly over time if session_track_gtids is set to OWN_GTID.

The thread that coordinates the queries to execute on the slave consumes more and more CPU time until it reaches 100% utilization of one CPU core, which drastically degrades slave performance.

Over a longer period this leads to slave lag, which persists until the slave thread is restarted.

Tested on:
MySQL 5.7.22, 5.7.23, 5.7.24

How to repeat:
Enable GTID-based replication.

mysql slave:
 > stop slave;

mysql master:
/* Generate a large number of small transactions on the master for at least 30 minutes, to build up a fairly large backlog. */
/* Do not stop generating events. */

mysql slave:
 > set global session_track_gtids = OWN_GTID;
 > set global slave_parallel_workers = 4; /* any number of threads */
 > start slave;

Observe on the slave server:
  CPU utilization
  rate of trx per second

After some time (15 minutes should be enough), stop and start replication on the slave:
slave server:
 > stop slave; start slave;

Observe on the slave server:
  CPU utilization
  rate of trx per second

Right after the restart, the trx rate should be much higher, with lower CPU usage.
Performance then degrades over time, just as it did after the first start slave.

After another period of ~15 minutes (when the change in performance is visible):
slave server:
 > set global session_track_gtids = OFF;
 > stop slave; start slave;

The rate of committed trx is now stable, as fast as it was right after restarting the slave thread.
CPU usage is stable.

Suggested fix:
Probably the session-tracked GTIDs are not merged by the coordinator, so over time the check of whether a transaction should be executed takes longer as the number of executed GTIDs increases.
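
To illustrate the suspected mechanism, here is a minimal Python sketch. It is not taken from the server source: the class names UnmergedGtidList and MergedGtidSet are hypothetical, and a GTID is reduced to its transaction number for a single server UUID. It only shows why a structure that records every committed GTID as a separate entry keeps growing, and gets slower to search, while one that merges adjacent numbers into intervals stays small.

```python
from bisect import bisect_right


class UnmergedGtidList:
    """Hypothetical: records every transaction number as a separate entry."""

    def __init__(self):
        self.entries = []

    def add(self, gno):
        self.entries.append(gno)

    def contains(self, gno):
        # Linear scan: the cost grows with the number of recorded entries.
        return gno in self.entries

    def size(self):
        return len(self.entries)


class MergedGtidSet:
    """Hypothetical: keeps sorted, non-overlapping inclusive intervals."""

    def __init__(self):
        self.starts = []  # interval start numbers, sorted ascending
        self.ends = []    # matching inclusive interval end numbers

    def add(self, gno):
        i = bisect_right(self.starts, gno)  # intervals before i start at <= gno
        if i > 0 and self.ends[i - 1] >= gno:
            return  # already covered by an existing interval
        joins_left = i > 0 and self.ends[i - 1] == gno - 1
        joins_right = i < len(self.starts) and self.starts[i] == gno + 1
        if joins_left and joins_right:
            # gno closes the gap between two intervals: fuse them into one.
            self.ends[i - 1] = self.ends[i]
            del self.starts[i]
            del self.ends[i]
        elif joins_left:
            self.ends[i - 1] = gno
        elif joins_right:
            self.starts[i] = gno
        else:
            self.starts.insert(i, gno)
            self.ends.insert(i, gno)

    def contains(self, gno):
        # Binary search over intervals instead of a scan over every entry.
        i = bisect_right(self.starts, gno)
        return i > 0 and self.ends[i - 1] >= gno

    def size(self):
        return len(self.starts)


if __name__ == "__main__":
    unmerged, merged = UnmergedGtidList(), MergedGtidSet()
    # Workers may commit slightly out of order, so feed the numbers shuffled.
    for gno in [2, 1, 4, 3, 6, 5]:
        unmerged.add(gno)
        merged.add(gno)
    print(unmerged.size(), merged.size())
```

With a steady stream of small transactions the unmerged list gains one entry per commit, so each lookup or insert slowly gets more expensive, which would match the gradual slowdown described above. The merged set collapses back to a single interval whenever the gaps are filled.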
[12 Dec 2018 18:28] MySQL Verification Team
Thanks for the report, verified as described

[12 Dec 2018 18:28] MySQL Verification Team
changed severity to performance
[18 Sep 19:38] Margaret Fisher
Posted by developer:
Added changelog entry for MySQL 5.7.32 and 8.0.22:

When the system variable session_track_gtids was set to OWN_GTID on a multithreaded replica, the replica’s performance would degrade over time and begin to lag behind the master. The cause was the buildup of the GTIDs recorded by the replica’s worker threads at each transaction commit, which increased the time taken by the worker threads to insert new ones. Session state tracking is now disabled for worker threads on a multithreaded replica. Thanks to Facebook for the contribution.