Bug #85142 reducing MTS checkpointing causes high IO load
Submitted: 23 Feb 2017 6:40 Modified: 6 Mar 2017 9:33
Reporter: Trey Raymond
Status: Verified
Category: MySQL Server: Replication Severity: S3 (Non-critical)
Version: 5.6, 5.7 OS: Any
Assigned to: CPU Architecture: Any
Tags: checkpoint, io, MTS

[23 Feb 2017 6:40] Trey Raymond
The implementation of slave_worker_info.Checkpoint_group_bitmap is inefficient when slave_checkpoint_group is large.  Raising that variable to get a system with less checkpointing causes an enormous amount of extra IO.
That column can grow to about 64KB (one bit per transaction, so one byte per 8 transactions) and is written out with every GTID processed (in theory every commit, but that's another issue).  Setting slave_checkpoint_group to its maximum (512K) raised the write rate of a test server from 12-20MB/s to 100-150MB/s, and table_io_waits_summary_by_table makes it clear that all of that IO is on slave_worker_info.
Granted, that's the max value, but it shows how the slave's internal information storage can be almost 10x the IO load of the actual application writes to the database.  That's expected given how the column works, but because it makes reduced checkpointing impractical, it's a blocker for a better MTS (well, binlog file rollover too).
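
To put rough numbers on the bitmap cost described above, here is a back-of-the-envelope sketch in Python. It assumes one bit per transaction and a full rewrite of the bitmap on every commit. The function names and the 1000 commits/s rate are made up for illustration, and the estimate covers only the bitmap payload, not redo log, doublewrite, or overflow-page overhead.

```python
# Rough estimate only: bitmap payload per commit, ignoring InnoDB overhead.
DEFAULT_CHECKPOINT_GROUP = 512     # default value of slave_checkpoint_group
MAX_CHECKPOINT_GROUP = 524280      # maximum value (512K - 8)


def bitmap_bytes(checkpoint_group: int) -> int:
    """Bytes needed for a bitmap holding one bit per transaction."""
    return (checkpoint_group + 7) // 8


def bitmap_write_rate_mb(checkpoint_group: int, commits_per_sec: float) -> float:
    """MB/s of bitmap payload if the whole bitmap is rewritten on every commit."""
    return bitmap_bytes(checkpoint_group) * commits_per_sec / 1e6


if __name__ == "__main__":
    commits_per_sec = 1000.0  # hypothetical workload, not measured in this report
    for group in (DEFAULT_CHECKPOINT_GROUP, MAX_CHECKPOINT_GROUP):
        print(group, bitmap_bytes(group), bitmap_write_rate_mb(group, commits_per_sec))
```

At the default setting the bitmap is 64 bytes; at the maximum it is 65535 bytes, which is why the cost only becomes visible once slave_checkpoint_group is raised.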

How to repeat:
This easy test used 5.7.17.
Set up two servers, master A and slave B.  On B, set slave_parallel_workers to a nonzero value, say 8.  Now start writes to A and monitor IO.  Check performance_schema.table_io_waits_summary_by_table (with the mysql schema enabled in setup_objects) and look at the avg/total wait on slave_worker_info.  Then stop the slave and set slave_checkpoint_group to a high value, such as its maximum (512K-8), or anything large enough that the blob field has to be stored in overflow pages.  In this case slave_worker_info uses the dynamic row format; the old compact format might make it even worse, but I doubt there's much difference.  Start the slave again and look at the slave_worker_info table, which is now much larger.  Now check the IO again, and the p_s table.
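
The steps above can be sketched as a list of statements to run on slave B. This small Python helper only assembles the SQL; it does not connect to a server. The helper name and constants are hypothetical, and it assumes the slave keeps its worker state in mysql.slave_worker_info, as in this report. The range check reflects the documented limits of slave_checkpoint_group (a multiple of 8 between 32 and 524280).

```python
# Hypothetical helper: builds the repro statements, does not execute them.
MIN_GROUP, MAX_GROUP = 32, 524280  # documented range of slave_checkpoint_group

# Wait statistics for the worker info table; run before and after the change.
WORKER_INFO_IO_QUERY = (
    "SELECT COUNT_STAR, SUM_TIMER_WAIT, AVG_TIMER_WAIT, COUNT_WRITE "
    "FROM performance_schema.table_io_waits_summary_by_table "
    "WHERE OBJECT_SCHEMA = 'mysql' AND OBJECT_NAME = 'slave_worker_info'"
)


def repro_statements(workers=8, checkpoint_group=MAX_GROUP):
    """Return the statements to run on slave B, in order."""
    if checkpoint_group % 8 or not MIN_GROUP <= checkpoint_group <= MAX_GROUP:
        raise ValueError("slave_checkpoint_group must be a multiple of 8 in [32, 524280]")
    return [
        # Instrument tables in the mysql schema so slave_worker_info is counted.
        "UPDATE performance_schema.setup_objects SET ENABLED = 'YES', TIMED = 'YES' "
        "WHERE OBJECT_TYPE = 'TABLE' AND OBJECT_SCHEMA = 'mysql'",
        "STOP SLAVE",
        f"SET GLOBAL slave_parallel_workers = {workers}",
        "START SLAVE",
        WORKER_INFO_IO_QUERY,  # baseline, taken while writes are running on A
        "STOP SLAVE",
        f"SET GLOBAL slave_checkpoint_group = {checkpoint_group}",
        "START SLAVE",
        WORKER_INFO_IO_QUERY,  # compare against the baseline
    ]


if __name__ == "__main__":
    for statement in repro_statements():
        print(statement + ";")
```

Comparing the two query results, alongside OS-level write throughput, should show whether the extra IO lands on slave_worker_info.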

Suggested fix:
Find a better way to do this that doesn't require writing a whole max-size bitmap with every commit.  I'll review the code at some point, and if I think of anything easy, I'll gladly suggest it - but this is a very high-cost operation that, while valuable, is way out of proportion.
[6 Mar 2017 9:33] Bogdan Kecman
Thanks for the report. Verified as reported. 
All best
Bogdan Kecman