Bug #85141 Write/fsync amplification w/ duplicate GTIDs
Submitted: 23 Feb 2017 6:23 Modified: 14 Mar 2017 12:18
Reporter: Trey Raymond Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:5.7.17 OS:Any
Assigned to: CPU Architecture:Any
Tags: GTID, io, msr

[23 Feb 2017 6:23] Trey Raymond
Description:
The slave_worker_info/slave_relay_log_info tables don't need to update on no-ops, only on executed gtids.  skips still update the respective table (which one is based on slave_parallel_workers).  Say replication comes across a gtid that has already been executed - normally, it would not need to write any data.  because of this, it needs to write a row, commit, and (presumably based on config) fsync.  It's not a large amount of data byte-wise, but that's a lot of syncs.

Say you have a slave of an msr setup, with three masters configured, and each master sourcing from one writer and using log_slave_updates.  The slave sees 3 incoming streams of transactions.  That means 2 out of 3 incoming transactions (well "groups" in terms of replication code but that word is ambiguous now with group rep and such) are skipped.  That means the host is doing 3x as many commits as it actually needs to to remain consistent.  That's a lot of wasted effort!

The amplification is directly proportional to the count of channels in a situation like that, it gets impractical very quickly, even with nice drives and a big controller write cache.

How to repeat:
Keep an eye on p_s.table_io_waits_summary_by_table, and remember to enable 'mysql' schema in setup_objects.  Also, I was able to get some additional data on this when I ran the test on a host with google/percona's userstat plugin, which directly counts row changes by table.  You can look in event_waits etc as well but those two are good clear views into what's happening.
You can also look at stuff like the Handler_commit status variable for an even higher level view of just the commits/syncs.

Easiest way is to set up a slave (C) of two masters, having one as a writer (A) and one as a slave of that (B) with log_slave_updates.  This ensures that all GTIDs come in twice.
- Start writes to A and observe all for a bit.  Sysbench or similar is fine.
- Lag the intermediate, B, by stopping its sql thread or otherwise.
- Stop writes to A, allow C to catch up if it's not already.
- Start sql_thread on B (or remove the forced lag).
-- note: at this point, you know all gtid's that C will see are ones it has already executed.  no more data changes will be made by replication outside of the system tables.
- Now, watch C carefully.  Look at all the writes to the relevant slave_*_info table (can test with or without mts) that are still going on!  But no value is added, as they don't correspond to anything the server's actually changing.

Suggested fix:
If a slave goes down and comes back up, and is started, and the last gtid seen was a dupe gtid which it didn't write to the table about - that's fine consistency wise, as it is still in the executed gtid set, and will be skipped again.  However, this is a noticeable change in behavior, so a variable controlling it might be best.  In some cases, people might want to pay the additional write cost to skip reading all of those logs.

This would only be relevant to gtid processing, but the problem is based around gtid processing...most cases where you'd see an already executed transaction depend on gtid.  There's one case of circular replication between two masters without gtid - I haven't tested that setup to see if this is a problem there, but the code filtering same server id is not the same as the code which filters is_executed gtids.  I also haven't tested items that come through replication like settings in headers, but that doesn't concern me as the volume would always be low, and if I recall, timestamp setting and anything else is contained within a gtid-defined group and so would not be an additional issue.
[3 Mar 2017 23:48] Sveta Smirnova
test case for MTR

Attachment: rpl_bug85141.test (application/octet-stream, text), 1.95 KiB.

[3 Mar 2017 23:48] Sveta Smirnova
configuration file

Attachment: rpl_bug85141.cnf (application/octet-stream, text), 443 bytes.

[14 Mar 2017 12:18] MySQL Verification Team
Thank you Trey Raymond, Sveta for the bug report and test case.

Thanks,
Umesh