MySQL Bugs: #84995: Show executed GTIDs by replication channel

Bug #84995	Show executed GTIDs by replication channel
Submitted:	14 Feb 2017 13:52	Modified:	21 Jun 2017 5:35
Reporter:	Simon Mudd (OCA)	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Replication	Severity:	S4 (Feature request)
Version:	5.7.17+	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	GTID, per_channel values, replication

Description:
SHOW SLAVE STATUS and the various replication tables provide no indication of the GTIDs received and executed per channel.

This information would be useful:
* should replication break
* should we wish to replicate from multiple sources (maybe the "same" data)
* to provide more complete logging

How to repeat:
Look for this information in the output of SHOW SLAVE STATUS and find you see only the GLOBAL metrics

Suggested fix:
Record per replication channel the following information:
* the GTIDs received from the channel (if any). This would start off being empty and get filled as GTID events arrive / are processed.
* the GTIDs executed by the SQL thread(s) in that channel.  This would be a subset of the previous point and if there had been gaps or similar it might mean that the executed set makes those gaps visible as only the executed GTIDs would be shown (this would not be the normal situation but could happen)

Whether it makes sense to record this in SHOW SLAVE STATUS or to have some new P_S replication tables is not clear. I would not expect it to be _necessary_ to keep this information across a server restart: that is the data probably should not be persistent but perhaps it might be useful. (?)

So to clarify: SHOW SLAVE STATUS does show

Retrieved_Gtid_Set:
and
Executed_Gtid_Set:

The retrieved set looks good, but the executed set is a global set of executed GTIDs not those which have been executed by the channel's SQL thread(s).

It may be sufficient to change the meaning of the current Executed_Gtid_Set column though that change at least in 5.7 would lead to incompatible change so would not be ideal, unless you consider the current behaviour to be a bug.

https://dev.mysql.com/doc/refman/5.7/en/show-slave-status.html clearly states:

"Executed_Gtid_Set

The set of global transaction IDs written in the binary log. This is the same as the value for the global gtid_executed system variable on this server, as well as the value for Executed_Gtid_Set in the output of SHOW MASTER STATUS on this server. Empty if GTIDs are not in use. "

Maintaining the same meaning would require a new column setting but also given it's clear that the output show "what's in the binary logs" it may make sense to "correct" this column to provide channel specific data, and leave SHOW BINARY LOGS with the "globally executed GTID set".

Hello Simon,

Thank you for the report and reasonable feature request!

Thanks,
Umesh

Posted by developer:

Thank you for this feature request, Simon.

This is interesting, but has the problem that the length of the string representation of the GTID set could grow indefinitely. The reason is that there could be gaps in the sequence, making it impossible to range-compress. For instance, if the channel applies transactions UUID:1, UUID:2, UUID:3, ... UUID:1000, we represent it compactly like UUID:1-1000000, which is good. But it is also possible for two channels to replicate from the same source, e.g. in a 'diamond' replication topology like A->B->D, A->C->D. In this case, the two channels on server D race to apply the transactions, and in the worst case may end up with one channel applying all even-numbered GTIDs and the other applying all odd-numbered GTIDs (UUID:1, UUID:3, UUID:5, ..., UUID:999 vs UUID:2, UUID:4, UUID:6, ..., UUID:1000000). Then it is impossible to range-compress the GTID sets, so they would grow indefinitely and start using up system resources. (This is not a problem for GTID_EXECUTED, because due to the auto-position protocol the gaps will eventually be filled in.)

So I think we cannot implement exactly this, but maybe we can think of alternatives that address the use cases.

Can you give an example of how you would use this information?

Would it be meaningful to e.g. keep the state of the last N transactions for a channel (for a configurable N)? Which of your use cases would that address and which would it not address? Any other variant that would help?

Your concern about string length issues with the GTID representation isn't just applicable to providing per channel statistics. It's latent in the design.

Is Diamond replication using GTID a support or expected usage? I can see why you might want to do this and it may help provide extra resilience but wonder how many people are using this right now. Certainly for that type of use case my suggestion would be problematic.

Maybe GR also suffers from this?

The main reason for requesting this (or trying to discuss) was what I see on a slave:

Retrieved_Gtid_Set: bbbbbbbb-7de3-11e3-b7c0-d89d67717f88:16848372-87008863
Executed_Gtid_Set: aaaaaaaa-7e9a-11e3-bc69-2c59e5371b18:23-106032611,
bbbbbbbb-7de3-11e3-b7c0-d89d67717f88:1-87008863,
cccccccc-b89c-11e3-b6ad-9cb654894de8:1-17141798

If you have more than one channel Executed_Gtid_Set is not a per channel value, so it is out of place.
It's also in the global variable gtid_executed which is the wrong place as this is not a setting but a "status value" (I filed a bug about this years ago) and that place isn't right either.

Use cases:
* seeing which upstream feed is generating more transactions (gtids) as any delayed feed will get the GTID event and then ignore it.
* having a way to see which ranges of transactions are being provided from a specific stream. This might be the case where you have replication from 2 different sources (with different data) and here you want to confirm which gtid values have been collected from each one. You have the retrieved set but it's cleaner to also see the applied set from that stream.

Does this answer your questions?