Bug #100190 Allow cleanup of gtid_executed, gtid_purged on a master, live
Submitted: 12 Jul 2020 7:13
Modified: 13 Jul 2020 5:35
Reporter: Shlomi Noach (OCA)
Status: Duplicate
Impact on me: None
Category: MySQL Server: Replication
Severity: S3 (Non-critical)
Version: 5.7, 8.0
OS: Any
Assigned to: MySQL Verification Team
CPU Architecture: Any
Tags: GTID, Replication

[12 Jul 2020 7:13] Shlomi Noach
Description:
gtid_executed and gtid_purged are ever-growing sets, to the point of bloat. While gtid_executed can be cleared on a replica via RESET MASTER, we pay the price by bloating gtid_purged. On a running master, there is no way to clean up either.
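For context, the replica-side workaround mentioned above looks roughly like this (a sketch only; the GTID set is abbreviated to two of the entries shown further down). It merely moves the bloat from gtid_executed into gtid_purged:

-- inspect the current (bloated) sets
SELECT @@GLOBAL.gtid_executed;
SELECT @@GLOBAL.gtid_purged;

-- on a replica: wipe the binary logs, which also empties gtid_executed and gtid_purged
RESET MASTER;

-- ...then restore the GTID history so the server still knows what it has applied;
-- in reality the full 561-line set goes here, so the bloat simply moves into gtid_purged
SET GLOBAL gtid_purged = '0025f4dd-7e8c-11e8-b6c9-008cfa5440e4:1-19242,02dc9383-81be-11e8-a260-008cfa542442:1-5430702';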

NOTE: This is a resubmission of http://bugs.mysql.com/bug.php?id=92836, which was closed for unclear reasons and not reopened after community feedback. In this bug I only illustrate the second of the solutions suggested in #92836. See https://bugs.mysql.com/bug.php?id=100189 for the first.

To emphasize what "bloat" means, this is an actual gtid_executed value from a semi-production replication cluster (a production environment that does not serve production traffic):

0025f4dd-7e8c-11e8-b6c9-008cfa5440e4:1-19242,
02dc9383-81be-11e8-a260-008cfa542442:1-5430702,
02ffa8d5-bbde-11e8-8e44-008cfa5440e4:1-58216,
04185125-c7b2-11e8-bc02-008cfa5440e4:1-63916,
04772ac2-93d5-11e8-aa4d-008cfa5440e4:1-54287,
...
dcda46d1-af55-11e8-aaa4-008cfa544064:1-69818,
dd720ef9-bf06-11e8-a88c-008cfa544064:1-6412662,
dde695fa-802f-11e8-be08-008cfa544064:1-63516,
de0f5d84-c482-11e8-8644-008cfa544064:1-71472,
de315cd9-d1e2-11e8-b10a-008cfa542442:1-2188,
de358dcf-ba4f-11e8-a4e8-008cfa544064:1-70939,
de983268-b598-11e8-a97f-008cfa544064:1-72787,
ded794e2-85ab-11e8-9b4c-008cfa5440e4:1-5310105,
e0a3502b-8f09-11e8-acb1-008cfa5440e4:1-643447,
e0d4ec6a-b0e1-11e8-aa5a-008cfa544064:1-71876,
e17bfcb9-b65d-11e8-b0b1-008cfa542442:1-71357,
e1a5fb6c-80f4-11e8-a561-008cfa5440e4:1-5510403,
e1c19e2e-bb14-11e8-97e9-008cfa5440e4:1-65282,
e1ec256a-8737-11e8-920e-008cfa544064:1-9417,
e1f7afac-7e8e-11e8-82fd-008cfa544064:1-64277,
e28f7a9f-c2f8-11e8-9c7c-008cfa5440e4:1-68107,
e312c2f6-c3bd-11e8-a556-008cfa542442:1-75526,
e35ef5a2-a773-11e8-bb5e-008cfa544064:1-58106,
e3bbc392-be41-11e8-b4ce-008cfa542442:1-6581231,
e3d06800-b40c-11e8-9fcf-008cfa542442:1-75904,
e403135f-88cc-11e8-957d-008cfa544064:1-39000,
e5e8e412-b4d3-11e8-96f6-008cfa544064:1-6442128,
e6674fd1-b01c-11e8-87b5-008cfa544064:1-6343336,
e66f6261-8a60-11e8-9232-008cfa544064:1-66867,
e67f443e-d11d-11e8-8d42-008cfa5440e4:1-65974,
e6a1d302-c549-11e8-bbc8-008cfa544064:1-68866,
e6c42d8f-8345-11e8-899b-008cfa544064:1-64489,
e71e9ab3-955c-11e8-9853-008cfa542442:1-63898940,
e73c3486-cc66-11e8-9a2d-008cfa544064:1-20455,
e7a40c37-802a-11e8-b8a7-008cfa5440e4:1-66008,
e89247a8-8b29-11e8-8bd9-008cfa5440e4:1-48384,
e8d63214-c7af-11e8-98ad-008cfa542442:1-70420,
e925a59c-a838-11e8-804a-008cfa544064:1-73481,
e9f19b55-bbdb-11e8-9eff-008cfa544064:1-72214,
ea016138-ac2a-11e8-8ffc-008cfa542442:1-70463,
eab84da9-81bb-11e8-901f-008cfa5440e4:1-64169,
ebea70a2-8280-11e8-baab-008cfa542442:1-62500,
ec088b38-adb6-11e8-bd96-008cfa542442:1-75788,
ec33ed21-a69a-11e8-806a-008cfa544064:1-54975,
ecb19710-ca00-11e8-9ddb-008cfa542442:1-72558,
ecdc7efa-93d2-11e8-bc91-008cfa544064:1-6453202,
ed0ea64c-b98a-11e8-b3b4-008cfa542442:1-6166922,
ed9079fc-b26d-11e8-a09f-008cfa5440e4:1-67948,
eec2f336-bfcd-11e8-9598-008cfa5440e4:1-69904,
f017ac88-c092-11e8-8d1a-008cfa542442:1-69700,
f04eaa36-ab65-11e8-96b7-008cfa542442:1-74351,
f0ea060e-a8ff-11e8-a722-008cfa5440e4:1-60803,
f0fd1daa-80f6-11e8-96e1-008cfa544064:1-66326,
f18f4e3f-b65f-11e8-ab64-008cfa5440e4:1-70044,
f245be9a-d039-11e8-bbde-008cfa542442:1-69809,
f25b83f2-c876-11e8-8b4d-008cfa5440e4:1-68992,
f2bc40f0-8672-11e8-a529-008cfa5440e4:1-9356,
f41a0596-a83a-11e8-8b36-008cfa542442:1-64949,
f42f53b7-c484-11e8-a66b-008cfa542442:1-72794,
f4927af1-a6ae-11e8-aa15-008cfa542442:1-249137,
f4b24428-c3bf-11e8-89f2-008cfa5440e4:1-6173019,
f4d1f6c1-cc68-11e8-a145-008cfa542442:1-1954220,
f50652f5-c93b-11e8-aee6-008cfa5440e4:1-65821,
f539e7cd-bf08-11e8-a732-008cfa5440e4:1-71385,
f59c119b-d11f-11e8-9927-008cfa544064:1-69835,
f7af77a9-ba51-11e8-9996-008cfa5440e4:1-6159825,
f7b7e287-a923-11e8-8893-008cfa542442:1-58453,
f823eb7e-bb16-11e8-8c90-008cfa542442:1-72813,
f82a6c91-b59a-11e8-a16f-008cfa5440e4:1-66282,
f83fc50d-8e3b-11e8-a3f8-008cfa542442:1-45447,
f922b2cf-c2fa-11e8-bf3b-008cfa542442:1-55629,
f9898d0b-c159-11e8-b974-008cfa542442:1-68487,
fa2c987d-ac2c-11e8-a9a1-008cfa544064:1-74559,
fa77ec3c-b1a8-11e8-b6ca-008cfa544064:1-70835,
faeced35-8739-11e8-9771-008cfa5440e4:1-9344,
fafde344-b0e3-11e8-bac0-008cfa5440e4:1-70828,
fb2d3df3-8a64-11e8-99a9-008cfa5440e4:1-62134,
fb6d7cc7-8282-11e8-8f48-008cfa544064:1-68014,
fced661b-5cf0-11e8-9b43-008cfa544064:1-34661946,
fd11ec6e-acf1-11e8-992c-008cfa5440e4:1-73512,
fd2a8a29-bca2-11e8-8406-008cfa544064:1-73243,
fe156d76-be43-11e8-958b-008cfa544064:1-72716,
fe804eb5-8fde-11e8-a2f4-008cfa5440e4:1-1059033,
fe8f6bb1-ca02-11e8-a059-008cfa544064:1-73859,
febed7be-c610-11e8-b337-008cfa544064:1-71551,
fef321ba-b4d5-11e8-9ed1-008cfa542442:1-72082,
ffbe008f-85ad-11e8-8f11-008cfa542442:1-62782,
ffefe288-b01e-11e8-b852-008cfa5440e4:1-70914,
fff4e4f9-8347-11e8-a135-008cfa542442:1-67163

There are 561 lines in total, and I have in fact had to remove some in order to be able to submit this report.

gtid_purged is similarly bloated.

Operations such as SHOW MASTER STATUS and SHOW SLAVE STATUS yield hundreds of lines of text. We suspect there may be a scenario where gtid_executed overflows some text capacity.

The vast majority of UUID entries in the above belong to servers that no longer exist: servers that used to be masters, failed over, and got reprovisioned. This kind of operation is increasingly popular with Kubernetes deployments, and I suspect other users are similarly impacted; certainly more will come, and with greater urgency.

How to repeat:
Keep failing over your master and reprovisioning it with a new UUID, and gtid_executed will grow.
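A minimal sketch of what that looks like on the newly provisioned master (the schema/table names are only for illustration):

-- the reprovisioned server comes up with a brand new server_uuid
SELECT @@GLOBAL.server_uuid;

-- any write on it opens a new UUID:range entry in gtid_executed...
CREATE DATABASE IF NOT EXISTS test;
CREATE TABLE IF NOT EXISTS test.t (id INT PRIMARY KEY);
INSERT INTO test.t (id) VALUES (1);

-- ...and that entry propagates to every replica and never goes away,
-- even after this server is itself failed over and destroyed
SELECT @@GLOBAL.gtid_executed;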

Suggested fix:
Requesting to add functionality to allow a DBA to prune gtid_executed and gtid_purged.

Add a GTID_FORGET_UUID('<uuid>') function, such that `SELECT GTID_FORGET_UUID("fafde344-b0e3-11e8-bac0-008cfa5440e4")` would delete any entries for the given UUID; a range need not be provided.

It will be the user's responsibility and risk to purge only genuinely stale information, such as UUIDs of servers no longer in existence, or ranges known to have been purged by all servers.
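To make the intended semantics concrete, here is a sketch of the proposed (not currently existing) function, using one of the UUIDs from the set above:

SELECT GTID_FORGET_UUID('fafde344-b0e3-11e8-bac0-008cfa5440e4');

-- intended effect: the entry
--   fafde344-b0e3-11e8-bac0-008cfa5440e4:1-70828
-- disappears from both sets, while every other entry is left untouched
SELECT @@GLOBAL.gtid_executed;
SELECT @@GLOBAL.gtid_purged;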
[12 Jul 2020 16:53] Jeremy Tinley
In the spirit of other similar functions, I'd suggest PURGE GTID_UUID as the command.

From a larger scale operations POV, it's fairly trivial for me to find the UUIDs of all active servers in my topology and then query the current leader to find the UUIDs that do not exist in my topology. From there, it's just a matter of running PURGE GTID_UUID for the difference of the two sets.
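A rough sketch of that workflow, assuming the list of active server_uuids comes from my own inventory (PURGE GTID_UUID being, of course, the proposed command, not something that exists today):

-- 1. UUIDs the current leader still carries but which map to no live server;
--    note that mysql.gtid_executed can lag @@GLOBAL.gtid_executed until the
--    binary log rotates, which is fine for spotting long-dead UUIDs
SELECT DISTINCT source_uuid
FROM mysql.gtid_executed
WHERE source_uuid NOT IN (
  '00000000-0000-0000-0000-000000000001',  -- placeholder: current leader
  '00000000-0000-0000-0000-000000000002'   -- placeholder: its live replicas
);

-- 2. for each UUID returned, run the proposed command on the leader and let it replicate:
-- PURGE GTID_UUID '<stale-uuid>';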
[12 Jul 2020 17:24] Simon Mudd
As per the original bug report I have come across this quite a lot.

I think it is good to provide some requirements on what we mean when we say we want a "cleanup" as otherwise any implementation could be surprising.

My take on this would be:
* The purging of a UUID:<range(s)> should be done for one or more UUIDs at a time in a single command, and should imply removing all the ranges for that/those UUIDs. Given that I have seen hundreds of UUIDs in my servers' gtid_executed variable, it makes sense to be able to purge more than one UUID at a time (a rough sketch of such a command follows this list).
* This command should itself trigger a GTID event, as this allows us to track the "exact position/moment in time" the change happened. After this event the purged UUIDs no longer exist, but before it they DO. The server MUST be aware of this in order to handle slaves correctly.
* The command SHOULD replicate downstream to all slaves. What that means in a situation like Group Replication I'm not 100% sure, but I suspect that the group will together accept the "event" like any other, so it will be pushed out to all members at the same time.
* Binlogs SHOULD NOT be removed/purged, as they contain information which is crucial to the normal workflow in master/slave relationships; without binlogs, any slave that is behind will be unable to connect to its master and continue replicating.
* A slave that connects to the master and, during the exchange with the master, indicates it knows the UUID(s) that have been purged is indicating that it wants to connect from a moment in time PRIOR to the purging of those UUIDs. This SHOULD be possible, and the master SHOULD be able to synchronise with the slave at the given position and provide the subsequent binlog events (including the purge event, which was written later and which the slave will process "normally"). If the slave indicates that it only knows events which look like they are after the "purge moment", then the master should also be able to serve the binlogs from the moment requested. If the master is unable to determine the exact moment in the binlogs, then it SHOULD provide an error.
* When binlogs containing the "purge events" are no longer held on the server, there is no need for the server to have any specific knowledge of them, and to all intents and purposes the only events the server needs to be aware of are those for all UUIDs less those that have been purged.
* I suspect it is not a good idea to try to purge a UUID if this has already been done and there are still binlogs prior to the purge containing this UUID. This is because, for slaves connecting with a delay, it may not be possible to uniquely determine where to start streaming binlogs, as the location from which to do that may not be unique. That is, UUIDs are supposed to be unique, so re-using them after you have tried to remove them is going to cause issues.
* It SHOULD be possible to execute the "PURGE GTID_UUIDS" command more than once, for different UUIDs at different points in time, so the master may be aware of 0..n such "events" and be able to serve the binlog stream along the whole set of binlogs to slaves that connect at any point in time.
* It probably DOES NOT make sense to PURGE GTID_UUIDS for a master's own UUID, or for a Group Replication cluster's UUID, and it might be appropriate to not allow this as it's likely to cause problems. However, this restriction should NOT be applied if these commands are being received via replication, as the receiving "slave" may once have been a master and it is quite valid to want to potentially remove that "old master's" UUID:<range>.
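A rough sketch of the shape such a command could take (the syntax is purely illustrative; nothing like this exists today), using two UUIDs from the gtid_executed dump earlier in this report:

-- purge every range for the listed UUIDs in one statement; the statement itself
-- is written to the binary log as an event, so it replicates downstream
PURGE GTID_UUIDS
  'e71e9ab3-955c-11e8-9853-008cfa542442',
  'fced661b-5cf0-11e8-9b43-008cfa544064';

-- afterwards neither UUID appears in gtid_executed or gtid_purged on the master,
-- and replicas drop them at the same logical point when they apply the event
SELECT @@GLOBAL.gtid_executed;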

I hope these thoughts make sense. I've been meaning to write up a more formal specification and while the above is not formal I think it pretty much identifies some of the things we have to think about and take into account.
[12 Jul 2020 17:53] Shlomi Noach
Strong agree with Simon's assessment that this should be a binlog event. That is, we'd run the function/command on the master ("source" in new terminology), and it would be written to the binary log, to be applied by replicas. It's imperative that this is the case, so that lagging replicas, replicas that happen to be offline, or replicas in the process of restoring from a backup would be able to pick it up.
[13 Jul 2020 5:19] MySQL Verification Team
Hi,

Bug #92836 is verified and pushed up the chain to the devs for fixing. I'm not sure how the status of #92836 remained "need feedback", but the important thing is that it's being triaged by the replication team.

I'm setting this one as a duplicate of #92836, as there really is no point in having the same discussion in three places; I hope you agree.

Kind regards
Bogdan

P.S. I'll see if we can push for some updates on the progress, as I do understand how annoying this bug must be.
[13 Jul 2020 5:35] Shlomi Noach
> it's being triaged by the replication team.

Thank you. Can you please clarify:

- Where should we continue the discussion?
- Whether the solution is going to be binlog-event based (desired) or something else (undesired and not useful). I have no visibility into your thought process or into whether the suggested solutions are being considered. What I fear is that, for lack of discussion, the team will come up with a solution that does not solve the problem in real production systems, so visibility is appreciated.

Thank you!
[7 Apr 2022 14:14] Wes Deviers
Is there also an interaction here with the now-undependable SHOW (SLAVE|REPLICA) STATUS?

We will often do cascading replication to perform online migrations.  So you have a primary and replica set, and then you spin up a new primary with relay logging turned on, and replicas for it.

Thus, the newest replicas have 3 GTID sets (or more, depending on how many times you've done this). One of those GTID sets will never, ever, see transactions again, because the old server has been destroyed.

It appears that, since seconds-behind uses these GTID sets in its calculation, SHOW SLAVE STATUS will alternate between the slave lag of the "real" GTID set that is replicating and that of the one that has not seen writes in months or years: 0 -> 14 -> 0 -> 0 -> 0 -> 16 -> 0 -> 0 -> 0.

Turning on multi-threaded slave writes seems to fix this problem, but if somebody is not paying attention in their monitoring "Seconds_Behind_Master: 0" comes up often enough that you can't use it for monitoring any more.  But it looks like letting us operationally mark a gtid set in gtid_purged as "gone forever!" fixes both the display problem as well as a naive fix for seconds_behind.