Bug #108637 Feature: a passthrough mode in async replication with filtering
Submitted: 28 Sep 2022 18:23 Modified: 29 Sep 2022 5:50
Reporter: Shawn Green Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Replication Severity:S4 (Feature request)
Version:8.0 OS:Any
Assigned to: CPU Architecture:Any

[28 Sep 2022 18:23] Shawn Green
Description:
Using multi-source replication, it is possible to setup a "mesh" topology.  This is where each node is both a source to and replica of every other node. If this were just two nodes, we would call it a "loop" so a true mesh will need at least 3 nodes. 

There are valid business reasons why some changes to some tables only need to replicate between subsets of nodes within this mesh.  This requires the use of replication filters on individual channels. 

When async replication filters out an event using GTID tracking, it will need to remember that it has "seen" and handled this event in the order it was meant to be processed. Any downstream replicas of this node also need to know that there was an event with that GTID and that it was handled by "doing nothing" so there is an empty transaction written into the Binary Log of the handling replica for this event (all by design and works well for chains and trees).  

However, in a mesh topology, the same event can take different routes to get to the same replica.  Had this event replicated directly from the original source to that replica, the filters were not configured to ignore that update and the event would have been applied. 

For this, I propose that there could be another per-channel option that controls whether or not filtered events are logged "as they were" when the filtering replica saw them or to do as we do now and record an empty transaction.  This passthrough mode would have allowed the downstream replica to have received the complete event even though an intermediate source had a filter in place to not process that event. 

How to repeat:
A full mesh is not required for reproduction. We just need 3 nodes A, B, and C Where A is a source to both B and C and B is also a source to C. 

There needs to be a filter on the A-B channel that would exclude some kind of modification. There is no filter on the A-C channel that would also exclude the same modification (one of the --replicate-% types would work for this)

Temporarily suspend the A-C replication channel. 

Make a modification on A that B would not execute due to the filtering condition. Allow that to replicate through A-B, get filtered, then on down the B-C link to C.  Notice the change in GTID_EXECUTED has happened on C. 

Resume the A-C channel. 

When C sees this event coming down the line from A, it should apply this event (according to the lack of any filters that may prevent the operation). But, it cannot because it has already managed that GTID value as an empty transaction. 

This allows for data drift in the situation where certain legs in the mesh are unavailable due to fault or maintenance. Had C received the event directly from A, the change would have been applied but that leg of the topology was down (due to fault or maintenance or weather or ... ) 

Suggested fix:
Create an option that would allow a filtering replica to NOT replace an event with an empty transaction but to simply relay the filtered event directly into its own Binary Log.  

This passthrough mode allows the middle node in the "longer path" to remain intentionally different but it does not break the "whole mesh" replication because its downstream replica would still have the choice to filter this event or not.  It allows the same event to take either path to the node where it should be applied.
[29 Sep 2022 5:50] MySQL Verification Team
Hello Shawn,

Thank you for the feature request!

regards,
Umesh