MySQL Bugs: #21146: Replication Filtering before sending on the network

Bug #21146	Replication Filtering before sending on the network
Submitted:	19 Jul 2006 12:53	Modified:	10 Oct 2008 11:00
Reporter:	Paul Lemay
Status:	Verified
Category:	MySQL Server: Replication	Severity:	S4 (Feature request)
Version:	5.0, 5.1, 6.0	OS:	Linux (Linux)
Assigned to:	Assigned Account	CPU Architecture:	Any

Description:
Currently, the fundamental replication rule is:

"If a master server does not write a statement to its binary log, the statement is not replicated. If the server does log the statement, the statement is sent to all slaves and each slave determines whether to execute it or ignore it."

Is there not an opportunity to let the master write the statement to its binary log, but have the master-side thread serving each connected slave filter events before sending them over the network?

How to repeat:
This is not a bug but a feature request.

Suggested fix:
Define a new set of parameters to allow filtering of requests on the master even if the master wrote the requests to its binary logs. The purpose is to save network transmission for requests that would be filtered out by the slave anyway.
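The proposal can be modelled with a small Python sketch. This is not MySQL's binlog dump code: BinlogEvent, SlaveConnection, do_db and dump_binlog are hypothetical names, and events are filtered by database name only. Every event stays in the master's binary log, and only the sending step is filtered per slave.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BinlogEvent:
    """Simplified binary log event: the database it applies to and its SQL."""
    db: str
    sql: str


@dataclass
class SlaveConnection:
    """A connected slave with a hypothetical master-side do-db filter.

    An empty do_db set means "no filter": every event is sent.
    """
    name: str
    do_db: frozenset = frozenset()
    received: list = field(default_factory=list)

    def wants(self, event: BinlogEvent) -> bool:
        return not self.do_db or event.db in self.do_db


def dump_binlog(binlog, slaves):
    """Every event stays in the master's binlog, but the thread serving each
    slave sends only the events that pass that slave's filter.
    Returns the number of events actually sent over the network."""
    sent = 0
    for event in binlog:
        for slave in slaves:
            if slave.wants(event):
                slave.received.append(event)
                sent += 1
    return sent


binlog = [
    BinlogEvent("shared", "UPDATE config SET v = 1"),
    BinlogEvent("stats", "UPDATE counters SET n = n + 1"),
    BinlogEvent("stats", "UPDATE counters SET n = n + 1"),
]
standby = SlaveConnection("standby")                          # full copy
replica = SlaveConnection("replica1", frozenset({"shared"}))  # subset only
print(dump_binlog(binlog, [standby, replica]))  # 4: 3 to standby, 1 to replica
```

Without a master-side filter, all 3 events would be sent to both slaves (6 sends), and the replica would discard the two "stats" updates itself.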

Thank you for the feature request. In this case the master would be overloaded: it would have to process each and every binary log entry. I think it is better to put some extra load on (several) slaves than on a single master, so I am not sure this feature would be really useful.

Thanks for your fast answer. 

Let me explain the problem better. The topology is 1 master, 1 standby and 10 replica nodes. The full database is replicated to the standby, but only a subset of it needs to be replicated to the 10 replica nodes. An application running on the replica nodes updates the master database at a very high frequency. Those updates are not needed on the replica nodes, but they are needed on the standby.

As a result, each of these high-frequency updates is transmitted over the network to 11 MySQL servers, but only 1 of them really needs it; the other 10 filter it out. The whole system therefore incurs the cost of 10 unnecessary network transmissions per update, in addition to each replica having to apply a filter to these queries.
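A back-of-the-envelope calculation puts rough numbers on this fan-out. The update rate and event size below are made-up figures (the report does not give any); only the 11 receivers and the single receiver that needs the data come from the topology described.

```python
def bytes_per_second(update_rate, event_bytes, receivers):
    """Replication traffic leaving the master: every receiver gets a copy."""
    return update_rate * event_bytes * receivers


# Hypothetical workload figures; the report gives only the topology.
UPDATE_RATE = 1_000  # updates per second (assumed)
EVENT_BYTES = 200    # average binlog event size in bytes (assumed)
SLAVES = 11          # 1 standby + 10 replica nodes
NEEDED_BY = 1        # only the standby needs these updates

today = bytes_per_second(UPDATE_RATE, EVENT_BYTES, SLAVES)
filtered = bytes_per_second(UPDATE_RATE, EVENT_BYTES, NEEDED_BY)
print(today, filtered, today - filtered)  # 2200000 200000 2000000
```

Whatever the real figures are, the wasted share is 10/11 of this traffic, because it scales linearly with the number of slaves that discard it.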

So, if I understand your answer correctly, I would need to demonstrate that it is less costly, in terms of master performance, for the master to filter these queries on behalf of the 10 replica nodes than to transfer them over the network to those nodes?

Many thanks for writing this feature request. We will discuss it.

This is a great feature that we could introduce in a future release.
However, until this feature is available, the following architecture might work around the issue:

                          ------------     ------------
                            master 1   -->   standby
                          ------------     ------------
                               |
                          ------------
                             islave
                          ------------
                          /  |  |  \
                         S1  S2 S3 Sn

Filters would be placed on an intermediate slave (the "islave" above), which
could use the BLACKHOLE storage engine to avoid writing table data to disk. In
other words, the intermediate slave would still write its own binary log for
S1..Sn to replicate from, but it would not store the data in its own database,
which avoids the associated performance cost.
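A minimal sketch of the islave option file, generated from Python so the filter list can vary. The option names (server-id, log-bin, log-slave-updates, default-storage-engine, replicate-do-db) are standard mysqld options, but the server ID, the binlog base name and the database names are hypothetical. Two caveats: default-storage-engine only affects CREATE TABLE statements without an explicit ENGINE clause, so tables that arrive as, say, InnoDB may need to be altered to BLACKHOLE on the islave; and with statement-based replication, replicate-do-db matches on the default database, so replicate-wild-do-table may be safer when statements update tables in other databases.

```python
def islave_cnf(server_id, replicated_dbs):
    """Build a my.cnf [mysqld] section for an intermediate (filtering) slave."""
    lines = [
        "[mysqld]",
        f"server-id={server_id}",
        "log-bin=islave-bin",
        # Write events received from the master into islave's own binlog,
        # so that S1..Sn can replicate from it.
        "log-slave-updates",
        # Tables created without an explicit ENGINE clause become BLACKHOLE,
        # so rows are discarded instead of being written to disk.
        "default-storage-engine=BLACKHOLE",
    ]
    # One replicate-do-db line per database: the option must be repeated,
    # because a comma-separated list is read as a single database name.
    lines += [f"replicate-do-db={db}" for db in replicated_dbs]
    return "\n".join(lines) + "\n"


print(islave_cnf(server_id=3, replicated_dbs=["shared"]))
```

The leaf slaves S1..Sn then replicate from the islave without any filters of their own, and the master serves only the standby and the islave.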

Note that this solution performs better when the number of slaves is high.
Filters on the master would be most useful when different slaves need
different data.
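A toy cost model makes this trade-off concrete. It counts only the work done on the master (the standby is left out because it receives the full stream either way) and assumes every leaf slave wants the same subset; MasterLoad and both functions are illustrative names, not MySQL code, and the event counts are assumed.

```python
from typing import NamedTuple


class MasterLoad(NamedTuple):
    filter_checks: int  # filter evaluations performed on the master
    events_sent: int    # binlog events the master sends toward S1..Sn


def with_master_filters(total, needed, n_slaves):
    # Hypothetical master-side filters: each of the n dump threads checks
    # every event and sends only the needed ones to its slave.
    return MasterLoad(filter_checks=total * n_slaves,
                      events_sent=needed * n_slaves)


def with_intermediate_slave(total, needed, n_slaves):
    # The master sends the unfiltered stream once, to islave; islave does the
    # filtering and the fan-out to S1..Sn, so the master's load does not
    # depend on n_slaves or on how many events are needed.
    return MasterLoad(filter_checks=0, events_sent=total)


# 1000 events in the binlog, 50 of them needed by S1..Sn (assumed figures).
for n in (2, 10, 100):
    print(n, with_master_filters(1000, 50, n), with_intermediate_slave(1000, 50, n))
```

In this model, master-side filters send fewer events only while n_slaves stays below total / needed (20 here), and their filtering work grows linearly with n_slaves, which is the master overload raised earlier in the thread. They win when each slave needs a different subset, which a single islave filter cannot express.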

Hello!

The intermediate-slave solution works, but it is *not that easy* to set up, so options for binlog filtering on the master side would be very helpful and would dramatically speed up setting up such a scenario.

PS
Another situation where these options would be helpful:
when replication is used to send data to an *untrusted* slave. I simply cannot send the whole binlog, because it may contain private data.

See also: BUG#2917, BUG#41267, WL#2387 and WL#1049.