Bug #43457 replicate-ignore-db behaves differently with different binlog formats
Submitted: 6 Mar 2009 14:24 Modified: 11 Apr 2018 13:22
Reporter: Luis Soares Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:5.1, 6.0 OS:Any
Assigned to: Assigned Account CPU Architecture:Any
Tags: binlog format, filter, replicate-ignore-db, replication

[6 Mar 2009 14:24] Luis Soares
Description:
When setting a filter for replication using --replicate-ignore-db the filtering process exhibits different behavior for the different binlog formats. The following is an example of how it can differ (high level description - EXEC => executed, !EXEC => not executed):

EXAMPLE
=======

Create "db1" and "db2" on both master and slave (both dbs containing table: "t1").

* ON SLAVE, start with:
--replica-ignore-db=db1

* ON MASTER:
use db1;

binlog_format=stmt;
  insert into t1;             !EXEC
  insert into db2.t1;         !EXEC
  insert into db2.t1 UUID;    !EXEC

binlog_format=mixed;
  insert into t1;             !EXEC
  insert into db2.t1;         !EXEC
  insert into db2.t1 UUID;    EXEC

binlog_format=row;
  insert into t1              !EXEC
  insert into db2.t1          EXEC
  insert into db2.t1 UUID;    EXEC

===========

In the example above, db1 is always filtered out. However, db2 is filtered out in:
  * statement format => always;
  * mixed mode       => only when not using unsafe queries;
  * row format       => it does not get filtered out at all.

I checked in 6.0, but this most likely also exists in 5.1.

How to repeat:
--source include/master-slave.inc
--source include/have_binlog_format_statement.inc

--disable_query_log
connection master;

SET SQL_LOG_BIN=0;
create database db1;
create database db2;

create table db1.t1 (a text);
create table db2.t1 (a text);
SET SQL_LOG_BIN=1;

connection slave;

create database db1;
create database db2;

create table db1.t1 (a text);
create table db2.t1 (a text);

connection master;

use db1;

set binlog_format=statement;
insert into t1 values ('t1.statement');
insert into db2.t1 values ('db2.t1.statement');
insert into db2.t1 values (UUID());

sync_slave_with_master;
--enable_query_log
--echo ######### BINLOG FORMAT: STATEMENT #########
select * from db1.t1;
select * from db2.t1;
--echo #########  #########
--disable_query_log
delete from db2.t1;
connection master;

set binlog_format=mixed;
insert into t1 values ('t1.mixed');
insert into db2.t1 values ('db2.t1.mixed');
insert into db2.t1 values (UUID());
sync_slave_with_master;
--enable_query_log
--echo ######### BINLOG FORMAT: MIXED #########
select * from db1.t1;
select * from db2.t1;
--echo #########  #########
--disable_query_log
delete from db2.t1;
connection master;

set binlog_format=row;
insert into t1 values ('t1.row');
insert into db2.t1 values ('db2.t1.row');
insert into db2.t1 values (UUID());
sync_slave_with_master;
--enable_query_log
--echo ######### BINLOG FORMAT: ROW #########
select * from db1.t1;
select * from db2.t1;
--echo #########  #########
--disable_query_log
delete from db2.t1;

exit;

Suggested fix:
Make behavior consistent among all binlog formats.
[6 Mar 2009 21:27] Sveta Smirnova
See bug #43296 also.
[13 Mar 2009 7:20] Sveta Smirnova
Thank you for the report.

Verified as described.
[30 Mar 2009 10:09] Mats Kindahl
Statement-based replication currently uses the current database when filtering because for statements like

  update table db1.t1, db2.t1 set db1.t1.a = db2.t1.a where db1.t1.a = 1;

it is very hard to analyze if this statement should be filtered out if db1 (or db2) is filtered out.

However, for filtering on the slave side, it might be possible to do something along these lines:

- Introduce a "table dummy" on the slave side where tables can be opened as
  dummies. These tables should behave as blackhole, but should not have to be
  explicitly created.

- With each statement, attach a list of tables and if they are read or written.
  This information is already available, but is not written to the binary log.

- On the slave, substitute table dummies for all tables *written* by the
  statement where the tables are filtered out either because they are in a
  database that is filtered out, or because they are explicitly filtered out.

- Execute the statement. This will throw away anything written to the dummy
  tables, but allow the statement to be executed normally.

Note that it is not possible to substitute dummies for tables that are filtered out and *read*, since blackhole will return an empty result in this case, leading to inconsistencies between master and slave. If any tables that is read is filtered out, the statement cannot be replicated properly and an error should be thrown.