Bug #77673 Option "replicate_do_db" does not cause "create table" to replicate ('row' log)
Submitted: 9 Jul 2015 18:30 Modified: 16 Nov 2017 7:49
Reporter: Jörg Brühe (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:5.1.73, 5.5.44, 5.6.24, 5.6.27, 5.5.46, 5.7.7-rc OS:Any
Assigned to: CPU Architecture:Any
Tags: binlog format, CREATE TABLE, replication

[9 Jul 2015 18:30] Jörg Brühe
Description:
With binlog_format set to 'row', setting option "replicate_do_db = tsp" is not sufficient to let the slave replicate a "create table tsp.tab3".

My code is (intentionally) running without any "use" settings, all table names are fully qualified by the database name.

In such a setup, I expect all statements that affect any table in DB "tsp" to be replicated, but "create table" isn't.
A subsequent "insert", however, is replicated, and of course replication then breaks due to a non-existant table.

I reproduced this on the current versions of all GA series: 5.1.73, 5.5.44, and 5.6.24.

Yes, I did read the manual, and I am aware of the different behavior of the "replicate_do_*" options depending on the binlog format.

How to repeat:
Empty instances on master and slave.
Both instances are in sync, replication is prepared.
Binlog format is "row".

Additional config options on both (matter only on slave):
    [mysqld]
    replicate-do-db  =  tsp

Statements on master:
    set session autocommit = 0 ;
    create database avm ;
    create table avm.tab1 (id int auto_increment,
        wann timestamp,
        was varchar(50),
        primary key (id) ) ;
    insert into avm.tab1 (was) values ("Ach was muss man oft von bösen") ;
    insert into avm.tab1 (was) values ("Buben hören oder lesen") ;
    commit ;
    create database tsp ;
    create table tsp.tab3 (id int auto_increment,
        wann timestamp,
        was varchar(50),
        primary key (id) ) ;
    insert into tsp.tab3 (was) values ("Dieses ist der erste Streich") ;
    insert into tsp.tab3 (was) values ("Doch der zweite folgt sogleich") ;
    commit ;
(No "use", database is always given explicitly!)

Result on master:
- All 2 databases are created,
- all 2 tables are created, each has 2 rows.

Result on slave:
- Databases are created correctly ("tsp" exists, "avm" doesn't).
- No table is created ("tsp.tab3" is missing).
- Replication is broken, error message (5.5 + 5.6):
      Error executing row event: 'Table 'tsp.tab3' doesn't exist'
  In 5.1:
      Error 'Table 'tsp.tab3' doesn't exist' on opening tables

Suggested fix:
I consider "create table tsp.tab3" to affect database "tsp", so I expect it to be replicated (given "replicate-do-db = tsp") by row logging, like the preceding "create database tsp" is.

Also, I could not spot any test in the test suite that really covers this option. Yes, it is mentioned in a few ".opt" files, but I could not see them being really used. Please check this, and ensure the option is covered.
[10 Jul 2015 9:49] Umesh Shastry
Hello Jörg Brühe,

Thank you for the report.
Observed this on 5.6.27/5.5.46 builds with the provided test case.

Thanks,
Umesh
[13 Jul 2015 13:47] Jörg Brühe
I added more test cases:
The same problem affects "truncate table" and "drop table".

This is my test case (extending the original one, listed in "How to repeat", which ended with broken replication):

Fix it by running on the slave:
    create table tsp.tab3 (id int auto_increment,
        wann timestamp,
        was varchar(50),
        primary key (id) ) ;
    commit ; -- no effect for DDL
    start slave ; 

    Status on slave:
    - "tsp.tab3" now exists, replication resumed, the table has 2 rows.

Statements on master:
    truncate table tsp.tab3 ;

    Status on master:
    - Table "tsp.tab3" is empty.

    Status on slave:
    - Table "tsp.tab3" still has 2 rows.

Statements on master: 
    drop table tsp.tab3 ;

    Status on master:
    - Table "tsp.tab3" does not exist.

    Status on slave:
    - Table "tsp.tab3" still exists with 2 rows.
[13 Jul 2015 18:17] Jörg Brühe
Same behaviour in 5.7.7-rc.
[4 Nov 2015 20:18] Nikos Vogdanos
hi
Server version: 5.6.27-log MySQL Community Server (GPL)
slightly different test case with the same problem

binlog-format=ROW
binlog-do-db=otrs

use mysql
create table otrs.whatever

create statement not in binlog
subsequent dml are included

i suppose it has to do with those 2 statements
1) 
DDL statements such as CREATE TABLE and ALTER TABLE are always logged as statements
2)
Statement-based logging. Only those statements are written to the binary log where the default database (that is, the one selected by USE) is db_name
[20 Jul 2017 16:38] Margaret Fisher
Posted by developer:
 
Documentation change made on advice of development to clarify the expected behavior and requirements.
In https://dev.mysql.com/doc/refman/5.7/en/replication-rules-db-options.html (and other releases):
Previous text -
With statement-based replication, the default database is checked for a match. With row-based replication, the database where data is to be changed is the database that is checked. 
Changed to -
The database that is checked for a match depends on the binary log format of the statement that is being handled. If the statement has been logged using the row format, the database where data is to be changed is the database that is checked. If the statement has been logged using the statement format, the default database (specified with a USE statement) is the database that is checked.    
Note: Only DML statements can be logged using the row format. DDL statements are always logged as statements, even when  BINLOG_FORMAT = ROW. All DDL statements are therefore always filtered according to the rules for statement-based replication. This means that you must select the default database explicitly with a USE statement in order for a DDL statement to be applied.
[20 Jul 2017 20:16] Jörg Brühe
I'm happy to see activity here, after so long a time.
But I can't simply accept the above note, because it doesn't fully explain what happened:

The DDL statements (which get logged in statement format) include not only "create table" but also "create database".
In my test, the "create database tsp" was applied on the slave.

Why did this happen? Why was "create database tsp" accepted as matching the "replicate_do_db", but "create table tsp.tab3" was not?
Both are DDL statements affecting a database which was not the current one (due to the total absence of "use" commands).
[4 Aug 2017 9:45] Margaret Fisher
Posted by developer:
 
Sorry, I didn't get a notification about the further comment because the bug is closed. Please could you raise a new bug specifically about the CREATE DATABASE behavior? That will make it possible to handle the issue.
There is a paragraph relating to this in 
https://dev.mysql.com/doc/refman/5.7/en/replication-rules.html
For statements affecting databases only (that is, CREATE DATABASE, DROP DATABASE, and ALTER DATABASE), database-level options always take precedence over any --replicate-wild-do-table options. In other words, for such statements, --replicate-wild-do-table options are checked if and only if there are no database-level options that apply. This is a change in behavior from previous versions of MySQL, where the statement CREATE DATABASE dbx was not replicated if the slave had been started with --replicate-do-db=dbx --replicate-wild-do-table=db%.t1. (Bug #46110)
- I wonder if the scope of that bug fix was wider than it appears in the discussion. I see that the description for the replicate-do-db option at
https://dev.mysql.com/doc/refman/5.7/en/replication-options-slave.html#option_mysqld_repli...
does not talk about the situation when the database has not yet been created, and I agree that information should be added.
[11 Aug 2017 14:21] Jörg Brühe
I have let this one slip a bit, dealing with other issues.

In fact, I had found a fix in the meantime:
Replace "replicate_do_db = tsp"
by      "replicate_wild_do_table = tsp.%"
With this setting, not only "create database tsp" will be replicated on the slave but also "create table tsp.NAME".
Given that both are DDL statements and so must be replicated in statement format, I am not aware of any explanation in the manual that would explain it.

I speculate that for a "...wild..." parameter the system cannot do a string comparison (with current default database, which should be the empty string in my test) but rather must apply a pattern match, for which it might combine the actual database and the table name.

I have verified that this works in 5.5.44, 5.6.24, and 5.7.17.
[21 Aug 2017 14:23] Margaret Fisher
Posted by developer:
 
Thanks for the update. Do you want me to investigate this further with development and find an explanation to add to the manuals? If so it would be best for you to raise a new bug. Or shall we close off now that you have a workaround?
[25 Aug 2017 14:48] Jörg Brühe
Thanks for the offer!
I think I will let it stand as it is now - the issues are written here, so a search in the bug database will find them.

For me personally, there is no urgent need, and our customer (where this all arose) now uses "replicate_wild_..." to achieve the desired effect.

I think there is a sufficient number of bugs with a higher return if they receive attention, let me just mention 3 of my 7 open ones:

https://bugs.mysql.com/bug.php?id=79597
This really becomes embarrassing with the new utf8mb4 default.

https://bugs.mysql.com/bug.php?id=80431
General tablespaces in 8.0 should have this feature.

https://bugs.mysql.com/bug.php?id=83777 = 79828 = 85326
mysqldumpslow is unusable, this forces users to the Percona toolkit.
[16 Nov 2017 7:49] Erlend Dahl
Re-closing based on the latest comments from the filer.