Bug #39197 replication breaks with large load with InnoDB, flush logs, and slave stop/start
Submitted: 2 Sep 2008 18:20 Modified: 27 Nov 2008 22:30
Reporter: Giuseppe Maxia Email Updates:
Status: Not a Bug Impact on me:
Category:MySQL Server: Replication Severity:S1 (Critical)
Version:5.1.26,5.1.28, 5.0.67 OS:Any (Mac OSX and Linux)
Assigned to: CPU Architecture:Any
Tags: regression, replication.flush, stop slave

[2 Sep 2008 18:20] Giuseppe Maxia
Replication fails when loading a large table into InnoDB, and the following events happen:
* one slave stops;
* the master flushes logs;
* the slave restarts.

Replication fails with a duplicate key insert error. 
It only happens with InnoDB. The same operation with MyISAM is performed without side effects.

How to repeat:
Using MySQL Sandbox (http://launchpad.net/mysql-sandbox) and MySQL 5.1.26 or 5.1.28 tarball, create a replication system:

./make_replication_sandbox /path/to/tarball/mysql-5.1.26-rc-OSSPEC.tar.gz

download the sample employees database (http://launchpad.net/test-db)
and install it in some directory

start the loading
$ cd employees_db
$ $HOME/sandboxes/rsandbox_5_1_26/m -t < employees.sql

From a different terminal, when the output shows that the server is loading the "titles" or "salaries" tables, do the following:

$ cd $HOME/sandboxes/rsandbox_5_1_26
$ ./check_slaves
#it should show all "yes"
$ ./s2 -e "stop slave"
$ ./check_slaves
#it should show two "yes" and two "no"
$ ./m -e "flush logs"
$ sleep 1
$ ./s2 -e "start slave"

$ ./check_slaves
#it should show all "yes" but it shows one "no" as SQL thread for slave 2

$ ./s2 -e "show slave status\G"
[3 Sep 2008 7:29] Sveta Smirnova
Thank you for the report.

Verified as described.
[3 Sep 2008 7:41] Sveta Smirnova
Probably related with bug #34582
[3 Sep 2008 8:34] Sveta Smirnova
To repeat I had to run ./check_slaves; ./s2 -e "stop slave"; ./check_slaves; ./m -e "flush logs";  ./s2 -e "start slave"; ./check_slaves several times
[3 Sep 2008 9:03] Giuseppe Maxia
Yes. It may depend on the speed of the server.
After four times that the test repeated as described, I also found a case when I had to repeat the sequence twice.
[19 Sep 2008 8:44] Sveta Smirnova
I was not able to repeat this bug with version 5.0.67
[19 Sep 2008 8:49] Sveta Smirnova
Seems I was hurry: I was able to repeat failure with 5.0.67, but needed to run test more times than with 5.1
[27 Nov 2008 21:05] Andrei Elkin
Sveta, Giuseppe, 

I need your help, I could not reproduce this failure.
I  tested innodb master -> innodb slave, myisam master -> myisam slave with 
mixed and row binlog format: no problem.

I managed to stop the slave with the dup key error although, namely when
the slave's side tables are of myisam type but on the master table of innodb type and with either binlog format.
In such a case it's normal that the slave stops, because
the binlog file contains transactions denoted explicitly with BEGIN/COMMIT. Interrupted slave tries to restart the last transaction it was executing and it naturally fails as the myisam tables hold all data that were load since the last
begin till the last insert (or Write_rows_log) event.

Notice, the script employees.sql does not specify what is the engine for the slave side. I wonder if you had it effectively of myisam type.

Could you please split employees.sql into 2 ddl and dml parts, 
and check what are the table type upon creation on the slave side and only after
that start populating with the dml part.
Please verify with mixed and row binlog format  three combination:

  innodb master -> innodb slave;
  myisam master -> myisam slave;
  innodb master -> myisam slave;

The slave's stop with the last option is not a failure as has been explained.
If you will manage to break replication with the first two options it will prove
there is a bug. Then please leave show slave status and show create table reports.


[27 Nov 2008 22:30] Sveta Smirnova

thank you for pointing this. Problem is exactly last case which is not a bug. I could not repeat with both InnoDB and both MyISAM.
[28 Nov 2008 12:05] Giuseppe Maxia
Thanks, Sveta and Andrei.
This completely escaped my attention. The "SET" command is not replicated. I wrote a blog post about this issue.
[28 Nov 2008 15:32] Mark Callaghan
What you have described is a bug, not a feature:
[28 Nov 2008 17:56] Andrei Elkin

It smells as a bug, even as a rat :-), I agree.