Bug #66260 Massive data loss when changing master_port in a master/master setup
Submitted: 8 Aug 2012 13:03    Modified: 14 May 2013 19:12
Reporter: Matija Grabnar
Status: Not a Bug
Category: MySQL Server: Replication    Severity: S2 (Serious)
Version: 5.5.27    OS: Linux (Ubuntu 12.04)
Assigned to:    CPU Architecture: Any
Tags: dataloss, innodb, regression, replication

[8 Aug 2012 13:03] Matija Grabnar
Description:
When circular (master/master) replication is set up, changing the port of the second server causes MASSIVE data loss on the other master. In short, everything that was written by clients to the first master after the replication was set up DISAPPEARS from the first master once the START SLAVE command is issued.

Note that in the example below only a script inserting data was running,
and this was a database in a virtual machine dedicated to this test, so
there is no chance of unexpected DELETE commands.

mysql> select count(*) from prva;
+----------+
| count(*) |
+----------+
|    12683 |
+----------+
1 row in set (0.01 sec)

mysql> stop slave;
Query OK, 0 rows affected (0.23 sec)

mysql> select count(*) from prva;
+----------+
| count(*) |
+----------+
|    12689 |
+----------+
1 row in set (0.01 sec)

mysql> change master to master_port=3369;
Query OK, 0 rows affected (0.23 sec)

mysql> select count(*) from prva;
+----------+
| count(*) |
+----------+
|    12706 |
+----------+
1 row in set (0.00 sec)

mysql> start slave;
Query OK, 0 rows affected (0.01 sec)

mysql> select count(*) from prva;
+----------+
| count(*) |
+----------+
|      155 |
+----------+
1 row in set (0.00 sec)

I have verified this on Ubuntu, first with one master at 5.1 and the other at 5.5, then with both masters on 5.5. That means I have now seen this problem
three times: once on the production system, and twice on test systems that I set up to find out what went wrong with the production system.

How to repeat:
Set up mysql on the first system, create /etc/mysql/conf.d/replication.cnf

[mysqld] 
server-id = 10
auto_increment_offset = 1
auto_increment_increment = 10

log_bin = mysql-bin
log-slave-updates
replicate-same-server-id = 0
binlog-ignore-db=mysql

replicate-do-db = testis 
replicate-do-db = testaltv

Create a table for testing:
CREATE TABLE `prva` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `stamp` datetime DEFAULT NULL,
  `local` varchar(32) DEFAULT NULL,
  `host` varchar(32) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;

Create a script that will write data to the table:

#!/usr/bin/perl -w
use strict;
use DBI;

my $dsn = "DBI:mysql:database=testis;host=localhost";
my $user = 'root';
my $password = '';
my $host = `hostname`; chomp $host;

# Connect to the local test database; RaiseError makes DBI die on errors.
my $dbh = DBI->connect($dsn, $user, $password, { RaiseError => 1 });
my $sth = $dbh->prepare("insert into prva values (NULL,now(),?,?)");

while (1) {
  $sth->execute(scalar localtime(time),$host);
  sleep 1;
}

You can leave it running for some time or not; it doesn't seem to matter.
Now, set up the second host, and establish master-slave replication between them.

Create a mysql server, put the following in /etc/mysql/conf.d/replication.cnf
[mysqld]
server-id=20
auto_increment_offset   = 2 
auto_increment_increment= 10

log-bin                 = mysql-bin

# Also log queries received from our master (for circular replication)
log-slave-updates
replicate-same-server-id        = 0

Transfer the database content to "prime" the slave (id:20):

on id:10
create user repl@'192.168.1.5' identified by 'a';
grant replication slave on *.* to repl@'192.168.1.5';
mysqldump --opt --single-transaction --master-data --databases testis testaltv > /tmp/dump20120808.sql

on id:20
mysql> change master to master_host='192.168.1.3', master_user='repl', master_password='a';
mysql < dump20120808.sql
start slave;

Verify that the slave replication is running correctly. 
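One way to check this on id:20, for example:

mysql> show slave status\G

Slave_IO_Running and Slave_SQL_Running should both say Yes, and Seconds_Behind_Master should not be NULL.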

Leave it running for a while, long enough that you will see an obvious difference between the table size when the replication was set up and when the bug occurred.

Now, establish master/master replication between the two servers:
On id:20, do 
create user repl2@'192.168.1.3' identified by 'a';
grant replication slave on *.* to repl2@'192.168.1.3';
flush tables with read lock;
show master status;
unlock tables;

On id:10 do
change master to master_host='192.168.1.5', master_user='repl2', master_password='a', master_log_file='<File from SHOW MASTER STATUS>', master_log_pos=<Position from SHOW MASTER STATUS>;
start slave;
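For example, the statement on id:10 would look something like the following; the log file name and position are purely illustrative and must be replaced with the values that SHOW MASTER STATUS printed on id:20:

change master to master_host='192.168.1.5', master_user='repl2', master_password='a',
                 master_log_file='mysql-bin.000003', master_log_pos=107;

Note that master_log_pos takes a plain number, not a quoted string.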

Verify that the master/master replication is now working. You can start the write_to_db script on id:20 now, and you should see the DB entries appear on the id:10 side.
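A simple sanity check I would use here: count the rows per originating host on both servers; with the writer script running on both sides, both hostnames should keep growing everywhere.

mysql> select host, count(*) from prva group by host;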

Again, leave the replication to run for a while, so the difference in the number of entries is obvious when it happens.

Now, say that you got orders from management to change the default port on id:20.

On id:20 edit /etc/mysql/my.cnf and change both port entries to 3369.
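On a stock Ubuntu install the two entries are normally the port settings in the [client] and [mysqld] sections of my.cnf (the layout may differ on other systems), i.e. something like:

[client]
port = 3369

[mysqld]
port = 3369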

Restart the mysql service on id:20. This will break the replication (it will still proceed 10->20, but not the other way around, because id:10 cannot reconnect to id:20).

Now, on id:10, 
mysql> select count(*) from prva;
+----------+
| count(*) |
+----------+
|    12683 |
+----------+
1 row in set (0.01 sec)

mysql> stop slave;
Query OK, 0 rows affected (0.23 sec)

mysql> select count(*) from prva;
+----------+
| count(*) |
+----------+
|    12689 |
+----------+
1 row in set (0.01 sec)

mysql> change master to master_port=3369;
Query OK, 0 rows affected (0.23 sec)

mysql> select count(*) from prva;
+----------+
| count(*) |
+----------+
|    12706 |
+----------+
1 row in set (0.00 sec)

mysql> start slave;
Query OK, 0 rows affected (0.01 sec)

mysql> select count(*) from prva;
+----------+
| count(*) |
+----------+
|      155 |
+----------+
1 row in set (0.00 sec)

BOOM. Data written since replication was set up DISAPPEARS from server id:10. (Some?) of the data that was written to id:20 is now present in the id:10 database (it arrived via replication). Replication 20->10 is working, but replication 10->20 has likely stopped with a "duplicate key" error.
[8 Aug 2012 18:16] Sveta Smirnova
Thank you for the report.

Verified as described. Problem is not repeatable with version 5.1
[9 Aug 2012 6:34] MySQL Verification Team
seems like a documented scenario.  not that I agree with it....
http://dev.mysql.com/doc/refman/5.5/en/change-master-to.html

"If you specify the MASTER_HOST or MASTER_PORT option, the slave assumes that the master server is different from before (even if the option value is the same as its current value.) In this case, the old values for the master binary log file name and position are considered no longer applicable, so if you do not specify MASTER_LOG_FILE and MASTER_LOG_POS in the statement, MASTER_LOG_FILE='' and MASTER_LOG_POS=4 are silently appended to it."
[9 Aug 2012 6:46] Matija Grabnar
OK, I can in some ways understand why changing the port would reset the master_log_pos and master_log_file values. However, I still don't see how that explains why data written to the master would suddenly disappear. We lost a day's worth of data that was committed and written to disk when this happened on our production server. We recovered about 80% of the data from the backup (done with mysqldump), so we know that the data was there - but then it was gone.

Note, the data that was deleted was NOT the data that came in through replication, it was exactly the opposite: the data that was written to the server by clients was gone.

It's as if when the server (id:10 in the above example) first became a master, a checkpoint was made, and when the change of port on id:20 was made, the id:10 server undid everything it did since it started logging for replication.

Even if I switched it to a completely new master, I would not have expected that to happen.
[13 May 2013 6:04] Sujatha Sivakumar
Analysis:
-----------
We are able to see the data loss, but this is not a bug.
This behaviour is expected, for the following reason.

The table (`prva` in this report) is created on `master2` from a
dump file. For the `mysqldump` tool, --add-drop-table is ON by
default, so the dump file contains a DROP TABLE followed by a
CREATE TABLE for each table. `master2` logs these DDLs in its
binary log file as if they originated from `master2` itself.

When the CHANGE MASTER is executed on `master1` with `master2`'s
new port, `master1` thinks that it is a new server and silently
appends master_log_file='' and master_log_pos=4. `master1`
therefore starts reading `master2`'s binary log again from the
beginning and re-applies its events. Since the above-mentioned
DDLs are present in that binary log and originated from `master2`
(due to the dump file usage), `master1` applies them, i.e. it
drops and (re)creates the table. Hence we see the data loss: only
the data that came through the dump, plus any additional changes
that happened directly on `master2`, remains.
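One way to confirm this on `master2` (my own suggestion; the binary log file name below is an assumption, use the oldest file listed by SHOW BINARY LOGS):

mysql> show binary logs;
mysql> show binlog events in 'mysql-bin.000001' limit 50;

The Query events whose Server_id is 20 and whose Info column contains the DROP TABLE / CREATE TABLE statements from the dump are the ones that get re-executed on `master1`.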

Suggestions to avoid the above problem:

1) When changing the master's port, if the user knows it is the
same server, specify the exact `master_log_file` and
`master_log_pos` in the CHANGE MASTER statement (see the sketch
after this list).

2) On `master1`, specify the "--skip-add-drop-table" option when
taking the dump, so that a DROP TABLE statement is not added
before each CREATE TABLE statement.

3) The lost data can be obtained back from `master2`, to which it
had been replicated in the circular setup.

4) Do not use a dump file to create the initial data set on
`master2`; let it arrive through normal master-slave replication,
which might be a time-consuming process.

5) If the dump has already been taken without
"--skip-add-drop-table", set the session binary logging variable
(sql_log_bin) to OFF before loading the dump on `master2` (see the
sketch after this list).
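
Minimal sketches of suggestions 1) and 5), using the port and dump file name from this report; the binary log coordinates are illustrative and should be taken from SHOW SLAVE STATUS (Relay_Master_Log_File / Exec_Master_Log_Pos) on `master1` before the change:

-- 1) on master1 (id:10): change only the port, pinning the coordinates
--    explicitly so they are not silently reset (example values)
stop slave;
change master to master_port=3369,
                 master_log_file='mysql-bin.000002',
                 master_log_pos=123456;
start slave;

-- 5) on master2 (id:20): load a dump that contains DROP TABLE statements
--    without writing it to the binary log
set session sql_log_bin = 0;
source dump20120808.sql
set session sql_log_bin = 1;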