MySQL Bugs: #76587: Lost events on MIXED based replication

Bug #76587	Lost events on MIXED based replication
Submitted:	3 Apr 2015 12:16	Modified:	8 May 2015 5:37
Reporter:	Vaclav Bilek	Email Updates:
Status:	Duplicate	Impact on me:	None
Category:	MySQL Utilities: Binlog Events	Severity:	S1 (Critical)
Version:	5.6.21-24	OS:	Linux
Assigned to:		CPU Architecture:	Any
Tags:	binlog, out of sync, replication, unsafe statement

Description:
When you have 2 levels of replication:

Master1 -> Master2 -> Slaves

and use MIXED binlog, then when binlog rotation ocures you can lose some events on the 2nd level of replication. that happens when unsafe statement is translated to row based format.

The 2nd level Master doesnt write into his binlog all of the rows that he receves from 1st Master.
On ROW based binlog this doesn't happen. 
On production this leads to unpredictable differencees between 2nd level slaves and masters/1st level slaves.

How to repeat:
2 level replication:
M1->M2->S1

on M1/2:
binlog_format= MIXED
max_binlog_size=4096 # just for easier testing; in production ve use houndreds ob MB and the bug is there too

M1:
use test;
CREATE TABLE `t1` ( `c1` int(10)  AUTO_INCREMENT, PRIMARY KEY (`c1`) ) ENGINE=InnoDB;
delimiter //
create procedure populate (in num int) 
begin 
declare i int default 0;
while i < num do
insert into t1 values (); 
set i = i + 1; 
end while; 
end//
delimiter ;

call populate (10000);

delete from t1 limit 10000;
select count(*) from t1 ;

M2:
use test;
select count(*) from t1 ;

S1:
use test;
select count(*) from t1 ;

my results:
M1:
[test]> select count(*) from t1 ;
+----------+
| count(*) |
+----------+
|        0 |
+----------+

M2:
[test]> select count(*) from t1 ;
+----------+
| count(*) |
+----------+
|        0 |
+----------+

S1:
[test]> select count(*) from t1 ;
+----------+
| count(*) |
+----------+
|     8364 |
+----------+

Suggested fix:
using RBR is not fix or work-around for us.

Hi Vaclav,

Have you turned on log-slave-updates on the intermediate master?

log-slave-updates is ON on M2; 
the replication is working, as you can see on the example where through the replication is created table t1 filled with 10000 rows, just the delete command fails on the 2. level slave (S1) and deletes only 1636 of rows not all 10K.

on 5.6.24 bug still there

Thank you for the report.
Could you please provide exact conf files used on M1, M2 and S1?
I could not repeat this issue at my end with below conf and provided test case:

M1 is master to M2, M2 is master to S1
M1->M2->Slave

# M1

[mysqld]

basedir=/export/umesh/server/binaries/mysql-5.6.24
datadir=/export/umesh/server/binaries/mysql-5.6.24/m1
port = 15000
pid-file = /export/umesh/server/binaries/mysql-5.6.24/run/m1.pid
socket = /export/umesh/server/binaries/mysql-5.6.24/run/m1.sock
log-error=/export/umesh/server/binaries/mysql-5.6.24/log/m1.log
server-id=1
log-bin=master1-bin
binlog_format= MIXED
max_binlog_size=4096

# M2
[mysqld]

basedir=/export/umesh/server/binaries/mysql-5.6.24
datadir=/export/umesh/server/binaries/mysql-5.6.24/m2
port = 15001
server-id=2
log-bin=master2-bin
binlog_format= MIXED
max_binlog_size=4096
log-slave-updates
skip-slave-start
pid-file = /export/umesh/server/binaries/mysql-5.6.24/run/m2.pid
socket = /export/umesh/server/binaries/mysql-5.6.24/run/m2.sock
log-error=/export/umesh/server/binaries/mysql-5.6.24/log/m2.log

# S1
[mysqld]

basedir=/export/umesh/server/binaries/mysql-5.6.24
datadir=/export/umesh/server/binaries/mysql-5.6.24/s1
port = 15002
server-id=3
pid-file = /export/umesh/server/binaries/mysql-5.6.24/run/s1.pid
socket = /export/umesh/server/binaries/mysql-5.6.24/run/s1.sock
log-error=/export/umesh/server/binaries/mysql-5.6.24/log/s1.log

#M1

mysql> use test;
Database changed
mysql> CREATE TABLE `t1` ( `c1` int(10)  AUTO_INCREMENT, PRIMARY KEY (`c1`) ) ENGINE=InnoDB;
Query OK, 0 rows affected (0.00 sec)

mysql> delimiter //
mysql> create procedure populate (in num int)
    -> begin
    -> declare i int default 0;
    -> while i < num do
    -> insert into t1 values ();
    -> set i = i + 1;
    -> end while;
    -> end//
Query OK, 0 rows affected (0.00 sec)

mysql> delimiter ;
mysql> call populate (10000);
Query OK, 1 row affected (2.92 sec)

mysql> select count(*) from t1 ;
+----------+
| count(*) |
+----------+
|    10000 |
+----------+
1 row in set (0.00 sec)

mysql> delete from t1 limit 10000;
Query OK, 10000 rows affected (0.04 sec)

mysql> select count(*) from t1 ;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

#M2

mysql> select count(*) from t1 ;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

#S1
mysql> select count(*) from t1 ;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

If you can provide more information, feel free to add it to this bug and change the status back to 'Open'.

Thank you for your interest in MySQL.

M1 config

Attachment: M1.tar (application/x-tar, text), 20.00 KiB.

M2 config

Attachment: M2.tar (application/x-tar, text), 20.00 KiB.

S1 config

Attachment: S1.tar (application/x-tar, text), 20.00 KiB.

I cant change status from Can't repeat to Open (can change only to Closed)

Please see attached conf files. 

I can confirm that parameters master_info_repository relay_log_info_repository set to "FILE" is a workaround for this problem, when set to "TABLE" replication doesnt work corectly.

Marking this as duplicate of Bug #76974 since Bug #76974 has extra details in it