MySQL Bugs: #84973: New GTID/log_slave_updates functionality causing failures

Bug #84973	New GTID/log_slave_updates functionality causing failures
Submitted:	14 Feb 2017 1:47	Modified:	14 Feb 2017 8:47
Reporter:	Trey Raymond
Status:	Verified
Category:	MySQL Server: Replication	Severity:	S2 (Serious)
Version:	5.7.17	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
5.7 introduced the ability to have GTID replication without log_slave_updates. Unfortunately, this will still cause failures in certain scenarios.
A real-world use case is an n-master/single-writer configuration using multi-source replication (MSR); the sample under "How to repeat" is simpler, but not realistic.

What appears to happen is that an intermediate slave connected with GTID but without log_slave_updates still passes down the GTIDs it has executed through replication, even though it isn't writing them to its binary logs. This causes slaves below it to fail (or, in a realistic case, other masters trying to open a replication channel back to the host).
The error from MySQL, on the lower level slave's status:
Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'
You can see from "show master status" on the intermediate host that it's tracking GTIDs with the master's uuid, even though they wouldn't be logged. When the lower slave sees this, and those GTIDs don't exist in the binlogs, it fails.
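
The failure described above can be sketched as a small model. This is only an illustration of the reported behaviour, not MySQL's actual implementation: the function names are hypothetical, and the rule that the connection is refused when the downstream slave lacks GTIDs that the intermediate host has executed but not logged is an assumption drawn from the error text.

```python
# Simplified model of the reported failure; not MySQL's actual code.
# A GTID is modelled as a (source_uuid, sequence_number) pair.

def gtid_range(uuid, first, last):
    """Build the set uuid:first-last as individual (uuid, n) pairs."""
    return {(uuid, n) for n in range(first, last + 1)}

def can_serve(executed, in_binlog, replica_executed):
    """Return (ok, missing). ok is False when the replica needs GTIDs that
    the source has executed but cannot supply from its binary logs."""
    not_in_binlog = executed - in_binlog
    missing = not_in_binlog - replica_executed
    return (not missing, missing)

A = "e65451a8-efbc-11e6-aac9-782bcb1808c5"  # server A's UUID, from the report

# B applied A:1-892 through replication, but with log_slave_updates=0
# none of those transactions were written to B's binary log.
b_executed = gtid_range(A, 1, 892)
b_in_binlog = set()

# C is a fresh slave of B and has executed nothing yet.
ok, missing = can_serve(b_executed, b_in_binlog, set())
print(ok, len(missing))  # False 892
```

In this model C would only be accepted if it already held A:1-892 from some other source, which matches the symptom that slaves below B fail as soon as B has applied anything from A.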

How to repeat:
Simplest way to demonstrate is to set up a chain of three servers, with GTID enabled.
A - master.
B - slave of A, log_slave_updates=0
C - slave of B

Set them up initially, then write to A.
Then, on B, run show master status. You'll see something like the output below (here e65451a8... is server A): B reports A's GTIDs as executed, as if they had been logged, even though log_slave_updates is off and they are not in B's binary logs.

mysql> show master status;
+-------------------+----------+--------------+------------------+--------------------------------------------+
| File              | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                          |
+-------------------+----------+--------------+------------------+--------------------------------------------+
| mysqld-bin.000001 |      154 |              | ops,ops          | e65451a8-efbc-11e6-aac9-782bcb1808c5:1-892 |
+-------------------+----------+--------------+------------------+--------------------------------------------+
1 row in set (0.00 sec)
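
To check this output mechanically, a small helper can list which server UUIDs appear in Executed_Gtid_Set and flag those that are not the host's own. This is a minimal sketch: the helper name is made up, and the value used for B's server UUID is a placeholder, since the report does not give it (on a real host it would come from @@server_uuid).

```python
def source_uuids(gtid_set):
    """Return the source UUIDs named in a GTID set string such as
    'uuid1:1-5:7,uuid2:3' (entries may be separated by newlines)."""
    uuids = []
    for part in gtid_set.replace("\n", "").split(","):
        part = part.strip()
        if part:
            # Everything before the first ':' is the source UUID.
            uuids.append(part.split(":", 1)[0])
    return uuids

# Executed_Gtid_Set as shown by "show master status" on B in the report.
executed = "e65451a8-efbc-11e6-aac9-782bcb1808c5:1-892"

# Placeholder: B's own @@server_uuid is not given in the report.
b_server_uuid = "00000000-0000-0000-0000-00000000000b"

# UUIDs in the executed set that do not belong to B itself.
foreign = [u for u in source_uuids(executed) if u != b_server_uuid]
print(foreign)  # ['e65451a8-efbc-11e6-aac9-782bcb1808c5']
```

With log_slave_updates=0 on B, any UUID listed in `foreign` is one whose transactions B has applied but, per this report, not written to its own binary logs.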

Then check show slave status on server C:

Last_IO_Errno: 1236
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'

Suggested fix:
GTID replication without log_slave_updates is a very useful new feature in 5.7, but it seems to have been only partially implemented. A master needs to know which server UUIDs it is actually writing to its binary logs (easy: only its own if log_slave_updates=0, all of them if log_slave_updates=1) and negotiate replication with its own slaves based on that.
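
One possible reading of this suggestion, as a rough sketch: the server restricts the GTID set it negotiates with to the UUIDs it really logs. The helper names, the UUID for server B, and B's three local transactions are all hypothetical, and each GTID set is simplified to a single contiguous range per UUID.

```python
# Sketch of the suggested fix; hypothetical helper names, not a MySQL API.

def logged_uuids(own_uuid, executed_uuids, log_slave_updates):
    """Server UUIDs whose transactions this server writes to its own binary log."""
    if log_slave_updates:
        return set(executed_uuids) | {own_uuid}
    return {own_uuid}

def servable_gtids(executed, own_uuid, log_slave_updates):
    """Restrict an executed set (uuid -> highest sequence number) to the
    UUIDs this server can actually serve from its binary logs."""
    keep = logged_uuids(own_uuid, executed, log_slave_updates)
    return {uuid: last for uuid, last in executed.items() if uuid in keep}

A = "e65451a8-efbc-11e6-aac9-782bcb1808c5"   # server A, from the report
B = "00000000-0000-0000-0000-00000000000b"   # hypothetical UUID for server B

# B has applied A:1-892 and, hypothetically, committed 3 local transactions.
b_executed = {A: 892, B: 3}

# With log_slave_updates=0 only B's own transactions are servable.
print(servable_gtids(b_executed, B, log_slave_updates=False))
# With log_slave_updates=1 everything B has executed is servable.
print(servable_gtids(b_executed, B, log_slave_updates=True))
```

Under this scheme a slave of B would never be told to expect A's transactions from B unless B is actually logging them.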

One possible complication is when the value of log_slave_updates is changed. This currently requires a restart, but in the future it will be dynamic (someday...). With the current situation requiring a restart, all slaves would need to reconnect anyway, so implementation is simple. When it's dynamic, slaves would need to reconnect, and that should be documented.

Hello Trey Raymond,

Thank you for the report.

Thanks,
Umesh