MySQL Bugs: #84752: Multi-Slave Replication Fail: bogus data in log event

Bug #84752	Multi-Slave Replication Fail: bogus data in log event
Submitted:	31 Jan 2017 18:38	Modified:	1 Oct 2018 10:55
Reporter:	Gonzalo Miguel Arruti
Status:	Closed
Category:	MySQL Server: Replication	Severity:	S2 (Serious)
Version:	5.7.12	OS:	Linux
Assigned to:		CPU Architecture:	Any

Description:
Hello, we have one master and two slaves. As soon as the slaves start replicating, one of them (only one; the other keeps running) gets this error:

Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'bogus data in log event; the first event 'mysql-bin.000002' at 211200498, the last event read from '/var/lib/mysql/binlog/mysql-bin.000002' at 211359690, the last byte read from '/var/lib/mysql/binlog/mysql-bin.000002' at 211359709.' 
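The two offsets in this error differ by 211359709 - 211359690 = 19 bytes, which is the length of the common header of a binary log (format v4) event. One plausible reading is that the master read a single event header at the last event position and rejected it. The sketch below decodes such a header and flags it as bogus when its declared size is smaller than the header itself. It is a simplified model for illustration: `parse_v4_header` is a hypothetical helper and the single validation rule is an assumption, not MySQL's actual reader.

```python
import struct

# Simplified layout of the 19-byte common header of a binlog v4 event,
# little-endian: timestamp (4), type_code (1), server_id (4),
# event_size (4), next event position (4), flags (2).
V4_HEADER = struct.Struct("<IBIIIH")


def parse_v4_header(buf: bytes) -> dict:
    """Hypothetical helper: decode a v4 event header and reject it if
    the declared event size is impossible (simplified check only)."""
    if len(buf) < V4_HEADER.size:
        raise ValueError("truncated event header")
    timestamp, type_code, server_id, event_size, next_pos, flags = (
        V4_HEADER.unpack_from(buf)
    )
    # An event can never be shorter than its own header, so a smaller
    # declared size means these bytes are not a valid event.
    if event_size < V4_HEADER.size:
        raise ValueError(f"bogus data in log event: event_size={event_size}")
    return {
        "timestamp": timestamp,
        "type_code": type_code,
        "server_id": server_id,
        "event_size": event_size,
        "next_pos": next_pos,
        "flags": flags,
    }


# Offsets copied from the error message.
last_event_read = 211359690
last_byte_read = 211359709
print(last_byte_read - last_event_read == V4_HEADER.size)  # True

# All-zero bytes decode to event_size == 0 and are rejected.
try:
    parse_v4_header(bytes(19))
except ValueError as err:
    print(err)  # bogus data in log event: event_size=0
```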

If we restart the slave that failed, it keeps working until both slaves are up to date again, and then one of them gets the error. So we think the issue is somehow related to concurrent reading of the binary log by the slaves' I/O threads. If the slaves are not reading from the same binlog position (no concurrent reads), the error does not happen.

We can prevent this error in three ways:
- having only one slave (not an option; we need two of them)
- setting slave_compressed_protocol=1 (not an option; too much CPU usage)
- setting sync_binlog to 0, i.e. off (not an option; we cannot lose transactions)

- mysql-community-server-5.7.12-1
- Red Hat Enterprise Linux Server release 6.5 (Santiago)
- GTID is enabled (this is something that we need too)

How to repeat:
We have tested in different environments (MySQL 5.7.12 and 5.7.17) and always had the same result.

Any ideas on how to fix this issue?
Regards.

Hi,

I can't "easily" reproduce this problem; I had to try multiple times under load to trigger it, but the problem is clearly there. I tried your workarounds and they work.

Setting the bug to verified with "not acceptable workaround". I hope it can be fixed soon.

Thanks for your submission
Bogdan

If you can share your my.cnf from both the master and the slaves, it might help me build a "100% reproducible test case", since you say it always behaves like this for you.

Master - my.cnf

Attachment: Master - my.cnf (application/octet-stream, text), 2.75 KiB.

Slave - my.cnf

Attachment: Slave - my.cnf (application/octet-stream, text), 2.75 KiB.

Yes, we've had the problem under high load. I've attached the master and slave .cnf files as requested.

Thanks!

Hi,
Thanks for the config, but it makes no difference to my reproduction :( ... It looks like heavy load is the only thing that matters here. I can reproduce this only if I increase the load.

The bug is verified, so let's see what the devs say :D
Bogdan

Hi,
Thank you for your help and for verifying it. We look forward to hearing what the development team says :)
Kind Regards

Hi, Gonzalo Miguel Arruti, 

If you are still experiencing this issue, can you please upload the *whole* log files from both the master and the slave?

thanks
Bogdan

Fixed by another patch; a changelog entry has now been added for MySQL 5.7.25 and 8.0.14:

With sync_binlog=1 set, if the binary log was rotated during a commit before the binary log end position was updated, replication stopped on the slave because the server attempted to use the old binary log end position with the new binary log file. The server now compares the binary log file name with the active binary log file when updating the binary log end position, so that the issue does not occur.
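The race described in this changelog entry can be illustrated with a small, single-threaded model. In the sketch below, `BinlogModel`, `rotate` and `update_end_pos` are hypothetical names, not MySQL internals, and locking is left out entirely. A commit publishes the end position of the file it wrote to; with the check from the fix, that position is applied only if the file is still the active binary log. The file names and the offset 1000 are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class BinlogModel:
    """Hypothetical, single-threaded model of the binary log end position.

    Locking and the real commit pipeline are deliberately left out."""
    active_file: str
    end_pos: int = 0
    check_file_name: bool = True  # False models the behavior before the fix

    def rotate(self, new_file: str) -> None:
        # Rotation makes a new file active; nothing has been written to it yet.
        self.active_file = new_file
        self.end_pos = 0

    def update_end_pos(self, file_name: str, pos: int) -> bool:
        # The fix as the changelog describes it: compare the file the commit
        # wrote to with the active file before publishing the position.
        if self.check_file_name and file_name != self.active_file:
            return False  # stale position for a rotated-away file; ignore it
        # Assumption of this sketch: the published position only moves forward.
        if pos > self.end_pos:
            self.end_pos = pos
        return True


# A commit to mysql-bin.000001 ends at offset 1000, but the log is rotated
# before that end position is published.
fixed = BinlogModel("mysql-bin.000001")
fixed.rotate("mysql-bin.000002")
fixed.update_end_pos("mysql-bin.000001", 1000)
print(fixed.end_pos)  # 0

unfixed = BinlogModel("mysql-bin.000001", check_file_name=False)
unfixed.rotate("mysql-bin.000002")
unfixed.update_end_pos("mysql-bin.000001", 1000)
print(unfixed.end_pos)  # 1000
```

In the unfixed model, readers of `mysql-bin.000002` would be told that 1000 bytes are available even though none were written to that file, which matches the stale-position situation the changelog describes.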

Bug #93783 has been marked as a duplicate of this one.