Bug #84752 Multi-Slave Replication Fail: bogus data in log event
Submitted: 31 Jan 2017 18:38 Modified: 1 Oct 2018 10:55
Reporter: Gonzalo Miguel Arruti
Status: Closed
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:5.7.12 OS:Linux
Assigned to: CPU Architecture:Any

[31 Jan 2017 18:38] Gonzalo Miguel Arruti
Description:
Hello, we have one master and two slaves. As soon as the slaves start to replicate, we get this error on one slave only (the other keeps running):

Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'bogus data in log event; the first event 'mysql-bin.000002' at 211200498, the last event read from '/var/lib/mysql/binlog/mysql-bin.000002' at 211359690, the last byte read from '/var/lib/mysql/binlog/mysql-bin.000002' at 211359709.' 
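The three byte offsets in this message can be compared directly. The following is a small sketch that extracts them with regular expressions. The 19-byte constant is the size of the binlog v4 common event header. Reading the result as "the dump thread read one event header and then rejected it" is an interpretation, not something the error text states. The helper name `positions` is hypothetical.

```python
import re

# Error text copied from Last_IO_Error above.
MSG = ("Got fatal error 1236 from master when reading data from binary log: "
       "'bogus data in log event; the first event 'mysql-bin.000002' at 211200498, "
       "the last event read from '/var/lib/mysql/binlog/mysql-bin.000002' at 211359690, "
       "the last byte read from '/var/lib/mysql/binlog/mysql-bin.000002' at 211359709.'")

# Size of the binlog v4 common event header (timestamp, type, server_id,
# event_size, log_pos, flags = 4 + 1 + 4 + 4 + 4 + 2 bytes).
COMMON_HEADER_LEN = 19

def positions(msg: str) -> dict:
    """Hypothetical helper: pull the three offsets out of an error-1236 message."""
    first = int(re.search(r"the first event '[^']+' at (\d+)", msg).group(1))
    last_event = int(re.search(r"the last event read from '[^']+' at (\d+)", msg).group(1))
    last_byte = int(re.search(r"the last byte read from '[^']+' at (\d+)", msg).group(1))
    return {"first_event": first, "last_event": last_event, "last_byte": last_byte}

pos = positions(MSG)
span = pos["last_event"] - pos["first_event"]   # bytes between the two event offsets
overrun = pos["last_byte"] - pos["last_event"]  # bytes consumed of the failing event

print(f"distance from first event to failing event: {span} bytes")
print(f"bytes read of the failing event: {overrun}")
if overrun == COMMON_HEADER_LEN:
    print("failure occurred right after reading one common event header")
```

With this message the failing read stops exactly one common header past the last event offset. That is consistent with the server seeing an event header whose contents did not make sense at that position.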

If we restart the failed slave, it keeps working until both slaves are up to date again, and then one of them gets the error. So we think the issue is somehow related to the slaves' I/O threads reading the binlog concurrently. If the slaves are not reading from the same position in the binlog (no concurrent reads), the error does not happen.

We can prevent this error in three ways: 
- having only one slave (not an option, we need two of them)
- setting slave_compressed_protocol=1 (not an option, too much CPU usage)
- changing sync_binlog from 1 to 0 (not an option, we cannot risk losing transactions)

mysql-community-server-5.7.12-1 
Red Hat Enterprise Linux Server release 6.5 (Santiago)
- GTID is enabled (this is something that we need too) 

How to repeat:
We tested in different environments (MySQL 5.7.12 and 5.7.17) and always got the same result.

Any ideas to fix this issue? 
Regards.
[31 Jan 2017 18:39] Gonzalo Miguel Arruti
-
[7 Feb 2017 22:22] Bogdan Kecman
Hi,

I can't "easily" reproduce this problem. I had to try multiple times under some load to trigger it, but the problem is clearly there. I tried your workarounds and they work.

Setting the bug to verified with "not acceptable workaround". I hope it can be fixed soon.

thanks for your submission
Bogdan
[7 Feb 2017 22:23] Bogdan Kecman
If you can share your my.cnf from both the master and the slaves, it might help me build a "100% reproducible test case", since you say it always behaves like this for you.
[8 Feb 2017 17:23] Gonzalo Miguel Arruti
Master - my.cnf

Attachment: Master - my.cnf (application/octet-stream, text), 2.75 KiB.

[8 Feb 2017 17:24] Gonzalo Miguel Arruti
Slave - my.cnf

Attachment: Slave - my.cnf (application/octet-stream, text), 2.75 KiB.

[8 Feb 2017 17:32] Gonzalo Miguel Arruti
Yes, we've had the problem under high load. I've attached the master and slave .cnf files as requested.

Thanks!
[9 Feb 2017 10:18] Bogdan Kecman
Hi,
Thanks for the config, but it makes no difference to my reproduction :( ... it looks like heavy load is the only thing that matters here. I can reproduce this only if I increase the load.

The bug is verified, so let's see what the devs say :D
Bogdan
[9 Feb 2017 14:49] Gonzalo Miguel Arruti
Hi,
Thank you for your help and for verifying the bug. We look forward to hearing what the development team says :)
Kind Regards
[12 Sep 2018 17:28] Bogdan Kecman
Hi, Gonzalo Miguel Arruti, 

If you are still experiencing this issue, can you please upload the *whole* log files from both master and slave?

thanks
Bogdan
[1 Oct 2018 10:54] Margaret Fisher
Fixed by another patch; a changelog entry has now been added for MySQL 5.7.25 and 8.0.14:

With sync_binlog=1 set, if the binary log was rotated during a commit before the binary log end position was updated, replication stopped on the slave because the server attempted to use the old binary log end position with the new binary log file. The server now compares the binary log file name with the active binary log file when updating the binary log end position, so that the issue does not occur.
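The race described in this changelog entry can be illustrated with a toy model. This is a minimal sketch and not MySQL source code. The class `ToyBinlog`, its methods, the flag `check_file_name`, and the byte counts are all hypothetical. New files are simplified to start empty.

In this model, a committing thread records `(file, offset)` after writing, and a rotation sneaks in before it publishes that offset. Only a reader at the tail of the active file consults the published end position. That would fit the observation that the error appears once the slaves have caught up, but whether MySQL's dump threads behave exactly this way is an assumption.

```python
from dataclasses import dataclass, field

@dataclass
class ToyBinlog:
    """Hypothetical model of binlog end-position tracking (not MySQL code)."""
    active_file: str = "mysql-bin.000002"
    sizes: dict = field(default_factory=dict)  # bytes actually written per file
    end_pos: int = 0                           # end position published to tail readers
    check_file_name: bool = True               # False mimics the pre-fix behaviour

    def __post_init__(self) -> None:
        self.sizes.setdefault(self.active_file, 0)

    def write(self, nbytes: int):
        """Append a committed group; return (file, offset) for a later end-pos update."""
        self.sizes[self.active_file] += nbytes
        return self.active_file, self.sizes[self.active_file]

    def rotate(self, new_file: str) -> None:
        """Switch to a new file (simplified: real binlog files start with a header)."""
        self.active_file = new_file
        self.sizes[new_file] = 0
        self.end_pos = 0

    def update_end_pos(self, file: str, pos: int) -> None:
        # The fix as described: ignore an update for a file that is no longer active.
        if self.check_file_name and file != self.active_file:
            return
        self.end_pos = pos

    def tail_reader_ok(self) -> bool:
        """A tail reader trusts end_pos; reading beyond the real data means 'bogus data'."""
        return self.end_pos <= self.sizes[self.active_file]

def run(check_file_name: bool) -> bool:
    log = ToyBinlog(check_file_name=check_file_name)
    file, pos = log.write(4096)       # commit flushed to the old file (arbitrary size)
    log.rotate("mysql-bin.000003")    # rotation happens before the end position update
    log.update_end_pos(file, pos)     # late update from the committing thread
    return log.tail_reader_ok()

print(run(check_file_name=False))  # stale offset applied to the new file
print(run(check_file_name=True))   # stale update ignored
```

Without the file-name comparison, the old file's offset is published against the new, empty file, so a tail reader would try to read data that does not exist there. With the comparison, the late update is discarded.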
[3 Jan 10:31] Umesh Shastry
Bug #93783 marked as duplicate of this one