MySQL Bugs: #87661: Disabling multi-threaded slave replication and binlog causes failure

Bug #87661	Disabling multi-threaded slave replication and binlog causes failure
Submitted:	5 Sep 2017 3:28	Modified:	2 Apr 2019 23:39
Reporter:	monty solomon
Status:	Verified
Category:	MySQL Server: Replication	Severity:	S2 (Serious)
Version:	5.7.18	OS:	CentOS (6.9)
Assigned to:		CPU Architecture:	Any

Description:
I disabled multi-threaded slave replication by changing configuration options and restarting the mysql server.

After I started the slave, replication failed with the following error:

               Last_SQL_Errno: 1062
               Last_SQL_Error: Could not execute Write_rows event on table links; Duplicate entry '---redacted---' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log bin.010441, end_log_pos 217845660

Reverting the changed configuration options and restarting the slave resumed replication without error.

I have seen this failure mode on multiple clusters.
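
For anyone comparing occurrences across clusters, the Last_SQL_Error text above can be split into its parts. This is only a minimal sketch: it assumes the message wording matches the one quoted in this report, and the names `ERROR_RE` and `parse_sql_error` are hypothetical helpers, not part of MySQL.

```python
import re

# Pattern for the row-event error text quoted in this report.
# Assumption: other occurrences use the same wording and field order.
ERROR_RE = re.compile(
    r"Could not execute (?P<event>\w+) event on table (?P<table>[^;]+); "
    r".*?Error_code: (?P<errno>\d+); "
    r"handler error (?P<handler>\w+); "
    r"the event's master log (?P<log_file>\S+), end_log_pos (?P<end_log_pos>\d+)"
)


def parse_sql_error(message):
    """Return the fields of a Last_SQL_Error message, or None if it does not match."""
    match = ERROR_RE.search(message)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields so they can be compared and sorted.
    fields["errno"] = int(fields["errno"])
    fields["end_log_pos"] = int(fields["end_log_pos"])
    return fields


# The message exactly as reported above (key value redacted by the reporter).
message = (
    "Could not execute Write_rows event on table links; "
    "Duplicate entry '---redacted---' for key 'PRIMARY', Error_code: 1062; "
    "handler error HA_ERR_FOUND_DUPP_KEY; "
    "the event's master log bin.010441, end_log_pos 217845660"
)
info = parse_sql_error(message)
print(info["table"], info["errno"], info["log_file"], info["end_log_pos"])
```

The master log file name and end position identify the failing event, which may help when checking whether the same table or the same kind of event is involved each time.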

How to repeat:
Change the following settings in the my.cnf file to disable multi-threaded slave replication and binary logging, restart the mysql server, and start the slave. 

#log-bin = /opt/mysql/dblogs1/...
#log-bin-index = /opt/mysql/dblogs1/...
slave_parallel_workers = 0

Observe replication failure in SQL thread caused by reported duplicate entry.

Revert the changed configuration options, restart the mysql server, and start the slave.

Observe replication resumes without error.

Change the settings in the my.cnf file to disable multi-threaded slave replication and binary logging, restart the mysql server, and start the slave. 

Observe replication failure in SQL thread caused by reported duplicate entry.
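
To make the my.cnf change in the steps above repeatable, the edit can be sketched as a small text transformation. This is an illustration only: the function name `disable_mts_and_binlog` is hypothetical, the starting worker count of 4 is an assumption (the report does not state the original value), and the elided paths are copied as they appear in the report.

```python
import re

# Option names are taken from the "How to repeat" steps.
# Assumption: the file uses the hyphenated spelling shown in the report.
BINLOG_OPTIONS = ("log-bin", "log-bin-index")
WORKERS_RE = re.compile(r"^\s*slave_parallel_workers\s*=.*$")


def disable_mts_and_binlog(cnf_text):
    """Return my.cnf text with binary logging commented out and MTS disabled."""
    out = []
    for line in cnf_text.splitlines():
        stripped = line.strip()
        name = stripped.split("=", 1)[0].strip()
        if name in BINLOG_OPTIONS:
            # Comment the option out, as in the report.
            out.append("#" + stripped)
        elif WORKERS_RE.match(line):
            # Zero workers disables multi-threaded replication.
            out.append("slave_parallel_workers = 0")
        else:
            out.append(line)
    return "\n".join(out)


# Hypothetical starting configuration (worker count is an assumption).
before = """[mysqld]
log-bin = /opt/mysql/dblogs1/...
log-bin-index = /opt/mysql/dblogs1/...
slave_parallel_workers = 4"""

after = disable_mts_and_binlog(before)
print(after)
```

The sketch only rewrites the text; the server still has to be restarted and the slave started afterwards, as described in the steps.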

Hi Monty,

Sorry you are having such a hard time with MTS :(

I am having trouble reproducing this bug, even after spending a considerable amount of time on it.

- I set up MTS
- I put load on the master (with sysbench, for example)
- I shutdown slave, turn off MTS, start slave

and everything continues normally, no failures?!

Do you always hit the issue on the same table when you reproduce this? How easily can you reproduce the problem?

best regards
Bogdan

Bogdan,

The steps you wrote appear to be different from the ones I listed. Instead of "shutdown slave, turn off MTS, start slave", edit the my.cnf file to disable multi-threaded slave replication and binary logging, restart the server, and start the slave.

I was able to reproduce the failure on several different clusters by following the steps I provided.

Hi,

I was shutting down both servers; no change, I still can't reproduce it.
What kind of load are you putting on the master?

all best
Bogdan

The average QPS on one cluster is 2.4 K.

Hi,

I managed to reproduce this on one setup, but I'm not sure why, as I did everything exactly the same as the countless previous times when I was not able to reproduce it. I need to analyze this a bit more before I decide how to proceed.

thanks
Bogdan

I managed to reproduce this again (doing nothing different than the last few times), so I'm verifying this bug. I don't think this will be easy to fix, as I don't have a 1:1 reproducible test case, so please stay tuned in case the replication devs have additional questions.

all best
Bogdan

Bogdan,

Hi. I'm checking in to see if there has been any update or news of any kind. We are experiencing slave delays on some clusters where multi-threaded slave replication might help.

Thanks.