Bug #87661 Disabling multi-threaded slave replication and binlog causes failure
Submitted: 5 Sep 2017 3:28 Modified: 2 Apr 2019 23:39
Reporter: monty solomon
Status: Verified
Category: MySQL Server: Replication Severity: S2 (Serious)
Version: 5.7.18 OS: CentOS (6.9)
Assigned to: CPU Architecture: Any

[5 Sep 2017 3:28] monty solomon
Description:
I disabled multi-threaded slave replication by changing configuration options and restarting the MySQL server.

After I started the slave, replication failed with an error.

               Last_SQL_Errno: 1062
               Last_SQL_Error: Could not execute Write_rows event on table links; Duplicate entry '---redacted---' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log bin.010441, end_log_pos 217845660

Reverting the changed configuration options and restarting the slave resumed replication without error.

I have seen this failure mode on multiple clusters.
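
For anyone comparing this failure across several clusters, the fields in the error text above can be pulled apart with a small script. This is only a sketch: the pattern is an assumption based on the single message quoted in this report, and the function name `parse_last_sql_error` is made up for illustration. Other MySQL versions or event types may word the message differently.

```python
import re

# Pattern modeled on the one Last_SQL_Error message quoted in this report.
# Assumption: other versions or event types may format the text differently.
ERROR_RE = re.compile(
    r"Could not execute (?P<event>\w+) event on table (?P<table>[^;]+); "
    r".*?Error_code: (?P<errno>\d+); "
    r"handler error (?P<handler>\w+); "
    r"the event's master log (?P<log_file>\S+), end_log_pos (?P<end_log_pos>\d+)"
)


def parse_last_sql_error(message):
    """Return a dict of fields from a Last_SQL_Error string, or None if it does not match."""
    match = ERROR_RE.search(message)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields so they can be compared or sorted.
    fields["errno"] = int(fields["errno"])
    fields["end_log_pos"] = int(fields["end_log_pos"])
    return fields
```

Feeding it the Last_SQL_Error value from each affected slave gives the table, master log file, and position in a form that is easy to compare between clusters.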

How to repeat:
Change the following settings in the my.cnf file to disable multi-threaded slave replication and binary logging, restart the mysql server, and start the slave. 

#log-bin = /opt/mysql/dblogs1/...
#log-bin-index = /opt/mysql/dblogs1/...
slave_parallel_workers = 0

Observe replication failure in SQL thread caused by reported duplicate entry.

Revert the changed configuration options, restart the mysql server, and start the slave.

Observe replication resumes without error.

Change the settings in the my.cnf file to disable multi-threaded slave replication and binary logging, restart the mysql server, and start the slave. 

Observe replication failure in SQL thread caused by reported duplicate entry.
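
To apply the same my.cnf change consistently on several test hosts, the edit can be scripted. The following is a rough sketch, not the exact procedure used above: the helper name `disable_mts_and_binlog` is hypothetical, and it only rewrites options that are already present in the file (it does not add `slave_parallel_workers` if the line is missing). It relies on MySQL option files accepting dashes and underscores interchangeably in option names.

```python
def disable_mts_and_binlog(cnf_text):
    """Return my.cnf text with log-bin options commented out and
    slave_parallel_workers set to 0. Other lines are left untouched."""
    out = []
    for line in cnf_text.splitlines():
        # Normalize the option name: MySQL treats '-' and '_' the same here.
        key = line.strip().split("=", 1)[0].strip().replace("_", "-")
        if key in ("log-bin", "log-bin-index"):
            # Comment the option out, as in the steps above.
            out.append("#" + line)
        elif key == "slave-parallel-workers":
            out.append("slave_parallel_workers = 0")
        else:
            # Already-commented lines, group headers, and other options pass through.
            out.append(line)
    return "\n".join(out) + "\n"
```

The function works on the file contents as a string, so the original my.cnf can be kept as a backup and restored for the "revert" step.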
[14 Sep 2017 14:01] MySQL Verification Team
Hi Monty,

Sorry you are having such a hard time with MTS :(

I have had trouble reproducing this bug, even after spending a considerable amount of time on it.

- I set up MTS
- I put load on the master (with sysbench, for example)
- I shut down the slave, turn off MTS, and start the slave

and everything continues normally, no failures?!

Do you always hit the issue on the same table when you reproduce this? How easily can you reproduce the problem?

best regards
Bogdan
[17 Sep 2017 6:35] monty solomon
Bogdan,

The steps you wrote appear to be different from the ones I listed. Instead of "shutdown slave, turn off MTS, start slave", edit the my.cnf file to disable multi-threaded slave replication and binary logging, restart the server, and start the slave.

I was able to reproduce the failure on several different clusters by following the steps I provided.
[4 Dec 2017 16:05] MySQL Verification Team
Hi,

I was shutting down both servers; no change, I still can't reproduce it.
What kind of load are you putting on the master?

all best
Bogdan
[21 Dec 2017 3:23] monty solomon
The average QPS on one cluster is 2.4 K.
[21 Dec 2017 4:54] MySQL Verification Team
Hi,

I managed to reproduce this on one setup, but I'm not sure why, as I did everything exactly the same as the countless previous times when I was not able to reproduce it. I need to analyze this a bit more before I decide how to proceed.

thanks
Bogdan
[15 Jan 2018 9:45] MySQL Verification Team
I managed to reproduce this again (doing nothing different than the last few times), so I'm verifying this bug. I don't think this will be easy to fix, as I don't have a 1:1 reproducible test case, so please stay tuned in case the replication devs have additional questions.

all best
Bogdan
[2 Apr 2019 23:39] monty solomon
Bogdan,

Hi. I'm checking in to see if there has been any update or news of any kind. We are experiencing slave delays on some clusters where multi-threaded slave replication might help.

Thanks.