MySQL Bugs: #30814: Failures in update_slave_list from handle_slave

Bug #30814	Failures in update_slave_list from handle_slave_io halt replication IO thread
Submitted:	4 Sep 2007 23:07	Modified:	13 Jan 2008 13:21
Reporter:	Mark Callaghan
Status:	Duplicate
Category:	MySQL Server: Replication	Severity:	S2 (Serious)
Version:	5.0.37	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	io, replication, stop, thread

Description:
handle_slave_io calls update_slave_list after creating a connection. If this call fails, handle_slave_io branches to 'err:' and the slave IO thread exits. This halts replication because of an error in what appears to be a pointless call: there is no benefit to a slave knowing about the other slaves, and repl_failsafe.cc is full of warnings that the code there does not work (see http://bugs.mysql.com/bug.php?id=11923). It also breaks the semantics of master-retry-count, because the slave IO thread exits on this error regardless of the count.

How to repeat:
Run replication over a flaky TCP connection.

Suggested fix:
Don't call update_slave_list.

But if you insist on calling it, handle errors appropriately so as not to break the semantics of the 'master-retry-count' configuration parameter.
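
To make the two behaviours concrete, here is a minimal sketch of the control flow in C++. It is not the MySQL source: run_io_thread, IoThreadResult, and the connect_ok / register_ok callbacks are hypothetical names invented for illustration, and register_ok merely stands in for the update_slave_list step. With registration_is_fatal set to true the model exits on a registration failure without consulting the retry count (the reported behaviour); with false it logs the failure and carries on, which is one possible reading of the suggested fix.

```cpp
#include <cstdio>
#include <functional>

// Hypothetical result type for one run of the modelled IO thread (not a MySQL type).
struct IoThreadResult {
  bool replicating;      // true if the thread got as far as reading events
  int connect_attempts;  // number of connection attempts made
};

// Simplified model of the connect phase of a slave IO thread.
//   connect_ok:            hypothetical callback, true when the connect succeeds
//   register_ok:           hypothetical stand-in for the update_slave_list step
//   master_retry_count:    upper bound on connection attempts
//   registration_is_fatal: true  = reported behaviour (exit on failure),
//                          false = log the failure and keep going
IoThreadResult run_io_thread(const std::function<bool()>& connect_ok,
                             const std::function<bool()>& register_ok,
                             int master_retry_count,
                             bool registration_is_fatal) {
  IoThreadResult result{false, 0};
  while (result.connect_attempts < master_retry_count) {
    ++result.connect_attempts;
    if (!connect_ok())
      continue;  // connect failed: retry, bounded by master_retry_count
    if (!register_ok()) {
      if (registration_is_fatal)
        return result;  // thread exits even though retries remain
      std::fprintf(stderr, "registration failed, continuing\n");
    }
    result.replicating = true;  // proceed to request and read events
    return result;
  }
  return result;  // retry budget exhausted
}
```

In this model, a call with a connect callback that always succeeds, a registration callback that always fails, and master_retry_count set to 5 stops after one attempt without replicating when registration_is_fatal is true, and starts replicating when it is false.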

Wow, and the call to update_slave_list is preceded by this comment. I guess the comment is incorrect.
    /*
      Register ourselves with the master.
      If fails, this is not fatal - we just print the error message and go
      on with life.
    */

I'm having the same problem: a flaky connection caused the slave thread to stop. Could you please fix this?

I think bug 21132 is related to this bug.
http://bugs.mysql.com/bug.php?id=21132

And I think this bug is the same as http://bugs.mysql.com/bug.php?id=19175

Duplicate of Bug #19175.