Bug #30814 Failures in update_slave_list from handle_slave_io halt replication IO thread
Submitted: 4 Sep 2007 23:07 Modified: 13 Jan 2008 13:21
Reporter: Mark Callaghan
Status: Duplicate
Category: MySQL Server: Replication Severity: S2 (Serious)
Version: 5.0.37 OS: Any
Assigned to: CPU Architecture: Any
Tags: io, replication, stop, thread

[4 Sep 2007 23:07] Mark Callaghan
handle_slave_io calls update_slave_list after creating a connection. If this call fails, handle_slave_io branches to 'err:' and the slave IO thread exits. This halts replication because of an error in what appears to be a pointless call. There is no benefit to a slave knowing about the other slaves, and repl_failsafe.cc is full of warnings that the code there does not work (see http://bugs.mysql.com/bug.php?id=11923). This also breaks the semantics of master-retry-count, because the slave IO thread exits on this error regardless of the count.

How to repeat:
Run replication over a flaky TCP connection.

Suggested fix:
Don't call update_slave_list.

But if you insist on calling it, handle errors appropriately so as not to break the semantics of the 'master-retry-count' configuration parameter.
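
To illustrate the behavior being asked for, here is a minimal sketch, not the actual server code. It assumes a simplified model in which master-retry-count is the maximum number of connection attempts. Every name in it (run_io_thread, IoThreadStats, try_connect, try_register) is hypothetical; try_register only stands in for the update_slave_list call. A failed connection counts against the retry limit, while a failed registration is logged and otherwise ignored.

```cpp
#include <functional>
#include <iostream>

// Hypothetical summary of one IO thread start-up; not a MySQL structure.
struct IoThreadStats {
    bool running = false;            // true once connect succeeded and the thread carries on
    unsigned connect_attempts = 0;   // attempts counted against master_retry_count
    unsigned register_failures = 0;  // registration errors that were logged and ignored
};

// try_connect stands in for connecting to the master, try_register for the
// update_slave_list step. Both are assumptions made for illustration only.
IoThreadStats run_io_thread(unsigned master_retry_count,
                            const std::function<bool()>& try_connect,
                            const std::function<bool()>& try_register) {
    IoThreadStats stats;
    while (stats.connect_attempts < master_retry_count) {
        ++stats.connect_attempts;
        if (!try_connect()) {
            // Connection failure: retry until the limit is reached.
            // Real code would sleep between attempts.
            continue;
        }
        if (!try_register()) {
            // Registration failure is not fatal: report it and go on.
            ++stats.register_failures;
            std::cerr << "warning: registering with the master failed, continuing\n";
        }
        stats.running = true;
        return stats;
    }
    // Retry limit exhausted: only now does the IO thread give up.
    return stats;
}
```

With this structure, the only way for the IO thread to stop during start-up is to use up all of its connection attempts. A registration error alone never stops it.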
[4 Sep 2007 23:13] Mark Callaghan
Wow, and the call to update_slave_list is preceded by this comment. I guess the comment is incorrect.
      Register ourselves with the master.
      If fails, this is not fatal - we just print the error message and go
      on with life.
[6 Jan 2008 5:01] Rayed Alrashed
I am having the same problem: a flaky connection caused the slave thread to stop. Could you please fix this?
[6 Jan 2008 23:00] Mark Callaghan
I think bug 21132 is related to this bug.
[6 Jan 2008 23:05] Mark Callaghan
And I think this bug is the same as http://bugs.mysql.com/bug.php?id=19175
[13 Jan 2008 13:21] Valeriy Kravchuk
Duplicate of Bug #19175.