Bug #72581	Slaves with same server_id / server_uuid compete for master connection
Submitted:	8 May 2014 15:13	Modified:	11 May 2015 10:22
Reporter:	Hartmut Holzgraefe
Status:	Closed
Category:	MySQL Server: Replication	Severity:	S3 (Non-critical)
Version:	5.6.17, 5.6.19	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
Follow-up to bug #72578:

If a new slave connection comes in with the same server_uuid (or server_id, prior to the introduction of server_uuid) as an already connected slave, the already connected slave receives an end packet and the new connection is established.

This makes the disconnected slave try to reconnect, which then disconnects the other slave again due to the duplicate id, and this game goes on at high frequency, spamming master and slave error logs.
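
The loop described above can be illustrated with a toy model. This is only a sketch of the reported behavior, not MySQL code: the `Master` class, `register_dump_thread`, the slave names, and the UUID string are all hypothetical.

```python
# Toy model only: Master, register_dump_thread and the slave names are
# hypothetical and do not correspond to real MySQL internals.

class Master:
    def __init__(self):
        self.dump_threads = {}   # server_uuid -> name of the connected slave
        self.kills = 0           # how many established connections were closed

    def register_dump_thread(self, server_uuid, slave_name):
        """Model of the reported behavior: the newcomer always wins."""
        previous = self.dump_threads.get(server_uuid)
        if previous is not None and previous != slave_name:
            self.kills += 1      # the existing slave receives an end packet
        self.dump_threads[server_uuid] = slave_name
        return previous          # the slave that was kicked out, if any


def simulate(rounds):
    """Let two slaves sharing one UUID take turns reconnecting."""
    master = Master()
    uuid = "same-uuid"           # placeholder value shared by both slaves
    master.register_dump_thread(uuid, "slave_a")
    reconnecting = "slave_b"
    for _ in range(rounds):
        kicked = master.register_dump_thread(uuid, reconnecting)
        reconnecting = kicked    # the kicked slave is the next to reconnect
    return master.kills
```

In this model every reconnect attempt closes exactly one established connection, so the number of disconnects grows with the number of attempts and never settles.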

How to repeat:
Try to set up replication with two slaves that have the same server_uuid.

Suggested fix:
Reject the new connection with an appropriate error message instead of closing the already established one ...

Hello Hartmut,

Thank you for the bug report.
Verified as described.

Thanks,
Umesh

// How to repeat part

See Bug #72578.

OK, this seems to be what causes the competing behavior, in sql/rpl_master.cc:

/*

  Kill all Binlog_dump threads which previously talked to the same slave
  ("same" means with the same server id). Indeed, if the slave stops, if the
  Binlog_dump thread is waiting (mysql_cond_wait) for binlog update, then it
  will keep existing until a query is written to the binlog. If the master is
  idle, then this could last long, and if the slave reconnects, we could have 2
  Binlog_dump threads in SHOW PROCESSLIST, until a query is written to the
  binlog. To avoid this, when the slave reconnects and sends COM_BINLOG_DUMP,
  the master kills any existing thread with the slave's server id (if this id is
  not zero; it will be true for real slaves, but false for mysqlbinlog when it
  sends COM_BINLOG_DUMP to get a remote binlog dump).

  SYNOPSIS
    kill_zombie_dump_threads()
    slave_uuid      the slave's UUID

*/

So this function would need to be smarter about detecting whether the existing slave connection is still alive, perhaps by adding a simple ping mechanism to check whether the other end still responds.
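
A minimal sketch of that idea, under stated assumptions: the master records when it last heard from each slave and only replaces an existing dump thread when it looks dead. The names `DumpThread`, `admit`, `last_seen`, and the 30-second timeout are hypothetical and are not taken from the MySQL source.

```python
# Sketch of the liveness-check idea only; none of these names exist in MySQL.

class DumpThread:
    def __init__(self, server_uuid, last_seen):
        self.server_uuid = server_uuid
        self.last_seen = last_seen   # time of the last sign of life from the slave


def admit(threads, server_uuid, now, timeout=30.0):
    """Return True if the new connection is accepted, False if it is rejected."""
    existing = threads.get(server_uuid)
    if existing is not None and now - existing.last_seen <= timeout:
        # The established slave responded recently: keep it, reject the newcomer.
        return False
    # No existing thread, or it looks like a zombie: the newcomer replaces it.
    threads[server_uuid] = DumpThread(server_uuid, now)
    return True
```

A real implementation would also have to refresh `last_seen` whenever the slave responds to a ping; that part is left out here.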

Thanks for your feedback. This has been fixed in upcoming versions and the following was noted in the 5.6.26 and 5.7.8 changelogs:

When two slaves with the same server_uuid were configured to replicate from a single master, the I/O thread of the slaves kept reconnecting and generating new relay log files without new content. In such a situation, the master now generates an error which is sent to the slave. By receiving this error from the master, the slave I/O thread does not try to reconnect, avoiding this problem.
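
The slave-side effect described in the changelog entry can be sketched as a retry loop that treats one class of error as fatal. This is an illustration only: `DuplicateUuidError`, `TransientNetworkError`, `io_thread`, and `max_retries` are hypothetical names, and the real I/O thread logic is considerably more involved.

```python
# Illustration only; these exception and function names are hypothetical.

class DuplicateUuidError(Exception):
    """Stand-in for the error the master sends for a duplicate server_uuid."""


class TransientNetworkError(Exception):
    """Stand-in for an ordinary connection failure that is worth retrying."""


def io_thread(connect, max_retries=3):
    """Call connect() until it succeeds, retries run out, or a fatal error occurs."""
    attempts = 0
    while attempts <= max_retries:
        attempts += 1
        try:
            connect()
            return ("connected", attempts)
        except DuplicateUuidError:
            return ("stopped", attempts)   # fatal: do not reconnect
        except TransientNetworkError:
            continue                       # ordinary failure: try again
    return ("gave_up", attempts)
```

Because the duplicate-UUID error ends the loop on the first attempt, the slave no longer takes part in the reconnect cycle.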