Bug #72578 | Duplicate slave server_uuid in replication forum gives no specific error message | ||
---|---|---|---|
Submitted: | 8 May 2014 14:57 | Modified: | 21 Aug 2014 7:57 |
Reporter: | Hartmut Holzgraefe | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S3 (Non-critical) |
Version: | 5.6.17, 5.6.19 | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[8 May 2014 14:57]
Hartmut Holzgraefe
[8 May 2014 15:09]
Hartmut Holzgraefe
It is actually not the connecting slave that gets rejected, but the existing connection is closed in favor of the new one ... which causes a steady flow of reconnects from the competing slaves with same server_uuid ... so my "only suggest duplicate IDs if connection fails right away" suggestion doesn't make sense ... Separate but report about the wrong connection being closed will follow (this was also already mentioned in one of the bugs that were marked duplicate of 16927) ...
[9 May 2014 9:35]
MySQL Verification Team
Hello Hartmut, Thank you for the bug report. Verified as described. Thanks, Umesh
[9 May 2014 9:43]
MySQL Verification Team
// How to repeat 1. Setup simple replication (Master-Slave1) 2. Clone Slave2 from Slave1(keeping auto.cnf as is) 3. Bring up Slave2 and instruct Slave2 to connect Master On master> HOW SLAVE HOSTS shows just one slave as connected(whichever connected first in Slave1/Slave2) Suppose Slave1 already connected, check Slave2 error log which is flooded with "[Note] Slave: received end packet from server, apparent master shutdown:" Also, check master's error log - it is flooded with 2014-05-11 05:39:35 9407 [Note] Start binlog_dump to master_thread_id(56) slave_server(2), pos(master-bin.000003, 813) 2014-05-11 05:39:35 9407 [Note] Start binlog_dump to master_thread_id(57) slave_server(3), pos(master-bin.000003, 813) 2014-05-11 05:39:35 9407 [Note] Start binlog_dump to master_thread_id(58) slave_server(2), pos(master-bin.000003, 813) 2014-05-11 05:39:35 9407 [Note] Start binlog_dump to master_thread_id(59) slave_server(3), pos(master-bin.000003, 813) 2014-05-11 05:39:35 9407 [Note] Start binlog_dump to master_thread_id(60) slave_server(2), pos(master-bin.000003, 813) 2014-05-11 05:39:35 9407 [Note] Start binlog_dump to master_thread_id(61) slave_server(3), pos(master-bin.000003, 813) ## Work around I see is if we delete auto.cnf before bringing up Slave2
[21 Aug 2014 7:57]
David Moss
The following was added to the 5.6.21 and 5.7.5 change logs with commit 3981: The error messages generated when a duplicate server UUID causes issues during replication have been improved. The slave error now identifies the duplicate server UUID and the master error identifies the zombie thread that has been killed.
[21 Aug 2014 7:58]
David Moss
Posted by developer: The following was added to the 5.6.21 and 5.7.5 change logs with commit 3981: The error messages generated when a duplicate server UUID causes issues during replication have been improved. The slave error now identifies the duplicate server UUID and the master error identifies the zombie thread that has been killed.
[25 Sep 2014 11:51]
Laurynas Biveinis
revno: 6082 committer: Venkatesh Duggirala<venkatesh.duggirala@oracle.com> branch nick: mysql-5.6 timestamp: Thu 2014-07-17 16:56:55 +0530 message: Bug#18731211 Duplicate slave server_uuid in replication forum gives no specific error message Problem & Analysis: =================== When a new slave joins to a replication forum with uuid which is exactly same as one of the existing slave's uuid, then master is confused between old slave and the newly joined slave. When com_binlog_send request comes from newly joined slave, it thinks that old slave's dump thread is zombie and kills it. After some time, old slave tries to reconnect to the master and sends com_binlog_send request to Master. Then Master thinks that newly joined slave's dump thread is zombie and kills it. This ping-pong reconnections between Master->old_slave and Master->new_slave happens continously. This is known expected behaviour but the error message in old_slave and new_slave's error log file is not clear enough to identify this situation. Information in error log file was: "received end packet from server, apparent master shutdown" which does not say anything about server_uuid misconfiguration in the replication forum. Fix: Changing the error message which clearly mentions all the possible causes which resulted this disconnection. New information in error log file is: "Slave: received end packet from server due to dump thread being killed on master. Dump threads are killed for example during master shutdown, explicitly by a user, or when the master receives a binlog send request from a duplicate server UUID <server_uuid>. " And also a NOTE is added on master's error log file with the following information when log_warnings is greater than 1. "While initializing dump thread for slave with UUID <slave_uuid>, found a zombie dump thread with the same UUID. Master is killing the zombie dump thread."
[30 Sep 2014 14:07]
David Moss
Posted by developer: Reopened by merge, no new changelog entries needed. Set back to Closed.
[13 Aug 2015 9:08]
Simon Mudd
Related to bug#33815. Note: calling the "other thread" a zombie is using a terminology which may be confusing or simply wrong, especially if the action is triggered by 2 slaves with the same uuid and they are both "active". "zombie" implies "dead or not working", when in this case it may clearly not be true. So perhaps it's best to simply say something like: "While initializing dump thread for slave with UUID <slave_uuid>, found an existing dump thread with the same UUID. Master is killing the existing dump thread." Maybe it's worth adding if the time between "reconnections" under these circumstances is very short: "Seeing this message frequently for the same UUID is probably a sign of more than one slave using the same uuid, a configuration mistake", but if you do this rate limit the frequency of this message?
[29 Dec 2016 9:54]
MySQL Verification Team
Related, Bug #84358