Bug #118852 Replicas with the same uuid may be connected to the same master database
Submitted: 19 Aug 8:53 Modified: 20 Aug 11:28
Reporter: karry zhang (OCA)
Status: Verified
Category: MySQL Server: Replication
Severity: S4 (Feature request)
Version:
OS: Any
Assigned to:
CPU Architecture: Any

[19 Aug 8:53] karry zhang
Description:
In some cases, two replicas with the same UUID may be connected to the same source at the same time.

How to repeat:
Here I simply connect the same replica to the source multiple times to simulate this situation.

Apply the following patch, which adds a debug sync point to kill_zombie_dump_threads() in sql/rpl_source.cc:

--- a/sql/rpl_source.cc
+++ b/sql/rpl_source.cc
@@ -1123,6 +1123,7 @@ class Find_zombie_dump_thread : public Find_THD_Impl {
         is_zombie_thread =
             ((thd->server_id == cur_thd->server_id) && !tmp_uuid.length());
       }
+      
       if (is_zombie_thread) return true;
     }
     return false;
@@ -1156,10 +1157,16 @@ void kill_zombie_dump_threads(THD *thd) {
   String replica_uuid;
   get_replica_uuid(thd, &replica_uuid);
   if (replica_uuid.length() == 0 && thd->server_id == 0) return;
-
+  
   Find_zombie_dump_thread find_zombie_dump_thread(replica_uuid);
   THD_ptr tmp_ptr =
       Global_THD_manager::get_instance()->find_thd(&find_zombie_dump_thread);
+
+  DBUG_EXECUTE_IF("before_kill_zombie_dump_threads", {
+    const char act[] = "now WAIT_FOR continue";
+    assert(opt_debug_sync_timeout > 0);
+    assert(!debug_sync_set_action(current_thd, STRING_WITH_LEN(act)));
+  };);
   if (tmp_ptr) {
     /*
       Here we do not call kill_one_thread() as

Execute the following test case:

--source include/master-slave.inc
--source include/have_binlog_format_row.inc
--source include/have_semisync_plugin.inc
--source include/install_semisync.inc
--source include/have_debug.inc

connection master;
SET GLOBAL debug='+d,before_kill_zombie_dump_threads';

connection slave;
--source include/stop_slave.inc
--source include/start_slave.inc

connection slave;
--source include/stop_slave.inc
--source include/start_slave.inc

connection master1;
SET GLOBAL debug = '-d,before_kill_zombie_dump_threads';
SET DEBUG_SYNC = 'now SIGNAL continue';
SHOW PROCESSLIST;
SHOW REPLICAS;

--source include/rpl_sync.inc
--source include/uninstall_semisync.inc
--source include/rpl_end.inc

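The results are shown below. Besides the old dump thread (Id 24, "Waiting to finalize termination"), there are two newly created dump threads (Ids 25 and 26). SHOW REPLICAS lists only one replica, so both new dump threads clearly come from the same replica. The same thing happens when different replicas share a UUID.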

include/master-slave.inc
Warnings:
Note	####	Sending passwords in plain text without SSL/TLS is extremely insecure.
Note	####	Storing MySQL user name or password information in the connection metadata repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START REPLICA; see the 'START REPLICA Syntax' in the MySQL Manual for more information.
[connection master]
include/install_semisync.inc
SET GLOBAL debug='+d,before_kill_zombie_dump_threads';
include/stop_slave.inc
include/start_slave.inc
include/stop_slave.inc
include/start_slave.inc
SET GLOBAL debug = '-d,before_kill_zombie_dump_threads';
SET DEBUG_SYNC = 'now SIGNAL continue';
SHOW PROCESSLIST;
Id	User	Host	db	Command	Time	State	Info
7	event_scheduler	localhost	NULL	Daemon	2	Waiting on empty queue	NULL
18	root	localhost	test	Sleep	1		NULL
19	root	localhost:25946	test	Sleep	1		NULL
20	root	localhost:25947	test	Sleep	1		NULL
22	root	localhost:25949	test	Sleep	1		NULL
23	root	localhost:25950	test	Query	0	init	SHOW PROCESSLIST
24	root	localhost:25953	NULL	Binlog Dump GTID	1	Waiting to finalize termination	NULL
25	root	localhost:25954	NULL	Binlog Dump GTID	1	starting	NULL
26	root	localhost:25955	NULL	Binlog Dump GTID	0	starting	NULL
Warnings:
Warning	1287	'INFORMATION_SCHEMA.PROCESSLIST' is deprecated and will be removed in a future release. Please use performance_schema.processlist instead
SHOW REPLICAS;
Server_Id	Host	Port	Source_Id	Replica_UUID
2	127.0.0.1	13092	1	a8127cbd-7cd4-11f0-9568-b8599f307988
include/rpl_sync.inc
include/uninstall_semisync.inc
include/rpl_end.inc

Suggested fix:
The cause of this problem is that Global_THD_manager::find_thd() stops at the first THD in the array whose UUID matches and returns only that one.

The correct approach is to find all THDs with the same UUID.
[19 Aug 9:04] MySQL Verification Team
Hello,

> In some cases two replicas with the same uuid may be connected to the same source.

A UUID (Universally Unique Identifier) is a 128-bit value used to uniquely identify information in computer systems.

UNIQUELY being a keyword here?!

Two replicas having the same UUID is an invalid case. Can you give me a relevant reason why that case would be valid?
[19 Aug 11:34] karry zhang
Hello, MySQL Verification Team.

The MySQL server generates a true UUID in addition to the default or user-supplied server ID set in the server_id system variable. This is available as the global, read-only variable server_uuid.

The server_uuid is saved in auto.cnf. In some backup scenarios, the auto.cnf file is also backed up, resulting in two servers having the same UUID.

I admit this isn't a typical scenario, so the bug title isn't quite right.

I believe the key impact of this bug is that if the same replica reconnects frequently, a large number of connections can accumulate on the source. Preventing this is, I believe, the purpose of the zombie kill operation.
[20 Aug 6:11] MySQL Verification Team
Hello,

There is no way I can accept this as a bug. I would argue that whatever that backup policy is, it is the problem (the buggy part), not MySQL for assuming the UUID is unique.

I will convert this to a Feature Request and verify it as such. I doubt the dev team will approve it, but it is the best I can do.
[20 Aug 11:28] karry zhang
Hello,
As I mentioned, the title of this bug is not well chosen. In fact, the key point of this bug is that there may be multiple connections from the same replica at the same time. If MySQL allows this behavior, then why is there a zombie kill operation? If this behavior is not allowed, then it is a bug.