Bug #84358 Incorrect error reported in master server's log about slave's UUID
Submitted: 28 Dec 2016 12:47 Modified: 29 Dec 2016 9:47
Reporter: Desmond Chew Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:5.7.17 OS:Red Hat
Assigned to: CPU Architecture:Any
Tags: UUID, zombie dump

[28 Dec 2016 12:47] Desmond Chew
Description:
When I issue "stop slave; start slave" from the Slave server, I am seeing the following error reporting in my Master server's error log.

2016-12-28T12:36:40.750827Z 32850 [Note] Start binlog_dump to master_thread_id(32850) slave_server(20), pos(hostname01-bin.000118, 1332)
2016-12-28T12:37:40.825527Z 32891 [Note] While initializing dump thread for slave with UUID <xxxxx-xxxx-xxx-xxxx-xxxxxx>, found a zombie dump thread with the same UUID. Master is killing the zombie dump thread(32850).

hostname01=master server
slave_server is having server_id=20

There replication only consists of master-server setup. Both are running 5.7.17.
binlog_format=statement.

auto.cnf is unique between the master-slave servers.

How to repeat:
I am unsure why the error as both servers are having different UUID.
[29 Dec 2016 9:47] MySQL Verification Team
Hello Desmond Chew,

Thank you for the report.
Observed this with 5.7.17 build.

Thanks,
Umesh
[29 Dec 2016 9:49] MySQL Verification Team
Post Bug #72578 fix,  a NOTE is added on master's error log file with the following information when log_warnings is greater than 1.

"While initializing dump thread for slave with UUID <slave_uuid>,
  found a zombie dump thread with the same UUID.
  Master is killing the zombie dump thread."

I agree with Simon's suggestion, better to simplify the note to avoid confusion here.
[29 Dec 2016 9:53] MySQL Verification Team
-- 5.7.17
-- How to repeat
- Setup master/slave replication

rm -rf master
bin/mysqld --initialize-insecure --basedir=/export/umesh/server/binaries/GABuilds/mysql-5.7.17 --datadir=/export/umesh/server/binaries/GABuilds/mysql-5.7.17/master -v
bin/mysqld --no-defaults --basedir=/export/umesh/server/binaries/GABuilds/mysql-5.7.17 --datadir=/export/umesh/server/binaries/GABuilds/mysql-5.7.17/master --core-file --socket=/tmp/mysql.sock --port=3306 --log-error=/export/umesh/server/binaries/GABuilds/mysql-5.7.17/master/log.err --log-bin=master-bin --server_id=1 2>&1 &

rm -rf slave
bin/mysqld --initialize-insecure --basedir=/export/umesh/server/binaries/GABuilds/mysql-5.7.17 --datadir=/export/umesh/server/binaries/GABuilds/mysql-5.7.17/slave -v
bin/mysqld --no-defaults --basedir=/export/umesh/server/binaries/GABuilds/mysql-5.7.17 --datadir=/export/umesh/server/binaries/GABuilds/mysql-5.7.17/slave --core-file --socket=/tmp/mysql_slave.sock --port=3307 --log-error=/export/umesh/server/binaries/GABuilds/mysql-5.7.17/slave/log.err --log-bin=slave-bin --server_id=2 --skip_slave_start 2>&1 &

- master
- create repl user, note down position etc

- slave 
- instruct slave to connect master

root@localhost [(none)]> CHANGE MASTER TO
    ->      MASTER_HOST='localhost',
    ->      MASTER_PORT=3306,
    ->      MASTER_USER='repl',
    ->      MASTER_PASSWORD='slavepass',
    ->      MASTER_LOG_FILE='master-bin.000001',
    ->      MASTER_LOG_POS=1220;
Query OK, 0 rows affected, 2 warnings (0.01 sec)

root@localhost [(none)]> start slave;
Query OK, 0 rows affected (0.00 sec)

root@localhost [(none)]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: localhost
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: master-bin.000001
          Read_Master_Log_Pos: 1220
               Relay_Log_File: hod03-relay-bin.000002
                Relay_Log_Pos: 321
        Relay_Master_Log_File: master-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
.

           Master_TLS_Version:
1 row in set (0.00 sec)

root@localhost [(none)]> stop slave; start slave;
Query OK, 0 rows affected (0.01 sec)

-- note master's error log

[umshastr@hod03]~: tail -f /export/umesh/server/binaries/GABuilds/mysql-5.7.17/master/log.err
2016-12-29T09:38:29.340374Z 0 [Note] IPv6 is available.
2016-12-29T09:38:29.340391Z 0 [Note]   - '::' resolves to '::';
2016-12-29T09:38:29.340404Z 0 [Note] Server socket created on IP: '::'.
2016-12-29T09:38:29.351580Z 0 [Note] Event Scheduler: Loaded 0 events
2016-12-29T09:38:29.351682Z 0 [Note] Executing 'SELECT * FROM INFORMATION_SCHEMA.TABLES;' to get a list of tables using the deprecated partition engine. You may use the startup option '--disable-partition-engine-check' to skip this check.
2016-12-29T09:38:29.351690Z 0 [Note] Beginning of list of non-natively partitioned tables
2016-12-29T09:38:29.363315Z 0 [Note] End of list of non-natively partitioned tables
2016-12-29T09:38:29.363409Z 0 [Note] bin/mysqld: ready for connections.
Version: '5.7.17-log'  socket: '/tmp/mysql.sock'  port: 3306  MySQL Community Server (GPL)
2016-12-29T09:40:13.378235Z 4 [Note] Start binlog_dump to master_thread_id(4) slave_server(2), pos(master-bin.000001, 1220)
2016-12-29T09:40:57.221527Z 5 [Note] While initializing dump thread for slave with UUID <93071309-cdaa-11e6-bbbd-0010e05f3e06>, found a zombie dump thread with the same UUID. Master is killing the zombie dump thread(4).
2016-12-29T09:40:57.221596Z 5 [Note] Start binlog_dump to master_thread_id(5) slave_server(2), pos(master-bin.000001, 1220)
[29 Dec 2016 9:58] MySQL Verification Team
Bug #79771 marked as duplicate of this
[2 Jan 2017 12:12] Sven Sandberg
Posted by developer:
 
This is expected behavior. Asynchronous replication does not communicate from slave to master (other than the initial handshake). In particular, when you issue STOP SLAVE on the slave, the slave just stops listening to the master without notifying the master. The master's dump thread will keep sending events that are not be received by any slave. Internally, such threads are called 'zombie dump threads'. Zombie dump threads are automatically killed in two cases:
 1. When net_write_timeout seconds have elapsed since the slave disconnected, and the zombie dump thread tries to send another event, the zombie thread will kill itself.
 2. When a slave connects using the same server_uuid (which means the same slave connects, unless there is a misconfiguration), then that new connection will kill the existing zombie dump thread.

So the 'Note' that you are seeing in the log informs that the second of these expected cases has happened. So there is no bug in mysqld or in your application and nothing to worry about. The 'Note' was introduced in BUG#18731211 (5.6.21).

However, I think the message could be a bit more helpful:
 1. "Zombie dump thread" is probably just an internal term and should be avoided.
 2. It should be more clear from the text (and not just the severity level) that this is a normal, expected condition.
 3. It should be clear from the text that this probably just means that the same slave reconnects.

Also, maybe we should document this in the manual (maybe under START SLAVE unless there is a better place).
[12 Jan 2017 13:01] Sven Sandberg
Posted by developer:
 
See also comment from Aug 2016 at BUG#72578.
[13 Jun 2017 8:09] MySQL Verification Team
Bug #86676 marked as duplicate of this one