MySQL Bugs: #75261: mysql logging around a slave disconnection sometimes is not clear

Bug #75261	mysql logging around a slave disconnection sometimes is not clear
Submitted:	18 Dec 2014 13:07	Modified:	17 Jan 2015 10:10
Reporter:	Simon Mudd (OCA)	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Replication	Severity:	S3 (Non-critical)
Version:	5.6.15	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	i/o thread, logging, replication

Description:
While doing some HA configuration tests I deliberately disconnect the master from the slave and check the logging. I notice that the logging is far from clear if the slave reconnects to the master immediately.

You really need to look at the logs to figure out the current state and right now you pretty much have to use something else like SHOW SLAVE STATUS\G to be sure of the I/O threads state. The logging does not make this clear in my opinion.

How to repeat:
Here is some logging from earlier:

-- earlier logging
2014-12-18 09:36:13 23253 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2014-12-18 09:36:13 23253 [Note] Slave SQL thread initialized, starting replication in log 'binlog.007423' at position 680880579, relay log '../log/relaylog.000001' position: 4
2014-12-18 09:36:13 23253 [Note] Slave I/O thread: connected to master 'replication_user@master.server.name:3306',replication started in log 'binlog.007423' at position 680880579
2014-12-18 13:40:05 23253 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
2014-12-18 13:40:05 23253 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'binlog.007428' at position 525496273
2014-12-18 13:40:05 23253 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2014-12-18 13:41:32 23253 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
2014-12-18 13:41:32 23253 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'binlog.007428' at position 561905212
2014-12-18 13:41:32 23253 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
-- you may not realise but we are reconnected here.

-- I deliberately down time the master connection for a "long time" 
2014-12-18 13:45:07 23253 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
2014-12-18 13:45:07 23253 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'binlog.007428' at position 640993609
2014-12-18 13:45:07 23253 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2014-12-18 13:45:11 23253 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
2014-12-18 13:45:11 23253 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'binlog.007428' at position 642544945
2014-12-18 13:45:11 23253 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2014-12-18 13:45:11 23253 [ERROR] Slave I/O: error reconnecting to master 'replication_user@master.server.name:3306' - retry-time: 60  retries: 1, Error_code: 2003
2014-12-18 13:46:11 23253 [Note] Slave: connected to master 'replication_user@master.server.name:3306',replication resumed in log 'binlog.007428' at position 642544945
-- we see we are reconnected to the master

-- a momentary disconnection:
2014-12-18 13:52:26 23253 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
2014-12-18 13:52:26 23253 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'binlog.007428' at position 793924105
2014-12-18 13:52:26 23253 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
-- you may not realise but we are reconnected here.

Suggested fix:
I find this logging far from clear. Fixing the logging should help avoid confusion and also make it easy to determine the current I/O thread state.

So I'd suggest this line of logging is _always_ shown if the I/O thread reconnects.

2014-12-18 13:46:11 23253 [Note] Slave: connected to master 'replication_user@master.server.name:3306',replication resumed in log 'binlog.007428' at position 642544945

It makes it clear which master is being talked to and also which replication position is being used.
If using GTID mode please also ensure the logging of the GTID negotiated positions is shown (slightly off-topic for this post but seems missing now)

This error message is completely pointless given that there's no user connected to the server and no 'stop slave' or 'start slave' command is being run:

"2014-12-18 13:45:11 23253 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information."

So in this specific case at least the reconnection is being attempted by the I/O thread "on it's own" and "with whatever credentials it has" , so please remove this warning here.

These suggested changes will:
- clean up logging
- remove useless messages (noise)
- make it clear from the logging alone that the i/o thread is connected or disconnected from it's master.

The io thread reconnect messages are a known noisy behavior

we have other request on this as well:

Bug #72629	Slave security warning printed too often

On the vague is the io thread connected or not, i'm still investigating, everytime i see reconnecting to retry in tests, slave status io running shows : connecting (on 5.6.22)

2014-12-22 22:04:27 19371 [ERROR] Slave I/O: error reconnecting to master 'root@127.0.0.1:3306' - retry-time: 60  retries: 1, Error_code: 2003

 Slave_IO_Running: Connecting

Will investigate some more.

The issue is this:

Normal IO disconnection/Reconnection results in: (omitting the start slave repository spam)

2014-12-22 22:04:27 19371 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'binlog-bin.000030' at position 120
2014-12-22 22:04:27 19371 [ERROR] Slave I/O: error reconnecting to master 'root@127.0.0.1:3306' - retry-time: 60  retries: 1, Error_code: 2003

At this point slave status will show io thread status of "connecting"..., once it does reconnect:

2014-12-22 22:13:27 19371 [ERROR] Slave I/O: error reconnecting to master 'root@127.0.0.1:3306' - retry-time: 60  retries: 10, Error_code: 2003
2014-12-22 22:14:27 19371 [Note] Slave: connected to master 'root@127.0.0.1:3306',replication resumed in log 'binlog-bin.000030' at position 120

Now it's possible to reproduce the lack of the slave io thread connected ack if one kills the binlog thread on the master:

2015-01-16 13:22:34 1862 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'shannons-binlog-bin.000006' at position 120
2015-01-16 13:22:34 1862 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.

^ the above is all that will be logged when the master recreates the binlog dump thread.

To reproduce, setup a slave, kill the master binlog dump thread while tailing the slave error log. Suppose this can happen with brief network events as well.

Either way the above doesn't seem correct, if we're reconnecting it should log when it has reconnected as the original filer states.

Verified as described see test in previous comment to reproduce io reconnection confusion with no ack there was actually a successful reconnect.

While usually I stop a slave connection cleanly on a slave (STOP SLAVE) sometimes I have to kill the connections on the master (KILL <slave_thread_id_on_master>), so it may not just be network "drops" that cause this, but deliberate cleanups by the DBA, or even maintenance on the master, or for example a vip being taken down and brought up again.  Under good circumstances once a slave is working this type of logging is infrequent, but if these situations occur it's really good to see the (change in) state of the I/O thread in the slave's log.

So the logging should IMO show the correct connection being made, when disconnections happen, and periodically if retrying to connect indicate the connection is down. (if the logging clearly shows the i/o thread is not connected that's less important of course but it's good to know there's a continuing connection problem.)