MySQL Bugs: #78107: Make I/O thread report the SQL commands that fail to help debugging.

Bug #78107	Make I/O thread report the SQL commands that fail to help debugging.
Submitted:	17 Aug 2015 14:41	Modified:	4 Nov 2015 12:51
Reporter:	Simon Mudd (OCA)	Email Updates:
Status:	Open	Impact on me:	None
Category:	MySQL Server: Replication	Severity:	S4 (Feature request)
Version:	5.7.8,5.6.27	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	bug#77732

Description:
See: http://bugs.mysql.com/bug.php?id=77732

The referenced bug is reported to be a problem and I think it will be fixed.

However, it's more confusing as the error message that is reported shows the error received from the upstream master but the SQL command it sent is not shown.

Normally the SQL thread may have errors and that currently does log the command that failed, which makes diagnosis easier.  I think that's still missing when the master is configured to use ROW based replication and think that getting the SQL statement (or pseudo SQL statement in the case of RBR) to be shown is quite convenient.

How to repeat:
See the bug report.

Suggested fix:
Report the SQL statement that is sent to the master if you get an error, apart from the error that the master reports.
While normally this sort of thing does not happen one thing that worries me is that the MySQL and other forks of MySQL replication code base is diverging so the chance of an error like this happening again becomes higher. It may not be an issue of different code bases as in the case of bug#77732 and figuring out exactly what's wrong is harder if you don't log the SQL that's sent.

That said, the I/O thread should take care not to log the password of a change master command. Other than that I don't think there's any sensitive information that's exchanged between master and slave.

A similar isssue seen in 5.6.27 shows the following snippet:

(1) 2015-11-04 12:02:10 37036 [Note] Slave SQL thread initialized, starting replication in log 'binlog.012345' at position 81906155, relay log '../log/relaylog.000001' position: 4
(2) 2015-11-04 12:02:10 37036 [Note] Slave I/O thread: connected to master 'some_user@some-host.example.com:3306',replication started in log 'binlog.012345' at position 81906155
(3) 2015-11-04 12:21:08 37036 [ERROR] Error reading packet from server: ...<error message from upstream master removed as not appropriate>

You see in lines (1) and (2) that the I/O and SQL threads start at the indicated position. That's good.
For (3) you see an error message from the "server". You notice: (a) no message that the connection to the server was dropped and (b) no message that the connection to the master was re-attempted AND successful.

So please add a line of logging for (a) and (b) so we can track the state changes in a slave more completely.

Maybe this leads to slightly more logging but given disconnections are usually infrequent I do not see an issue in that, and the more complete logging helps debugging connection problems with a master.

Note: I know from the master logging that the connection WAS dropped and the slave DID connect again.  This is not visible in the logging shown.  The message as it stands make it looks like the error came from the existing master connection and as far as I'm aware that did not happen.

Hm. Perhaps I should file this as a separate request. I'll do that. sorry for putting inappropriate content here, even if it's related.