MySQL Bugs: #89876: mysqladmin flush-hosts is not safe in GTID mode

Bug #89876	mysqladmin flush-hosts is not safe in GTID mode
Submitted:	1 Mar 2018 16:44	Modified:	6 Mar 2018 13:58
Reporter:	Simon Mudd (OCA)	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Replication	Severity:	S3 (Non-critical)
Version:	5.7+	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
An error such as this one looks easy to fix

2018-03-01 13:05:55 ERROR Error 1129: Host 'a.b.c.d' is blocked because of many connection errors; unblock with 'mysqladmin flush-hosts'

However, if you run the suggested command (or the SQL statement 'FLUSH HOSTS') on a slave you will break replication consistency as this generates a binlog event on the server concerned and master and slave will have different gtid_executed values.

Relocating servers later will then be problematic.

So maybe being able to not write the binlog events (flush hosts feels "local" not global so not sure why replicated) feels better or could/should be optional.
 

How to repeat:
See above.

Suggested fix:
consider:
( 1 ) not writing to binlogs anyway
( 2 ) make this optional
( 3 ) not writing to binlogs on a slave (if there's an upstream master)

Hi Simon, 
I'm torn between verifying this as S4 (Feature request) and S3 (bug) since it does behave as intended but intended behavior breaks stuff.. I'll leave it as S3 and let's see what the dev team have to say about it.

all best
Bogdan

Hi Bogdan,

Yes, I understand your point of view but ...

(1) Please fix the documentation to explain the behaviour on a slave if GTID is being used. Several other flush commands have similar unexpected/desirable behaviour which have bitten me when moving to GTID.

With documentation that's a way forward.

(2) The error message in the log should be modified to be "safer".

As it stands it may lead a DBA to do the wrong thing which can be quite undesirable. Relocating servers should be _easy_ with GTID but this specific error message leads to potential breakage, if not immediately later when the "extra event" has been purged from the binlogs. At that point relocation will fail and someone has to know that this was an event that didn't matter, and consider starting to inject an empty event to clean things up. Will we remember what happened 6-12 months after running such a command (mysqladmin flush-hosts) if we don't catch it immediately? I'm not sure. Will we be sure that the extra event was due to this? I don't think so.

(3) Look at how to work around this. Manual use of "SET SQL_LOG_BIN = 0; FLUSH HOSTS" works fine but you have to remember to do this. It's _very_ easy to forget the first statement.

Personally I'm not sure why such a command actually does write to the binlogs as it flushes the local host cache. It _may_ be convenient to do this on all servers at the same time via the master but it's not intuitive that this is happening. My feeling is that the behaviour should be opposite: do this locally (without writing to the binlogs) unless you want to explicitly write it to the binlogs via an extended command such as using WRITE_TO_BINLOG.

Some of the uses mentioned in https://dev.mysql.com/doc/refman/5.7/en/flush.html make sense to be pushed to the binlog but I'm not sure that's true for "caches" of the type where FLUSH HOSTS is used.

Feel free to add a specific FR for WRITE_TO_BINLOG and maybe a server option which decides what the default behaviour is but (1) and (2) above look like they can be and need to be addressed.

Posted by developer:
 
Thanks for an interesting suggestion, Simon.

Yes, it is counter-intuitive that FLUSH and other administrative statements are replicated.

Note that FLUSH has an optional NO_WRITE_TO_BINLOG clause. So I think at least a partial fix is to make mysqladmin flush-hosts use NO_WRITE_TO_BINLOG. Would that work for you?

Hi. To your question "So I think at least a partial fix is to make mysqladmin flush-hosts use NO_WRITE_TO_BINLOG. Would that work for you?"

Yes, it would solve the "obvious problem" so would be good.