Bug #82727 Allow DELETE to work on performance_schema.host_cache
Submitted: 25 Aug 2016 16:07 Modified: 5 Apr 2018 6:07
Reporter: Simon Mudd (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Performance Schema Severity:S4 (Feature request)
Version:5.7.14 OS:Any
Assigned to: CPU Architecture:Any
Tags: host_cache, performance_schema

[25 Aug 2016 16:07] Simon Mudd
Description:
I usually have skip-name-resolve turned OFF as it's handy to see the hostnames of the servers connecting to MySQL. Changing this behaviour requires MySQL to be restarted as the setting is not dynamic.

I use applications like orchestrator to monitor the topology of a replication chain and this needs to figure out the slaves connected to a master. It does this using SHOW SLAVE HOSTS or can use the SHOW PROCESSLIST output and generally works well.

However, due to the nature of the environment in which my servers operate they may be removed and new ones added at any time. The hostname associated with a server (and it's ip address may change).

MySQL shows in the processlist output the hostname:port of the connected servers. If DNS changes this is not reflected correctly and the only way to clear this is to do a TRUNCATE TABLE host_cache even if perhaps only one of several hundred entries is wrong.  Doing this will trigger a new DNS lookup for each new host connecting which is not in the cache.

Clearing the cache periodically (on several servers) requires some co-ordination to avoid heavy DNS traffic at the same time and this is inconvenient.

There is no TTL information in the host_cache and consequently the data never expires. I'm unaware of any plans to add such a setting though it might perhaps be convenient.

The consequence of this is that the output in SHOW PROCESSLIST may show invalid hostname information.  This can confuse applications like orchestrator which once they have connected to a master will try to find all the slaves and then connect to them. This will fail for any host where the host_cache information is stale, usually connecting to a host by a name that no longer exists.

Other software and humans may also be "confused"

How to repeat:
e.g.

root@myhost [(none)]> root@bc215bprdb-01 [(none)]> select @@version, count(*) from performance_schema.host_cache;
+------------+----------+
| @@version  | count(*) |
+------------+----------+
| 5.7.13-log |      534 |
+------------+----------+
1 row in set (0.00 sec)

root@myhost [(none)]> delete from performance_schema.host_cache WHERE HOSTNAME = 'SIMON';
ERROR 1142 (42000): DELETE command denied to user 'root'@'localhost' for table 'host_cache'
root@myhost [(none)]> 

This user has all privileges on *.*.

Suggested fix:
If I could delete the "bad" entries I would be able to manage the situation outside of MySQL so my primary request is:

1.  Allow the DELETE statement to work on P_S.host_cache.

Other P_S tables already allow this to be done, so it seems to seems to be a valid request.

Other related requests are:

2. Adding a TTL setting for the host_cache would also be useful. It may require some sort of "jitter" randomisation to avoid triggering if all entries timeout at the same time a large batch of requests. (?)  Doing this should in theory prevent the problem from being visible and eventually the information shown would be good.

3. Mentioned elsewhere making skip-name-resolve dynamic would also be convenient.
[31 Aug 2016 7:45] MySQL Verification Team
Hello Simon,

Thank you for the feature request!

Thanks,
Umesh
[14 Sep 2016 17:12] Simon Mudd
Some more background.

On a master, running 111 days I checked the host_cache for consistency and got the following result:

WARNING: 349 hostnames from host_cache not found in DNS, 23 host(s) found in p_s.host_cache and hostname from ip is different, 1220 host(s) found in p_s.host_cache and ip address matches

Flushing the cache gives me:

OK: 229 host(s) found in p_s.host_cache and ip address matches (DNS)

So the point being here is that if I use the name in show processlist for something like orchestrator to find a connected slave, the name may be wrong and I may connect to the wrong box, or maybe not connect to any box at all even though the connection is valid and there's a slave connected as indicated by the MySQL server.  This also can lead to confusion about client connections for the same reason.

Switching off DNS (skip_name_resolve in MySQL) is not possible atm despite several requests to make the change dynamic. That would be a good start, as I can then choose to use DNS or not. Restarting all servers to change such a setting is really intrusive and means that for a long period of time I may have different hosts with different configurations. Not ideal.

Better still if I can delete any bad rows in the host_cache then I can ensure I get the benefits of seeing a hostname (which is easier to understand) and not flushing the cache and triggering lots of subsequent dns queries for the next clients that connect.  Doing that on one server is one thing. As the number of servers managed grows this can be more problematic.

So allowing DELETE on host_cache rows would allow me to fix the problem.
Making skip_name_resolve dynamic would also help and be related. I'd prefer both options to allow me to choose my setup.
And maybe giving the host_cache entries a TTL would also make them expire cleanly would probably solve the issue too.
[5 Apr 2018 6:03] Simon Mudd
Related: https://bugs.mysql.com/bug.php?id=90307 (off by 2 error between the max_connect_errors and actual connection attempts processed)
[5 Apr 2018 6:07] Simon Mudd
Related: https://bugs.mysql.com/bug.php?id=71219 (skip_name_resolve should be dynamic)