Bug #104859 Improve handling of the MySQL host cache expiry to respect DNS TTL values.
Submitted: 7 Sep 2021 16:15 Modified: 8 Sep 2021 8:04
Reporter: Simon Mudd (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Connection Handling Severity:S3 (Non-critical)
Version:8.0.26 OS:Any
Assigned to: CPU Architecture:Any
Tags: dns, host cache, ttl

[7 Sep 2021 16:15] Simon Mudd
Description:
I sort of consider this a bug though documentation does not mention TTL so maybe you consider this a feature request.  Either way if you use DNS and this can make Ip <-> name resolution easier to track then not respecting DNS TTL causes confusion.

processlist and show slave hosts output showed inconsistent results. This can potentially break automation that expects values to be consistent.

How to repeat:
Seen on a production system (master + 2 slaves).
Names / ip addresses have been modified for obvious reasons but actual queries and responses represent what was seen:

root@host-1042 [(none)]> show slave hosts;
+-----------+------------------------+------+-----------+--------------------------------------+
| Server_id | Host                   | Port | Master_id | Slave_UUID                           |
+-----------+------------------------+------+-----------+--------------------------------------+
| 218024031 | host-1078.example.com  | 3306 | 999999999 | e18a9586-0fcc-11ec-b4d0-9cdc71cefc80 |
| 197021157 | host-1049.example.com  | 3306 | 999999999 | 3d75922d-b2a4-11eb-ae9b-246e96b3f228 |
+-----------+------------------------+------+-----------+--------------------------------------+
2 rows in set, 1 warning (0.00 sec)

root@host-1042 [(none)]> SELECT SUBSTRING(HOST,1,INSTR(HOST,":")-1) AS replica,                 user as user                 FROM information_schema.PROCESSLIST                 WHERE COMMAND IN ("Binlog Dump","Binlog Dump GTID");
+------------------------+----------+
| replica                | user     |
+------------------------+----------+
| host-1161.example.com  | rep_user |
| host-1049.example.com  | rep_user |
+------------------------+----------+
2 rows in set (0.00 sec)

From processlist;
| 4865658 | rep_user           | host-1161.example.com:45633            | NULL | Binlog Dump GTID |  7560 | Master has sent all binlog to slave; waiting for more updates | NULL             |

host-1049 show correctly, host-1161 shows as host-1078.

# checking configuration of the 1161/1078 host:

Configuration of /etc/my.cnf on host-1078 looks correct:

report_host = host-1078.example.com
report_port = 3306

as does running configuration:

root@host-1078 [(none)]> select @@report_host;
+-----------------------+
| @@report_host         |
+-----------------------+
| host-1078.example.com |
+-----------------------+
1 row in set (0.00 sec)

so configuration looks good.

Checking DNS:

$ host host-1078.example.com
host-1078.example.com has address 10.1.2.3
$ host 10.1.2.3
3.2.1.10.in-addr.arpa domain name pointer host-1078.example.com.
$

So DNS is consistent.
Checking with dig I see the DNS TLL is configured as 300 seconds.

;; ANSWER SECTION:
host-1078.example.com. 300 IN A      10.1.2.3

Resolved the inconsistent logging with:

Resolved the inconsistent logging with:

(1)
root@host-1042 [information_schema]> flush hosts;
Query OK, 0 rows affected, 1 warning (0.00 sec)

(2)
Restart replica in the other host

(3)
root@host-1042 [information_schema]> SELECT SUBSTRING(HOST,1,INSTR(HOST,":")-1) AS replica,                 user as user                 FROM information_schema.PROCESSLIST                 WHERE COMMAND IN ("Binlog Dump","Binlog Dump GTID");
+------------------------+----------+
| replica                | user     |
+------------------------+----------+
| host-1078.example.com  | rep_user |
| host-1049.example.com  | rep_user |
+------------------------+----------+
2 rows in set (0.00 sec)

root@host-1042 [information_schema]>

Conclusion:

host-1161 went away and was replaced by host-1078 using the same ip address.
The MySQL host cache did not handle/notice this change

Current configuration of the host cache shows:

[root@host-1042 ~]$ grep host_cache_size /etc/my.cnf
# host_cache_size is now dynamic and being large enough prevents unneeded DNS lookups
host_cache_size = 5000

So the large enough host cache means it never clears old entries.

Suggested fix:
DNS has a TTL value. Please use and respect it.

This would avoid confusing logging in the mysql server which may potentially break replication.

A lot of people like to set skip_name_resolve = 1 which in some environments may make sense but in others the additional information provided by DNS means you don't need extra tooling to recognise systems and management is simpler.
[7 Sep 2021 16:19] Simon Mudd
Sorry typo: "This would avoid confusing logging in the mysql server which may potentially break replication." SHOULD read:

This would avoid confusing logging in the mysql server which may potentially break automation.
[7 Sep 2021 16:30] Simon Mudd
Documentation at https://dev.mysql.com/doc/refman/8.0/en/host-cache.html says "The server performs host name resolution using the gethostbyaddr() and gethostbyname() system calls" but neither calls handle caching.

There's no other way to configure a generic TTL of the host cache and if any TTL configuration is going to be considered I'd really expect DNS to be used natively.
[8 Sep 2021 7:13] MySQL Verification Team
Hello Simon,

Thank you for the feature request!

regards,
Umesh