Description:
I sort of consider this a bug though documentation does not mention TTL so maybe you consider this a feature request. Either way if you use DNS and this can make Ip <-> name resolution easier to track then not respecting DNS TTL causes confusion.
processlist and show slave hosts output showed inconsistent results. This can potentially break automation that expects values to be consistent.
How to repeat:
Seen on a production system (master + 2 slaves).
Names / ip addresses have been modified for obvious reasons but actual queries and responses represent what was seen:
root@host-1042 [(none)]> show slave hosts;
+-----------+------------------------+------+-----------+--------------------------------------+
| Server_id | Host | Port | Master_id | Slave_UUID |
+-----------+------------------------+------+-----------+--------------------------------------+
| 218024031 | host-1078.example.com | 3306 | 999999999 | e18a9586-0fcc-11ec-b4d0-9cdc71cefc80 |
| 197021157 | host-1049.example.com | 3306 | 999999999 | 3d75922d-b2a4-11eb-ae9b-246e96b3f228 |
+-----------+------------------------+------+-----------+--------------------------------------+
2 rows in set, 1 warning (0.00 sec)
root@host-1042 [(none)]> SELECT SUBSTRING(HOST,1,INSTR(HOST,":")-1) AS replica, user as user FROM information_schema.PROCESSLIST WHERE COMMAND IN ("Binlog Dump","Binlog Dump GTID");
+------------------------+----------+
| replica | user |
+------------------------+----------+
| host-1161.example.com | rep_user |
| host-1049.example.com | rep_user |
+------------------------+----------+
2 rows in set (0.00 sec)
From processlist;
| 4865658 | rep_user | host-1161.example.com:45633 | NULL | Binlog Dump GTID | 7560 | Master has sent all binlog to slave; waiting for more updates | NULL |
host-1049 show correctly, host-1161 shows as host-1078.
# checking configuration of the 1161/1078 host:
Configuration of /etc/my.cnf on host-1078 looks correct:
report_host = host-1078.example.com
report_port = 3306
as does running configuration:
root@host-1078 [(none)]> select @@report_host;
+-----------------------+
| @@report_host |
+-----------------------+
| host-1078.example.com |
+-----------------------+
1 row in set (0.00 sec)
so configuration looks good.
Checking DNS:
$ host host-1078.example.com
host-1078.example.com has address 10.1.2.3
$ host 10.1.2.3
3.2.1.10.in-addr.arpa domain name pointer host-1078.example.com.
$
So DNS is consistent.
Checking with dig I see the DNS TLL is configured as 300 seconds.
;; ANSWER SECTION:
host-1078.example.com. 300 IN A 10.1.2.3
Resolved the inconsistent logging with:
Resolved the inconsistent logging with:
(1)
root@host-1042 [information_schema]> flush hosts;
Query OK, 0 rows affected, 1 warning (0.00 sec)
(2)
Restart replica in the other host
(3)
root@host-1042 [information_schema]> SELECT SUBSTRING(HOST,1,INSTR(HOST,":")-1) AS replica, user as user FROM information_schema.PROCESSLIST WHERE COMMAND IN ("Binlog Dump","Binlog Dump GTID");
+------------------------+----------+
| replica | user |
+------------------------+----------+
| host-1078.example.com | rep_user |
| host-1049.example.com | rep_user |
+------------------------+----------+
2 rows in set (0.00 sec)
root@host-1042 [information_schema]>
Conclusion:
host-1161 went away and was replaced by host-1078 using the same ip address.
The MySQL host cache did not handle/notice this change
Current configuration of the host cache shows:
[root@host-1042 ~]$ grep host_cache_size /etc/my.cnf
# host_cache_size is now dynamic and being large enough prevents unneeded DNS lookups
host_cache_size = 5000
So the large enough host cache means it never clears old entries.
Suggested fix:
DNS has a TTL value. Please use and respect it.
This would avoid confusing logging in the mysql server which may potentially break replication.
A lot of people like to set skip_name_resolve = 1 which in some environments may make sense but in others the additional information provided by DNS means you don't need extra tooling to recognise systems and management is simpler.