Bug #67236 | MySQL 5.6 host cache does not seem to record info uniquely by ip, host | ||
---|---|---|---|
Submitted: | 15 Oct 2012 15:54 | Modified: | 15 Jan 2013 6:52 |
Reporter: | Simon Mudd (OCA) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Performance Schema | Severity: | S3 (Non-critical) |
Version: | 5.6.5,5.6.6,5.6.7-rc | OS: | Any |
Assigned to: | Marc ALFF | CPU Architecture: | Any |
Tags: | windmill |
[15 Oct 2012 15:54]
Simon Mudd
[15 Oct 2012 16:11]
Simon Mudd
Given the default setting of the host_cache may not be large enough, please provide some idea of the overhead if the default value needs to be increased, both in memory size but also in resources (scanning a larger table takes longer, even if done in memory).
[15 Oct 2012 16:33]
Simon Mudd
Maybe actually the uniqueness should be by ip, not (ip, host). Either way the issue exists.
[16 Oct 2012 2:25]
MySQL Verification Team
Thank you for the bug report.
[16 Oct 2012 7:05]
Simon Mudd
Note: also seen in 5.6.5.
[16 Oct 2012 15:59]
Simon Mudd
Also seen in 5.6.7.
[22 Oct 2012 15:11]
Marc ALFF
Some findings and clarifications, after analysis. 1) From the behavior seen and the analysis done, it seems that the host cache internal data is actually ok, but some rows are displayed multiple times in table performance_schema.host_cache The data is not wiped under load, and the host cache is still functional. The fact that the same IP can be reported multiple times is a verified bug. 2) The host cache maintains internally a unique index, by IP address alone. This index is implemented with a HASH. 3) In terms of CPU overhead, given that a HASH is used, the CPU overhead is not expected to grow significantly as the number of client host grows, and it should definitively not grow in O(N). 4) In terms of memory overhead, the host cache consist of at most N records, where each record is of sizeof(Host_entry), or 336 bytes currently. Even with thousands of client machines, and when keeping stats for a lot of clients, a host cache of 10,000 entries cost around 3.2 Mb of memory. 5) About the expiry policy, - each time an entry is looked at, the LAST_SEEN column is updated - when the table is full, the column with min(LAST_SEEN) is dropped. That is, the least recently used (LRU) entry is dropped. Regards, -- Marc
[23 Oct 2012 5:34]
Simon Mudd
Hi Marc, Thanks for the update to various questions I asked. That info is most useful. If you can please ensure it reaches the documentation pages. My intention is to probably increase this cache size to something like 4096 on all 5.6 hosts as this should cover the current "expected" number of different possible clients by a sufficient margin, and thus reduce any further DNS lookups. The fact I see more than a single row per ip is still a bit disconcerting as I was looking for data on "connect errors" and it looked like this table should provide a good overview of the data (except for the fact that the number of hosts connecting to the servers was too large to fit into the default size host-cache). I gather you are working on that issue still.
[14 Jan 2013 16:37]
Paul DuBois
Noted in 5.6.9, 5.7.0 changelogs. The Performance Schema host_cache table displayed some lines multiple times. This was not an issue with the host cache itself, only with the table that provides information about the cache contents.
[14 Jan 2013 18:24]
Paul DuBois
Added a note to http://dev.mysql.com/doc/refman/5.6/en/host-cache.html that if the host cache is full when an entry is to be stored, the least recently used entry is discarded.
[14 Jan 2013 21:49]
Simon Mudd
Is there a counter for this "discarded entry as host_cache full" status? If not can we add one as otherwise it's impossible to know if this is an issue or not, and a full, but not overflowed host_cache table can't be distinguished from an overflowed host_cache table (that probably needs extending...). If there's no such counter I'll add an appropriate feature request. I think that's important given the complete opaqueness of the host cache in 5.5. The (now fixed) 5.6 implementation resolves a lot of these issues and the counter would allow us to measure when the size is not big enough.
[14 Jan 2013 22:54]
Marc ALFF
Hi Simon. Let's say you have N hosts. With a host cache size of N, you are correct that there is no way to distinguish a full but not overflowed host cache from a full and overflowed host cache ... A suggestion then: use a host cache size of N+1 when expecting N entries: a (almost) full and not overflowed host cache will have at most N rows, while a full and overflowed (?) host cache will have N+1 rows. Monitoring count(*) to make sure there is at least 1 free entry guarantees that no overflow occurred, if this is the metric of interest here. Another suggestion: by definition the next entry that will be discarded if needed to make room is the last recently used host, for which the LAST_SEEN timestamp is the minimum. Monitoring the value of MIN(LAST_SEEN) indicates for how long the host cache has been unchanged. For example, a MIN(LAST_SEEN) 5 days in the past means no entry has been evicted from the host cache during the last 5 days ... On a side note, very few people get notifications for comments on closed bugs, which are definitively not monitored, I just happen to have seen this one since I am assigned to it. Regards, -- Marc Alff
[14 Jan 2013 23:18]
Marc ALFF
Correction to the previous comments ... Monitoring MIN(LAST_SEEN) works only if some host rarely connects to the server, for example very infrequent connections from an admin console. The distribution (and frequency) of connections from clients can have an impact here. Count(*) < host_cache_size seems better.
[15 Jan 2013 6:52]
Simon Mudd
Filed separate feature request: http://bugs.mysql.com/bug.php?id=68085.