| Bug #43537 | MEM agents not resolving hostname-to-IP on each connection | ||
|---|---|---|---|
| Submitted: | 10 Mar 18:01 | Modified: | 4 Sep 17:50 |
| Reporter: | Shawn Green | ||
| Status: | Verified | ||
| Category: | Monitoring: Agent | Severity: | S3 (Non-critical) |
| Version: | 2.0 | OS: | Any (N/A) |
| Assigned to: | Kay Roepke | Target Version: | |
| Tags: | windmill | ||
[10 Mar 18:01]
Shawn Green
[10 Mar 18:25]
Kay Roepke
The agent doesn't resolve the URL to MEM itself; the entire URL is passed to libcurl, which does the necessary steps. Glancing at the libcurl docs, I can't see a way to force it to re-resolve. I'm curious: what's the TTL of their MEM DNS record? Might they be seeing their own TTL setting here (and restarting MEM simply takes longer than the TTL)?
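To illustrate the TTL question above, here is a minimal sketch of how a TTL-honoring resolver cache keeps returning an old address until the record expires. It is not the agent's or libcurl's code: `TtlDnsCache`, the injected `lookup` callable, and the example hostname and addresses in the usage note are all hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class TtlDnsCache:
    """Toy resolver cache (hypothetical): keeps an answer until its TTL runs out."""

    def __init__(
        self,
        lookup: Callable[[str], Tuple[str, int]],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._lookup = lookup  # returns (address, ttl_seconds) for a hostname
        self._clock = clock    # injectable so expiry can be simulated
        self._entries: Dict[str, Tuple[str, float]] = {}

    def resolve(self, host: str) -> str:
        now = self._clock()
        cached = self._entries.get(host)
        # Serve the cached address while the TTL has not yet expired,
        # even if the authoritative record has changed in the meantime.
        if cached is not None and now < cached[1]:
            return cached[0]
        address, ttl = self._lookup(host)
        self._entries[host] = (address, now + ttl)
        return address
```

With a record whose TTL is 300 seconds, a change made at the DNS server one second after the first lookup would stay invisible to callers of `resolve()` for the remaining 299 seconds. A longer TTL stretches that window accordingly.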
[10 Jun 21:01]
Kay Roepke
We need an answer to the question in the comment above to determine the source of the problem. Thanks.
[15 Jun 17:32]
Shawn Green
Here is the TTL information about memserver1 (the machine that the agents failed to follow as it changed addresses due to a VLAN shift):

[adminuser ~]$ dig any memserver1.site1.anon

; <<>> DiG 9.2.4 <<>> any memserver1.site1.anon
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 53006
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 2

;; QUESTION SECTION:
;memserver1.site1.anon. IN ANY

;; ANSWER SECTION:
memserver1.site1.anon. 10800 IN A xxx.yyy.102.101

;; AUTHORITY SECTION:
site1.anon. 10800 IN NS ns1.lhr1.activehotels.com.
site1.anon. 10800 IN NS ns2.lhr1.activehotels.com.

;; ADDITIONAL SECTION:
ns1.xxx.site2.anon. 10800 IN A xxx.yyy.102.200
ns2.xxx.site2.anon. 10800 IN A xxx.yyy.103.200

;; Query time: 0 msec
;; SERVER: xxx.yyy.102.200#53(xxx.yyy.102.200)
;; WHEN: Mon Jun 15 08:22:32 2009
;; MSG SIZE rcvd: 152

The TTL on the A record is 10800 seconds (3 hours).

Restarting the agent allowed the new address to resolve properly, but in one case I know of, this problem affected about 130 agents at the same time. If we could somehow get libcurl to discard any cached DNS resolutions when we get a "failure to connect" message and then try again, it would improve our ability to follow network changes like VLAN remaps.
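The behavior requested above (drop the cached address on a connection failure, re-resolve, and retry once) can be sketched as follows. This is an illustration under stated assumptions, not the agent's code and not libcurl's API: `ReResolvingConnector` and its injected `resolve` and `connect` callables are hypothetical names. For libcurl itself, the documented `CURLOPT_DNS_CACHE_TIMEOUT` option controls how long its own DNS cache keeps entries; whether that option addresses this report has not been established in this thread.

```python
import socket
from typing import Any, Callable, Dict, Optional


def _system_resolve(host: str) -> str:
    # Ask the system resolver; it may apply caching of its own.
    return socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)[0][4][0]


def _tcp_connect(address: str, port: int) -> Any:
    return socket.create_connection((address, port), timeout=10)


class ReResolvingConnector:
    """Hypothetical helper: forget a cached address as soon as it fails."""

    def __init__(
        self,
        resolve: Optional[Callable[[str], str]] = None,
        connect: Optional[Callable[[str, int], Any]] = None,
    ) -> None:
        self._resolve = resolve or _system_resolve
        self._connect = connect or _tcp_connect
        self._cache: Dict[str, str] = {}

    def open(self, host: str, port: int) -> Any:
        address = self._cache.get(host)
        if address is not None:
            try:
                return self._connect(address, port)
            except OSError:
                # "Failure to connect": discard the stale entry and fall
                # through to a fresh lookup instead of waiting for expiry.
                del self._cache[host]
        address = self._resolve(host)
        conn = self._connect(address, port)  # a second failure propagates
        self._cache[host] = address
        return conn
```

The retry is deliberately limited to one fresh lookup per call, so a server that is really down still surfaces as an error rather than a loop. A caller would use it as `ReResolvingConnector().open(host, port)`.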
