Bug #52952 | If Agent gets a timeout on initial checkin, it will not retry | ||
---|---|---|---|
Submitted: | 19 Apr 2010 18:45 | Modified: | 17 Aug 2010 10:42 |
Reporter: | Diego Medina | Status: | Closed |
Category: | MySQL Enterprise Monitor: Agent | Severity: | S2 (Serious) |
Version: | 2.2.0.1695 | OS: | Any |
Assigned to: | Darren Oldag | CPU Architecture: | Any |
[19 Apr 2010 18:45]
Diego Medina
[19 Apr 2010 18:49]
Enterprise Tools JIRA Robot
Diego Medina writes: server.reachable only started being sent after I went to the UI and forced a re-inventory.
[19 Apr 2010 18:49]
Enterprise Tools JIRA Robot
Attachment: 10350_chassis.log.gz (application/x-gzip, text), 172.27 KiB.
[27 May 2010 23:19]
Enterprise Tools JIRA Robot
Darren Oldag writes: EM-4279 is pretty much a duplicate of this bug, not just 'related'
[28 May 2010 14:30]
Enterprise Tools JIRA Robot
Jan Kneschke writes:
* If the agent closes its connection after the 120-second timeout, doesn't the server get an exception when it tries to write? Could it be handled on the server side by forcing a resync?
* The patch does its job, just some cosmetics:
  ...
  static int network_xml_parse_tasks(xmlNode *agentNode, GAsyncQueue *rcvq, const GString *agent_id, struct network_io_config_t *io_config) {
  Instead of passing the struct down, either pass only the GString *task_sequence down, or don't pass the 'const GString *agent_id' down and take it from the struct inside the function. I prefer the first.
* As this is a task sequence, we should actually check it: did it change since the last one by 0 or 1? (Did it decrement? Did it fast-forward?) If not, it is an error, which should be handled with a resync.
* For that we need to know what kind of integer it is, so we know when it wraps.
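The sequence check suggested in the last two points could look roughly like the sketch below. It is not the code from the actual patch. The thread leaves the integer type of the task sequence open, so the sketch assumes an unsigned 32-bit counter that wraps to 0 after its maximum value. The names `seq_status` and `check_task_sequence` are hypothetical and chosen only for illustration.

```c
#include <stdint.h>

/* Hypothetical classification of a newly received task sequence number. */
typedef enum {
    SEQ_UNCHANGED,   /* same value as the last one: changed by 0 */
    SEQ_ADVANCED,    /* exactly one step forward: changed by 1 */
    SEQ_OUT_OF_SYNC  /* decremented or fast-forwarded: needs a resync */
} seq_status;

/*
 * ASSUMPTION: the task sequence is an unsigned 32-bit counter. The real
 * type is not stated in the thread; a different width would need a
 * different integer type here.
 */
static seq_status check_task_sequence(uint32_t last_seen, uint32_t received)
{
    /* Unsigned subtraction is modulo 2^32, so a wrap from UINT32_MAX to 0
     * still yields a delta of 1. The cast guards against integer promotion
     * on platforms where int is wider than 32 bits. */
    uint32_t delta = (uint32_t)(received - last_seen);

    if (delta == 0) {
        return SEQ_UNCHANGED;
    }
    if (delta == 1) {
        return SEQ_ADVANCED;
    }
    /* Any other delta means the counter went backwards or skipped ahead. */
    return SEQ_OUT_OF_SYNC;
}
```

With this approach the caller would trigger a resync only on `SEQ_OUT_OF_SYNC`. A wrap from the maximum value to 0 counts as a normal single step, which is why the width of the integer has to be known first.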
[28 May 2010 19:09]
Enterprise Tools JIRA Robot
Darren Oldag writes:
revision-id: oldag@mysql.com-20100528190237-vv0vzasy81q1xtqe
parent: marcos.palacios@sun.com-20100527212843-rwerh308fxtar1dl
committer: Darren L. Oldag <oldag@mysql.com>
branch nick: Monitor22
timestamp: Fri 2010-05-28 14:02:37 -0500

revision-id: oldag@mysql.com-20100528185749-vo5lth6zngrv3olx
parent: michael.schuster@oracle.com-20100527111422-rnmpp5mf1twhokj3
committer: Darren L. Oldag <oldag@mysql.com>
branch nick: Agent22
timestamp: Fri 2010-05-28 13:57:49 -0500
[7 Jun 2010 23:30]
Enterprise Tools JIRA Robot
Andy Bang writes: In build 2.2.2.1722.
[1 Jul 2010 13:44]
Enterprise Tools JIRA Robot
Diego Medina writes: It has been very hard to reproduce, so we are closing this bug as resolved, but note that it may come back. If a customer seems to have this issue, make sure they are using both the agent and the service manager with the fix (it requires both components to be updated).
[5 Jul 2010 7:23]
MC Brown
A note has been added to the 2.2.2 changelog: If a &merlin_agent; got a timeout during the initial checkin with &merlin_server; (for instance, if &merlin_server; was busy), it would fail to resynchronize properly and show the monitored MySQL instances as down.
[29 Jul 2010 23:23]
Enterprise Tools JIRA Robot
Andy Bang writes: In build 2.2.3.1734.
[16 Aug 2010 15:04]
Enterprise Tools JIRA Robot
Diego Medina writes: Verified fixed in 2.2.3.1734.
[17 Aug 2010 10:42]
MC Brown
A note has been added to the 2.2.3 changelog: check-in by &merlin_agent;, the monitored instance could be identified as down.