| Bug #45382 | Agent loops in resynchronize task (race condition) | | |
|---|---|---|---|
| Submitted: | 8 Jun 22:29 | Modified: | 20 Nov 23:42 |
| Reporter: | Diego Medina | | |
| Status: | QA testing | | |
| Category: | Monitoring: Agent | Severity: | S3 (Non-critical) |
| Version: | 2.0.5, 2.1.0.1059 | OS: | Any |
| Assigned to: | Kay Roepke | Target Version: | |
| Triage: | Needs Triage: R4 (High) / E4 (High) | | |
[8 Jun 22:29]
Diego Medina
[8 Jun 22:31]
Diego Medina
This is the debug log, clean, from the time the agent started, it went straight into the loop. And it stayed in the loop until I
Attachment: rezync loop log from start - good one.gz (application/x-gzip, text), 112.99 KiB.
[8 Jun 22:34]
Diego Medina
Last comment was supposed to be: This is the clean debug log from the time the agent started. It went straight into the loop and stayed there until I killed the agent.
[12 Sep 4:30]
Donna Harmon
I ran several tests with the following configuration and was able to reproduce the problem 4 out of 5 times. When the monitor is restarted, the servers do not display in the dashboard for over 15 minutes, while the agents display almost immediately.

Ubuntu VM
Monitor 2.0.6.71
Agent 2.0.6.7159
Single Agent - 31 5.1.35 instances monitored

(Credit to Darren Oldag for spotting these entries in the logs and for the input below.)

The agent gets into a state where it is not honoring the 'resync protocol' with the MEM server. In the logs, I see several situations like this:

2009-09-08 13:12:01: (debug) network-io.c:1097: skipping heartbeat without resync response, resync_state RESYNC_NEED_AGENT_ID
2009-09-08 13:12:34: (debug) last message repeated 10 times
[...repeat...]
2009-09-08 14:29:37: (debug) last message repeated 10 times
2009-09-08 14:29:38: (debug) network-io.c:1097: skipping heartbeat without resync response, resync_state RESYNC_NEED_AGENT_ID

and/or ...

2009-09-06 15:20:40: (critical) scheduler.c:778: in list_known_data_items state
2009-09-06 15:20:49: (critical) last message repeated 101 times
[...repeat...]
2009-09-06 15:30:07: (critical) scheduler.c:778: in list_known_data_items state
2009-09-06 15:30:38: (critical) last message repeated 66 times

Until the agent gets out of the resync state properly, the server.reachable collection cannot be retrieved, and the monitor will not mark any of the mysqld instances as 'up'.
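To gauge how long an agent stayed stuck, the debug log can be scanned for the "skipping heartbeat without resync response" entries quoted in the comment above. The following is only a minimal sketch: it assumes the line layout seen in those excerpts (timestamp, level in parentheses, message), and `summarize_resync_stalls` is a hypothetical helper written for illustration, not part of the agent or the monitor.

```python
import re
from datetime import datetime

# Assumed layout, taken from the excerpts: "YYYY-MM-DD HH:MM:SS: (level) message"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2}): \((?P<level>\w+)\) (?P<msg>.*)$"
)
REPEAT_RE = re.compile(r"^last message repeated (\d+) times$")
RESYNC_RE = re.compile(
    r"skipping heartbeat without resync response, resync_state (\w+)"
)


def summarize_resync_stalls(lines):
    """Hypothetical helper: map each resync_state to (first_seen, last_seen, count).

    "last message repeated N times" lines are credited to the state of the
    most recent matching message, if there was one.
    """
    stalls = {}
    last_state = None  # state named by the most recent non-repeat message
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # e.g. elided "[...repeat...]" markers or blank lines
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
        msg = m.group("msg")

        rep = REPEAT_RE.match(msg)
        if rep:
            if last_state is None:
                continue  # the repeated message was not a resync skip
            state, count = last_state, int(rep.group(1))
        else:
            hit = RESYNC_RE.search(msg)
            last_state = hit.group(1) if hit else None
            if last_state is None:
                continue
            state, count = last_state, 1

        first, _, total = stalls.get(state, (ts, ts, 0))
        stalls[state] = (first, ts, total + count)
    return stalls
```

Passing an open log file (for example `summarize_resync_stalls(open(path))`, where `path` is wherever the agent writes its debug log) yields, per state, the first and last timestamps and the number of skipped heartbeats. A large gap between the two timestamps for `RESYNC_NEED_AGENT_ID` would be consistent with the loop described in this report.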
[18 Sep 14:13]
Andrii Nikitin
Here are somewhat different error messages, followed by a 4-minute gap in all graphs:

2009-09-17 13:17:04: (critical) exception received from server: Internal error: known items are required for list_instances processing
2009-09-17 13:17:49: (critical) G:\bs\bs\merlin\agent-2.0\src\mysql-proxy-0.7.0r1489_20090811_1738\plugins\agent\network-io.c:841: starting task 6 for mysql::server[418fdade-6547-4ccd-9556-c85cedfbc99a]

According to Leith, this is most probably the same problem.
[20 Nov 23:42]
Enterprise Tools JIRA Robot
Sloan Childers writes: available in 2.1 build 1109
