Bug #45382 | Agent loops in resynchronize task (race condition) | ||
---|---|---|---|
Submitted: | 8 Jun 2009 20:29 | Modified: | 14 Jan 2010 14:55 |
Reporter: | Diego Medina | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Enterprise Monitor: Agent | Severity: | S3 (Non-critical) |
Version: | 2.0.5, 2.1.0.1059 | OS: | Any |
Assigned to: | Kay Roepke | CPU Architecture: | Any |
Tags: | windmill |
[8 Jun 2009 20:29]
Diego Medina
[8 Jun 2009 20:31]
Diego Medina
This is the debug log, clean, from the time the agent started, it went straight into the loop. And it stayed in the loop until I
Attachment: rezync loop log from start - good one.gz (application/x-gzip, text), 112.99 KiB.
[8 Jun 2009 20:34]
Diego Medina
Last comment was supposed to be: This is the debug log, clean, from the time the agent started, it went straight into the loop. And it stayed in the loop until I killed the agent.
[12 Sep 2009 2:30]
Donna Harmon
I ran several tests with the following configuration and was able to reproduce, 4 of 5 times, the problem when restarting the monitor, where the servers do not display in the dashboard for over 15 minutes while the agents display almost immediately: Ubuntu VM Monitor 2.0.6.71 Agent 2.0.6.7159 Single Agent - 31 5.1.35 instances monitored (credit for Darren Oldag for spotting these entries in the logs and below input) The agent gets into a state where it is not honoring the 'resync protocol' with the MEM server. in the logs, i see several situations like this: 2009-09-08 13:12:01: (debug) network-io.c:1097: skipping heartbeat without resync response, resync_state RESYNC_NEED_AGENT_ID 2009-09-08 13:12:34: (debug) last message repeated 10 times [...repeat...] 2009-09-08 14:29:37: (debug) last message repeated 10 times 2009-09-08 14:29:38: (debug) network-io.c:1097: skipping heartbeat without resync response, resync_state RESYNC_NEED_AGENT_ID and/or ... 2009-09-06 15:20:40: (critical) scheduler.c:778: in list_known_data_items state 2009-09-06 15:20:49: (critical) last message repeated 101 times [...repeat...] 2009-09-06 15:30:07: (critical) scheduler.c:778: in list_known_data_items state 2009-09-06 15:30:38: (critical) last message repeated 66 times until the agent gets out of the resync state properly, the server.reachable collection cannot be retrieved, and will not mark any of the mysqld instances as 'up'.
[18 Sep 2009 12:13]
Andrii Nikitin
Here somewhat different error messages followed by 4min gap in all graphs: 2009-09-17 13:17:04: (critical) exception received from server: Internal error: known items are required for list_instances processing 2009-09-17 13:17:49: (critical) G:\bs\bs\merlin\agent-2.0\src\mysql-proxy-0.7.0r1489_20090811_1738\plugins\agent\network-io.c:841: starting task 6 for mysql::server[418fdade-6547-4ccd-9556-c85cedfbc99a] According to Leith this is most probably the same problem
[20 Nov 2009 22:42]
Enterprise Tools JIRA Robot
Sloan Childers writes: available in 2.1 build 1109
[18 Dec 2009 21:11]
Enterprise Tools JIRA Robot
Diego Medina writes: Verified fixed on 2.1.1.1132
[14 Jan 2010 14:55]
MC Brown
A note has been added to the 2.1.1 changelog: The agent could get into a state where it loop through the resynchronization of the core data, without reporting information to the service manager, causing gaps in the data.
[1 Feb 2010 19:44]
Enterprise Tools JIRA Robot
Diego Medina writes: Agent 2.2.0.1611 still has the fix.
[1 Feb 2010 19:56]
Enterprise Tools JIRA Robot
Diego Medina writes: Agent 2.1.1.1142 still has the fix