| Bug #30015 | Agent status events don't show up in Events tab | ||
|---|---|---|---|
| Submitted: | 24 Jul 2007 20:34 | Modified: | 9 Jan 2015 9:59 |
| Reporter: | Stefan Hinz | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Enterprise Monitor: Web | Severity: | S3 (Non-critical) |
| Version: | 1.2.0.7481 | OS: | Any |
| Assigned to: | Assigned Account | CPU Architecture: | Any |
[24 Jul 2007 20:34]
Stefan Hinz
[13 Sep 2007 20:53]
Bill Weber
When you stop an agent, the agent.reachable variable is set to "shutdown". The "Info" threshold for the "MySQL Agent Not Reachable" rule is "shutdown", so you get an Info alert. However, when you click the red dot for "Agent Status" in the Heat Chart, it takes you to the Events tab with the Severity filter set to Critical, so you don't see the Info event for that rule. Instead, you see other Critical events, which is why it appears that no agent events show up in the Events tab. In fact, the event is there; it is just filtered out. To see the event, click "reset".
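The filtering effect described above can be illustrated with a small sketch. This is not the Monitor's actual code: the `Event` class, the `visible_events` function, and the second sample rule are hypothetical names chosen for illustration. Only the rule name "MySQL Agent Not Reachable" and the Info/Critical severities come from this report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    rule: str
    severity: str


def visible_events(events, severity_filter=None):
    """Return the events a filtered list would show.

    severity_filter=None models clicking "reset" (no filter applied).
    """
    if severity_filter is None:
        return list(events)
    return [e for e in events if e.severity == severity_filter]


# Hypothetical sample data: one Info event from the agent rule,
# plus an unrelated Critical event from some other rule.
events = [
    Event("MySQL Agent Not Reachable", "Info"),
    Event("Some Other Rule", "Critical"),
]

# Arriving from the Heat Chart: the filter is preset to Critical.
filtered = visible_events(events, "Critical")

# After "reset": the Info event is visible again.
unfiltered = visible_events(events)
```

With the Critical filter applied, only the unrelated event is listed, which matches the impression that no agent events exist.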
[1 Oct 2007 20:58]
Joshua Ganderson
So, we have an issue here with conflicting thresholds between the heat chart (which treats this as critical and filters on critical) and the agent status rule (which Bill identified as info). My suggestion would be to modify the rule threshold so that this is a critical alert. Passing the buck to Andy.
[13 Nov 2007 1:43]
Andy Bang
There are two ways an agent can be "down":

1) The agent process is terminated normally by a user (perhaps because they're bringing the associated mysqld down for maintenance), in which case we get a "shutdown" signal from the agent. This is considered a "normal" shutdown and generates an "Info" event.

2) It crashes for some reason. In this case the agent times out and the Merlin server generates a "timed out" event. This is considered an "abnormal" shutdown and generates a "Critical" event.

However in both cases we show a red dot because the Heat Chart only knows about up or down, and not about "planned down" vs. "unplanned down". I personally think the basic behavior of the rule is correct but understand that it's confusing. I can fix this one rule by generating a critical event in both shutdown cases, but I think we still have the same underlying problem for other rules. This should be triaged by bug council.
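To make the mismatch concrete, here is a rough sketch of the two mappings as described in this thread. The function names, the "up" state, and the "green" color are assumptions made for illustration; only the "shutdown" and "timed out" states, the Info/Critical severities, and the red dot come from the comments above.

```python
def event_severity(agent_state):
    """Severity of the event the rule raises, per the description above."""
    if agent_state == "shutdown":    # planned stop by a user
        return "Info"
    if agent_state == "timed out":   # crash detected by timeout
        return "Critical"
    return None                      # assumed: a running agent raises no event


def heat_chart_dot(agent_state):
    """The Heat Chart only distinguishes up from down."""
    return "green" if agent_state == "up" else "red"


def link_shows_agent_event(agent_state):
    """The red dot links to the Events tab filtered on Critical."""
    return event_severity(agent_state) == "Critical"


for state in ("shutdown", "timed out"):
    print(state, heat_chart_dot(state), event_severity(state),
          link_shows_agent_event(state))
```

Both down states produce a red dot, but only the timeout case yields an event that survives the Critical filter. Under this model, raising a Critical event for "shutdown" as well would make the link consistent for this one rule, which is the fix proposed above.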
