Bug #46401 Feature Request: Display values sent from Agent when creating custom rules
Submitted: 27 Jul 2009 12:21 Modified: 7 Jan 2010 18:25
Reporter: Leandro Morgado
Status: Verified
Category:MySQL Enterprise Monitor Severity:S4 (Feature request)
Version:2.0.5.7153 OS:Any
Assigned to: Assigned Account CPU Architecture:Any
Tags: windmill

[27 Jul 2009 12:21] Leandro Morgado
Description:
When creating custom monitoring rules in MEM one needs to define a valid and appropriate THRESHOLD to trigger the Critical, Warning and Info notices/actions. 

Currently, the server/web interface does not give any indication of values being monitored and sent by the agent (these are visible in the agent's log file when log-level=debug). 

It would be *very* useful if there were some kind of "TEST" button that displays the monitored values as they are received on the web interface. E.g.:

2009-07-27 12:00:00 page_swap_in 0
2009-07-27 12:05:00 page_swap_in 3
2009-07-27 12:10:00 page_swap_in 36

This would help tremendously when defining THRESHOLD. The point is to show the values to the user in the same form the server uses when comparing them against the THRESHOLD.

For bonus points, historical average, maximum, minimum, etc. would help to fine tune the THRESHOLD after the monitored server has been running for a while.  

How to repeat:
Try to create a swap_page_in custom monitor and set an appropriate THRESHOLD.

Suggested fix:
See description
[27 Jul 2009 14:28] Simon Mudd
It's also a good way to check that you are configuring and using the right value. Otherwise we need to look into the merlin internals. This would make it much easier to get a "snapshot" value and see that it looks OK when creating custom rules or setting up graphs.
[28 Jul 2009 0:24] Andy Bang
Unless I've misunderstood something, we already do this, and also provide a way for you to improve on it.

First, when the event fires, click on it and then click on the Advanced tab.  You should see something like the following (from "Connection Usage Excessive"):

Thresholds
Critical Alert = 95
Warning Alert = 85
Info Alert = 75

Frequency
00:01

Expression
(%Uptime% > 10800) && (((%Threads_connected% / %max_connections%) * 100) > THRESHOLD)

Evaluated Expression
(708777 > 10800) && (((1 / 100) * 100) > 75)

The "Evaluated Expression" section shows you exactly what values it's using.  Here Uptime = 708777, Threads_connected = 1, and max_connections = 100.
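
To illustrate how the "Evaluated Expression" line relates to the "Expression" line, here is a minimal Python sketch that fills in the placeholders with the values quoted above and then checks the result against the three thresholds. This is not MEM code: the function names `substitute` and `highest_severity` are hypothetical, and checking the thresholds from highest to lowest is an assumption made for the example.

```python
import re

EXPRESSION = ("(%Uptime% > 10800) && "
              "(((%Threads_connected% / %max_connections%) * 100) > THRESHOLD)")

def substitute(expression, values, threshold):
    """Replace each %name% placeholder and THRESHOLD with a concrete number."""
    filled = re.sub(r"%(\w+)%", lambda m: str(values[m.group(1)]), expression)
    return filled.replace("THRESHOLD", str(threshold))

def highest_severity(values, thresholds):
    """Return the most severe threshold name exceeded, or None (hypothetical helper)."""
    if values["Uptime"] <= 10800:
        return None
    usage = values["Threads_connected"] / values["max_connections"] * 100
    # Assumption: check the highest threshold first so the worst match wins.
    for name, limit in sorted(thresholds.items(), key=lambda kv: kv[1], reverse=True):
        if usage > limit:
            return name
    return None

values = {"Uptime": 708777, "Threads_connected": 1, "max_connections": 100}
thresholds = {"Critical": 95, "Warning": 85, "Info": 75}

print(substitute(EXPRESSION, values, thresholds["Info"]))
# (708777 > 10800) && (((1 / 100) * 100) > 75)
print(highest_severity(values, thresholds))
# None
```

With 1 connected thread out of 100, no threshold is exceeded, which is why the rule would show up as a Success event rather than an alert.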

If you don't see the event on the Events tab, change the Severity filter from "Alerts" to "All" and look for the Success event associated with the rule.

Second, you can also have the value printed in the Description and Advice blocks.  You can see the following as the last sentence under Advice for that same rule on the Results and Details tabs:

There are currently 1 threads connected to HPMiniTower:3306, with max_connections set to 100.

If you go to the Advisors->Manage Rules tab and click the "copy" button next to that rule, you'll see the following:

There are currently __%Threads_connected%__ threads connected to __%server.0__, with __max_connections__ set to __%max_connections%__.

Which is what creates the final string in the Advice section shown above.

Regarding "historical average, maximum, minimum, etc.", that's a good feature request, but you can also get the data today by either (1) creating a custom graph for the data items, or (2) looking in the dc_ng_long_now table and calculating what you want from the data there.  Counter variables (most SHOW STATUS items are counters that just continually increase) complicate that process because you must take the deltas, so a graph is probably easier.
[29 Jul 2009 14:57] Leandro Morgado
Thanks for the tip Andy. It will be handy!

However, this implies that events first have to fire and then be examined individually in order to get the monitored values from the agent. This further implies setting very low THRESHOLDs and increasing them successively until you find the "right spot". Each time you want to see the evaluated expression, you are dependent on an event firing. This is a hassle and discourages users from creating their own rules. What would really be nice is:

1) at custom rule creation time have a "TEST" button that would display the fetched values at some INTERVAL, possibly the same as the one defined in the custom rule or a shorter one for testing purposes (waiting 10 hours for a value to be fetched is *not* my idea of fun). Sample output after pressing a "TEST" button for a swap_page_out counter style parameter would be:

2009/07/30 12:00:00 - Initializing (or could be RAW value like 300000 pages swapped so far)
2009/07/30 12:01:00 - 4 swap_page_out (this is a delta since last value)
2009/07/30 12:02:00 - 0 swap_page_out
2009/07/30 12:03:00 - 666 swap_page_out
2009/07/30 12:00:00
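
The sample output above could be produced from raw counter readings roughly as follows. This is only a sketch in Python: `format_test_output` is a hypothetical name, the raw values are made up to match the deltas shown, and counter resets (for example after a server restart) are not handled.

```python
from datetime import datetime

def format_test_output(samples, name):
    """Return one line per sample: the first shows the raw value, later ones the delta."""
    lines = []
    previous = None
    for timestamp, raw in samples:
        stamp = timestamp.strftime("%Y/%m/%d %H:%M:%S")
        if previous is None:
            # Nothing to subtract from yet, so only report the raw counter.
            lines.append(f"{stamp} - Initializing (raw value {raw})")
        else:
            lines.append(f"{stamp} - {raw - previous} {name}")
        previous = raw
    return lines

# Illustrative raw counter readings, one per minute.
samples = [
    (datetime(2009, 7, 30, 12, 0), 300000),
    (datetime(2009, 7, 30, 12, 1), 300004),
    (datetime(2009, 7, 30, 12, 2), 300004),
    (datetime(2009, 7, 30, 12, 3), 300670),
]

for line in format_test_output(samples, "swap_page_out"):
    print(line)
```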

>"historical average, maximum, minimum, etc.", that's a good feature request,
>but you can also get the data today by either
>(1) creating a custom graph for the data items, 

Graphs are good for finding trends but can make it very hard to get precise values, especially if one "very high value" (1M) pops up that is much higher than the regular values (1-100): the graph scale adjusts to fit the 1M value and the regular values become unreadable. Wouldn't it be much better to be able to say, give me the average, max and min values for the following datetime range?
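
As a sketch of the kind of summary meant here, the following Python snippet computes min, max and average over a datetime range from a list of already-collected readings. The `summarize` function and the sample data are hypothetical. Nothing here reflects how MEM stores or exposes its data.

```python
from datetime import datetime
from statistics import mean

def summarize(readings, start, end):
    """Return min/max/avg of the values whose timestamp lies in [start, end]."""
    selected = [value for timestamp, value in readings if start <= timestamp <= end]
    if not selected:
        return None  # no data in the requested range
    return {"min": min(selected), "max": max(selected), "avg": mean(selected)}

# Illustrative per-minute values, including one very large outlier.
readings = [
    (datetime(2009, 7, 30, 12, 1), 4),
    (datetime(2009, 7, 30, 12, 2), 0),
    (datetime(2009, 7, 30, 12, 3), 666),
    (datetime(2009, 7, 30, 12, 4), 1_000_000),
]

# Summarize only the first three minutes, leaving the outlier out of the range.
stats = summarize(readings,
                  datetime(2009, 7, 30, 12, 1),
                  datetime(2009, 7, 30, 12, 3))
print(stats)
```

Unlike a graph, a summary like this still gives exact numbers when one outlier dominates the scale.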

>or (2) looking in the dc_ng_long_now table and calculating what you want from the data there. 

This is not really an option as it is 1) not documented and 2) hardly user friendly. 

>Counter variables (most SHOW STATUS items are counters that just continually
>increase) complicate that process because you must take the deltas, so a
>graph is probably easier.

I don't see how a counter is any harder. You simply have to collect two values before starting to calculate.

I really don't mean to nitpick. I just think having this feature would encourage the creation of custom rules by the end user.