Bug #23539 Expression evaluation stops after adding notification group to Heat Chart rule
Submitted: 23 Oct 2006 4:09 Modified: 27 Oct 2006 5:40
Reporter: Andy Bang Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Enterprise Monitor: Server Severity:S1 (Critical)
Version: OS:
Assigned to: Darren Oldag CPU Architecture:Any

[23 Oct 2006 4:09] Andy Bang
Description:
Evaluation of the expression for "MySQL Server Not Reachable" stops after adding email notification group (under specific circumstances).  I hope someone else can reproduce this so I don't think I'm going nuts.  It's not trivial -- follow the steps exactly.  Note that you need to monitor 2 separate instances of MySQL for this one.

How to repeat:
1) Start from scratch (clean DB, etc).
2) Start tomcat.
3) Start dashboard.
4) Do not import advisors, although I don't think this has any effect on the outcome.
5) Set up email notication.
6) Set up a notification group.
7) Turn on expression debugging with http://127.0.0.1:8080/merlin/main?command=set_log_level&log=Advisors&level=15.  I don't think this has any effect either, but helps see what's going on.
8) Start an agent.
9) In the logs you can see pairs of expression evaluations like:

Advisors  	Debug  	10/23/06 03:40  	(%server.reachable% == THRESHOLD) = (1 == 0) = 0.0
Advisors 	Debug 	10/23/06 03:40 	(%server.reachable% == 1) && (%server.connected% == THRESHOLD) = (1 == 1) && (1 == 0) = 0.0

10) Go to Advisors->Current Schedule, expand Heat Chart, expand Agent Not Communicating With Database Server, click on scheduled rule, add notification group on the Settings tab.
11) Expand MySQL Server Not Reachable, click on scheduled rule, add notification group on the Settings tab.
12) If you look in the logs you should still see that expression evaluation is happening correctly.
13) Start agent 2.
14) Go to Advisors->Current Schedule, expand Heat Chart, expand Agent Not Communicating With Database Server, click on scheduled rule for the 2nd agent, add notification group on the Settings tab.
15) Expand MySQL Server Not Reachable, click on scheduled rule for the 2nd agent, add notification group on the Settings tab.
16) Go to the logs and you'll see pairs like the following:

Advisors  	Debug  	10/23/06 03:58  	(%server.reachable% == 1) && (%server.connected% == THRESHOLD) = (0 == 1) && (0 == 0) = 0.0
Advisors 	Debug 	10/23/06 03:58 	(%server.reachable% == 1) && (%server.connected% == THRESHOLD) = (1 == 1) && (1 == 0) = 0.0

Which means it's still evaluating the Agent Not Communicating With Database Server rule for both agents.

HOWEVER, you will only see ONE entry each minute like this:

Advisors  	Debug  	10/23/06 03:58  	(%server.reachable% == THRESHOLD) = (1 == 0) = 0.0

Which means it's only evaluting one of the two scheduled MySQL Server Not Reachable rules.

17) Stop the mysqld associated with agent 2 (i.e. stop the MySQL DB server that agent 2 is monitoring).
18) You'll never get a MySQL Server Not Reachable alarm!
19) Examine the logs.  Nothing!
20) Go to Advisors->Current Schedule, and "unschedule" the MySQL Server Not Reachable for the 2nd agent.
21) Look in the Tomcat stdout log and you'll see the following exception:

com.mysql.merlin.server.monitors.MonitorException: E1302:  Schedule does not exi
st. "MySQL Server Not Reachable" "Andys Laptop 4.1:3307"
        at com.mysql.merlin.server.monitors.MonitorService.getSchedule(MonitorSe
rvice.java:1118)
        at com.mysql.merlin.server.monitors.MonitorService.getCurrentHistory(Mon
itorService.java:1103)
        at com.mysql.merlin.server.inventory.InventoryService.getServerStatus(In
ventoryService.java:109)
        at com.mysql.merlin.server.inventory.InventoryService.getServersStatus(I
nventoryService.java:133)
        at com.mysql.merlin.server.inventory.InventoryCommandProcessor.getServer
Status(InventoryCommandProcessor.java:164)
        at com.mysql.merlin.server.inventory.InventoryCommandProcessor.processRe
quest(InventoryCommandProcessor.java:86)
        at com.mysql.merlin.server.MerlinServlet.processRequest(MerlinServlet.ja
va:120)
        at com.mysql.merlin.server.MerlinServlet.doCommon(MerlinServlet.java:92)

        at com.mysql.merlin.server.MerlinServlet.doPost(MerlinServlet.java:68)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:709)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Appl
icationFilterChain.java:252)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationF
ilterChain.java:173)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperV
alve.java:213)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextV
alve.java:178)
        at org.apache.catalina.authenticator.AuthenticatorBase.invoke(Authentica
torBase.java:514)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.j
ava:126)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.j
ava:105)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineVal
ve.java:107)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.jav
a:148)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java
:868)
        at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.p
rocessConnection(Http11BaseProtocol.java:663)
        at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpo
int.java:527)
        at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFol
lowerWorkerThread.java:80)
        at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadP
ool.java:684)
        at java.lang.Thread.run(Thread.java:595)
[23 Oct 2006 4:13] Andy Bang
Note that data collection seems to be working fine because we see the following in the Merlin DB:

mysql> SELECT a.host, i.attrib, v.value, v.begin_time, v.end_time
    -> FROM inv_servers a, dc_items i, dc_integer v, dc_schedule s
    -> WHERE i.attrib LIKE 'server%'
    ->       AND s.item_id = i.item_id
    ->       AND s.collect_id = v.collect_id
    ->       AND a.server_id = s.server_id
    -> ORDER BY v.end_time;
+------------------+------------------+-------+---------------------+-----------
----------+
| host             | attrib           | value | begin_time          | end_time
          |
+------------------+------------------+-------+---------------------+-----------
----------+
| Andys Laptop 4.1 | server.reachable |     1 | 2006-10-22 20:36:41 | 2006-10-22
 20:44:00 |
| Andys Laptop 4.1 | server.connected |     1 | 2006-10-22 20:36:41 | 2006-10-22
 20:44:00 |
| Andys Laptop 5.0 | server.reachable |     1 | 2006-10-22 20:34:19 | 2006-10-22
 20:45:00 |
| Andys Laptop 5.0 | server.connected |     1 | 2006-10-22 20:34:19 | 2006-10-22
 20:45:00 |
| Andys Laptop 4.1 | server.connected |     0 | 2006-10-22 20:45:55 | 2006-10-22
 20:45:55 |
| Andys Laptop 4.1 | server.reachable |     0 | 2006-10-22 20:45:55 | 2006-10-22
 20:45:55 |
+------------------+------------------+-------+---------------------+-----------
----------+
6 rows in set (0.01 sec)

So my conclusion is that expression evaluation of one rule for the second agent has stopped for some reason.
[24 Oct 2006 21:26] Sloan Childers
Looks like the mysqld monitoring rule may be the only thing broken.  Look there first.