Bug #45376 MEM sends success trap after an event is already in state=success, when closed
Submitted: 8 Jun 2009 14:05 Modified: 28 Jul 2009 16:20
Reporter: Shannon Wade
Status: Closed
Category: MySQL Enterprise Monitor: Server Severity: S4 (Feature request)
Version: 2.0.5.7153 OS: Any
Assigned to: Sloan Childers CPU Architecture: Any

[8 Jun 2009 14:05] Shannon Wade
Description:
When a rule is evaluated and deemed critical (possibly other levels as well), MEM sends a critical SNMP trap. When that rule is re-evaluated and clears, it sends a success trap. However, when that previously critical event, now in 'success', is closed, it sends another 'success' trap.

For DBAs, admins, etc., this results in excess and confusing paging from their upstream trap monitor.

How to repeat:
1- Generate a critical event (stopping MySQL for instance) and wait for 'critical' trap.
2- Allow the critical event to clear (start MySQL back up, for instance) and wait for the 'success' trap.
3- Close the previously mentioned 'critical' event.

Below is my test; I checked this behavior several times. The second success trap is not from a normal evaluation, as traps are not sent simply for being in a success state, only when the state changes.

I bring MySQL down and get a critical alert.

trap:
d2650 UDP: [127.0.0.1]:57560 SNMPv2-SMI::mib-2.1.3.0 = 1:23:41:25.83, SNMPv2-SMI::snmpModules.1.1.4.1.0 = MYSQLTRAP-MIB::advisorTrap, MYSQLTRAP-MIB::myServerName = "test", MYSQLTRAP-MIB::myRuleAlarmLevel = "critical", MYSQLTRAP-MIB::myRuleCategory = "Heat Chart", MYSQLTRAP-MIB::myRuleAlarmTime = "2009-06-08T13:11:01.689Z", MYSQLTRAP-MIB::myRuleName = "MySQL Server Not Reachable", MYSQLTRAP-MIB::myRuleDesc = "To perform useful work, it must be possible to connect to the local MySQL database server. If the MySQL Enterprise Service Agent cannot communicate with the server, it is likely the server is not running.", MYSQLTRAP-MIB::myRuleAdvice = "Investigate why the agent cannot communicate with the local MySQL database server on test. Ensure that the MySQL server is running. If necessary, restart the server. Ensure that the agent has the correct login credentials (login and password). If necessary, change the login credentials in the agent-instance.ini file associated with the test instance or create an account with the specified login credentials in the MySQL server. See the Advanced Agent Configuration section of the documentation for more information on agent configuration files., = , The = last error message reported by the server is: MySQL server has gone away.", MYSQLTRAP-MIB::myRuleCommand = "N/A", MYSQLTRAP-MIB::myRuleInfo = "N/A", MYSQLTRAP-MIB::myRuleExpression = "(%server.reachable% == THRESHOLD)", MYSQLTRAP-MIB::myRuleEvalExpression = "(0 == 0)", MYSQLTRAP-MIB::myRuleCopyright = "Copyright (c) 2005-2008 MySQL AB, 2008-2009 Sun Microsystems, Inc. All rights reserved.", SNMPv2-SMI::snmpModules.18.1.3.0 = 127.0.0.1, SNMPv2-SMI::snmpModules.18.1.4.0 = "public", SNMPv2-SMI::snmpModules.1.1.4.3.0 = MYSQLTRAP-MIB::monitor

I bring MySQL up; when the rule is evaluated I get a success state, so the previous critical has cleared.

trap:
d2650 UDP: [127.0.0.1]:57560 SNMPv2-SMI::mib-2.1.3.0 = 1:23:41:32.43, SNMPv2-SMI::snmpModules.1.1.4.1.0 = MYSQLTRAP-MIB::advisorTrap, MYSQLTRAP-MIB::myServerName = "test", MYSQLTRAP-MIB::myRuleAlarmLevel = "success", MYSQLTRAP-MIB::myRuleCategory = "Heat Chart", MYSQLTRAP-MIB::myRuleAlarmTime = "2009-06-08T13:12:07.685Z", MYSQLTRAP-MIB::myRuleName = "MySQL Server Not Reachable", MYSQLTRAP-MIB::myRuleDesc = "To perform useful work, it must be possible to connect to the local MySQL database server. If the MySQL Enterprise Service Agent cannot communicate with the server, it is likely the server is not running.", MYSQLTRAP-MIB::myRuleAdvice = "Investigate why the agent cannot communicate with the local MySQL database server on test. Ensure that the MySQL server is running. If necessary, restart the server. Ensure that the agent has the correct login credentials (login and password). If necessary, change the login credentials in the agent-instance.ini file associated with the test instance or create an account with the specified login credentials in the MySQL server. See the Advanced Agent Configuration section of the documentation for more information on agent configuration files., = , The = last error message reported by the server is: MySQL server has gone away.", MYSQLTRAP-MIB::myRuleCommand = "N/A", MYSQLTRAP-MIB::myRuleInfo = "N/A", MYSQLTRAP-MIB::myRuleExpression = "(%server.reachable% == THRESHOLD)", MYSQLTRAP-MIB::myRuleEvalExpression = "(1 == 0)", MYSQLTRAP-MIB::myRuleCopyright = "Copyright (c) 2005-2008 MySQL AB, 2008-2009 Sun Microsystems, Inc. All rights reserved.", SNMPv2-SMI::snmpModules.18.1.3.0 = 127.0.0.1, SNMPv2-SMI::snmpModules.18.1.4.0 = "public", SNMPv2-SMI::snmpModules.1.1.4.3.0 = MYSQLTRAP-MIB::monitor

I now close this previously critical, now success, event and get another success trap, which results in more paging, etc. to the DBA.

trap:
d2650 UDP: [127.0.0.1]:57560 SNMPv2-SMI::mib-2.1.3.0 = 1:23:41:43.84, SNMPv2-SMI::snmpModules.1.1.4.1.0 = MYSQLTRAP-MIB::advisorTrap, MYSQLTRAP-MIB::myServerName = "test", MYSQLTRAP-MIB::myRuleAlarmLevel = "success", MYSQLTRAP-MIB::myRuleCategory = "Heat Chart", MYSQLTRAP-MIB::myRuleAlarmTime = "2009-06-08T13:14:01.731Z", MYSQLTRAP-MIB::myRuleName = "MySQL Server Not Reachable", MYSQLTRAP-MIB::myRuleDesc = "To perform useful work, it must be possible to connect to the local MySQL database server. If the MySQL Enterprise Service Agent cannot communicate with the server, it is likely the server is not running.", MYSQLTRAP-MIB::myRuleAdvice = "Investigate why the agent cannot communicate with the local MySQL database server on test. Ensure that the MySQL server is running. If necessary, restart the server. Ensure that the agent has the correct login credentials (login and password). If necessary, change the login credentials in the agent-instance.ini file associated with the test instance or create an account with the specified login credentials in the MySQL server. See the Advanced Agent Configuration section of the documentation for more information on agent configuration files., = , The = last error message reported by the server is: MySQL server has gone away.", MYSQLTRAP-MIB::myRuleCommand = "N/A", MYSQLTRAP-MIB::myRuleInfo = "N/A", MYSQLTRAP-MIB::myRuleExpression = "(%server.reachable% == THRESHOLD)", MYSQLTRAP-MIB::myRuleEvalExpression = "(1 == 0)", MYSQLTRAP-MIB::myRuleCopyright = "Copyright (c) 2005-2008 MySQL AB, 2008-2009 Sun Microsystems, Inc. All rights reserved.", SNMPv2-SMI::snmpModules.18.1.3.0 = 127.0.0.1, SNMPv2-SMI::snmpModules.18.1.4.0 = "public", SNMPv2-SMI::snmpModules.1.1.4.3.0 = MYSQLTRAP-MIB::monitor

In my testing these are also the last 3 events in my trap log, so nothing seems to have come in to change the state between those two success traps.
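
One way to spot the repeated trap in a log like the one above is sketched below. It assumes the log is one trap per line in the format shown; the helper names (parse_trap, repeated_levels) are hypothetical, and the sample lines are abbreviated from the three traps above, keeping only the fields the check needs.

```python
import re

# Matches MYSQLTRAP-MIB fields of the form  MYSQLTRAP-MIB::name = "value"
FIELD = re.compile(r'MYSQLTRAP-MIB::(\w+) = "([^"]*)"')


def parse_trap(line):
    """Return the quoted MYSQLTRAP-MIB fields of one trap log line as a dict."""
    return dict(FIELD.findall(line))


def repeated_levels(lines):
    """Return (alarm time, level) for each trap whose alarm level repeats the
    previous trap's level for the same server and rule."""
    last = {}
    repeats = []
    for line in lines:
        fields = parse_trap(line)
        key = (fields.get("myServerName"), fields.get("myRuleName"))
        level = fields.get("myRuleAlarmLevel")
        if last.get(key) == level:
            repeats.append((fields.get("myRuleAlarmTime"), level))
        last[key] = level
    return repeats


# Abbreviated versions of the three traps from this report.
TEMPLATE = (
    'SNMPv2-SMI::snmpModules.1.1.4.1.0 = MYSQLTRAP-MIB::advisorTrap, '
    'MYSQLTRAP-MIB::myServerName = "test", '
    'MYSQLTRAP-MIB::myRuleAlarmLevel = "{level}", '
    'MYSQLTRAP-MIB::myRuleAlarmTime = "{time}", '
    'MYSQLTRAP-MIB::myRuleName = "MySQL Server Not Reachable"'
)
log = [
    TEMPLATE.format(level="critical", time="2009-06-08T13:11:01.689Z"),
    TEMPLATE.format(level="success", time="2009-06-08T13:12:07.685Z"),
    TEMPLATE.format(level="success", time="2009-06-08T13:14:01.731Z"),
]

print(repeated_levels(log))  # [('2009-06-08T13:14:01.731Z', 'success')]
```

Only the third trap is flagged, which matches the extra 'success' trap seen when the event is closed.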

Suggested fix:
Prevent MEM from sending out this extra success trap.
[8 Jun 2009 15:36] Sloan Childers
Here are the event states:

unknown (state before rule has run)
failure (internal error when running the rule)
closed (user initiated state)
success
info
warning
critical

For any particular rule schedule, any time an event changes state we send an SNMP trap.  We have left event correlation to the upstream SNMP manager software.  The state before closing is not always success.
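
As a rough illustration of the model described above (one trap per state change, per rule schedule), here is a minimal sketch. It is not MEM's actual code: the RuleSchedule class and the send_trap callback are hypothetical names, and only the state list comes from this comment.

```python
from enum import Enum


class EventState(Enum):
    UNKNOWN = "unknown"    # state before the rule has run
    FAILURE = "failure"    # internal error when running the rule
    CLOSED = "closed"      # user-initiated state
    SUCCESS = "success"
    INFO = "info"
    WARNING = "warning"
    CRITICAL = "critical"


class RuleSchedule:
    """Hypothetical stand-in for one rule schedule's event state."""

    def __init__(self, send_trap):
        self.state = EventState.UNKNOWN
        self.send_trap = send_trap  # assumed callback that emits the trap

    def transition(self, new_state):
        """Send a trap only when the state actually changes."""
        if new_state is self.state:
            return False
        self.state = new_state
        self.send_trap(new_state.value)
        return True


sent = []
schedule = RuleSchedule(sent.append)
for state in (EventState.CRITICAL, EventState.SUCCESS,
              EventState.SUCCESS, EventState.CLOSED):
    schedule.transition(state)

print(sent)  # ['critical', 'success', 'closed']
```

Under this model the repeated success evaluation sends nothing, and closing the event is a real state change, so it produces a trap of its own.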
[8 Jun 2009 17:43] MySQL Verification Team
After discussion with Sloan, I am changing this to a feature request. As Sloan mentions, closed is a state, so this is working as designed: a trap is sent on state change.

I am changing this to a feature request for an option not to send a trap when an event is closed, only if it was in success before it was closed. This can prevent the confusion (pages, etc.) users get if their upstream SNMP trap managers have no option to filter this out.
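
The requested option could look something like the sketch below. The function name should_send_trap and the flag suppress_close_after_success are hypothetical, chosen only to illustrate the request; nothing here reflects an existing MEM setting.

```python
def should_send_trap(previous, new, suppress_close_after_success=False):
    """Decide whether a state change should produce an SNMP trap.

    Default behavior: any change of state sends a trap.
    With the (hypothetical) option enabled, closing an event that was
    already in 'success' sends nothing.
    """
    if new == previous:
        return False
    if suppress_close_after_success and previous == "success" and new == "closed":
        return False
    return True


print(should_send_trap("success", "closed"))                                     # True
print(should_send_trap("success", "closed", suppress_close_after_success=True))  # False
print(should_send_trap("critical", "closed", suppress_close_after_success=True)) # True
```

Closing an event that is still critical would keep sending a trap, since the state before closing is not always success.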
[8 Jun 2009 18:45] Sloan Childers
Need to dig into history of SNMP traps and see when/if we quit sending closed traps for alarms.
[10 Jun 2009 20:07] Sloan Childers
Fixed 2.1 to send an SNMP trap for the 'closed' state, as originally designed.
[15 Jun 2009 22:36] Keith Russell
Patch installed in versions >= 2.1.0.1062.
[16 Jun 2009 18:56] Bill Weber
Verified that build 2.1.0.1062 sends a trap for the closed event.
[28 Jul 2009 16:20] Tony Bedford
An entry was added to the 2.1.0 changelog:

When an event was closed a “SUCCESS” SNMP trap was sent, rather than a “CLOSED” SNMP trap.