Description:
When a rule is evaluated and found to be critical (possibly other alarm levels as well), MEM sends a critical SNMP trap. When that rule is re-evaluated and the condition has cleared, it sends a success trap. However, when the previously critical event, now in the 'success' state, is closed, MEM sends another 'success' trap.
For DBAs, administrators, etc., this results in excessive and confusing pages from their upstream trap monitor.
How to repeat:
1- Generate a critical event (for instance, by stopping MySQL) and wait for the 'critical' trap.
2- Allow the critical event to clear (for instance, by starting MySQL back up) and wait for the 'success' trap.
3- Close the previously mentioned 'critical' event.
Below is my test; I checked this behavior several times. The second success trap does not come from a normal evaluation, because traps are not sent for an unchanged success state, only when the state changes.
I bring MySQL down and get a critical alert.
trap:
d2650 UDP: [127.0.0.1]:57560 SNMPv2-SMI::mib-2.1.3.0 = 1:23:41:25.83, SNMPv2-SMI::snmpModules.1.1.4.1.0 = MYSQLTRAP-MIB::advisorTrap, MYSQLTRAP-MIB::myServerName = "test", MYSQLTRAP-MIB::myRuleAlarmLevel = "critical", MYSQLTRAP-MIB::myRuleCategory = "Heat Chart", MYSQLTRAP-MIB::myRuleAlarmTime = "2009-06-08T13:11:01.689Z", MYSQLTRAP-MIB::myRuleName = "MySQL Server Not Reachable", MYSQLTRAP-MIB::myRuleDesc = "To perform useful work, it must be possible to connect to the local MySQL database server. If the MySQL Enterprise Service Agent cannot communicate with the server, it is likely the server is not running.", MYSQLTRAP-MIB::myRuleAdvice = "Investigate why the agent cannot communicate with the local MySQL database server on test. Ensure that the MySQL server is running. If necessary, restart the server. Ensure that the agent has the correct login credentials (login and password). If necessary, change the login credentials in the agent-instance.ini file associated with the test instance or create an account with the specified login credentials in the MySQL server. See the Advanced Agent Configuration section of the documentation for more information on agent configuration files., = , The = last error message reported by the server is: MySQL server has gone away.", MYSQLTRAP-MIB::myRuleCommand = "N/A", MYSQLTRAP-MIB::myRuleInfo = "N/A", MYSQLTRAP-MIB::myRuleExpression = "(%server.reachable% == THRESHOLD)", MYSQLTRAP-MIB::myRuleEvalExpression = "(0 == 0)", MYSQLTRAP-MIB::myRuleCopyright = "Copyright (c) 2005-2008 MySQL AB, 2008-2009 Sun Microsystems, Inc. All rights reserved.", SNMPv2-SMI::snmpModules.18.1.3.0 = 127.0.0.1, SNMPv2-SMI::snmpModules.18.1.4.0 = "public", SNMPv2-SMI::snmpModules.1.1.4.3.0 = MYSQLTRAP-MIB::monitor
I bring MySQL back up; when the rule is evaluated I get a success state, so the previous critical state has cleared.
trap:
d2650 UDP: [127.0.0.1]:57560 SNMPv2-SMI::mib-2.1.3.0 = 1:23:41:32.43, SNMPv2-SMI::snmpModules.1.1.4.1.0 = MYSQLTRAP-MIB::advisorTrap, MYSQLTRAP-MIB::myServerName = "test", MYSQLTRAP-MIB::myRuleAlarmLevel = "success", MYSQLTRAP-MIB::myRuleCategory = "Heat Chart", MYSQLTRAP-MIB::myRuleAlarmTime = "2009-06-08T13:12:07.685Z", MYSQLTRAP-MIB::myRuleName = "MySQL Server Not Reachable", MYSQLTRAP-MIB::myRuleDesc = "To perform useful work, it must be possible to connect to the local MySQL database server. If the MySQL Enterprise Service Agent cannot communicate with the server, it is likely the server is not running.", MYSQLTRAP-MIB::myRuleAdvice = "Investigate why the agent cannot communicate with the local MySQL database server on test. Ensure that the MySQL server is running. If necessary, restart the server. Ensure that the agent has the correct login credentials (login and password). If necessary, change the login credentials in the agent-instance.ini file associated with the test instance or create an account with the specified login credentials in the MySQL server. See the Advanced Agent Configuration section of the documentation for more information on agent configuration files., = , The = last error message reported by the server is: MySQL server has gone away.", MYSQLTRAP-MIB::myRuleCommand = "N/A", MYSQLTRAP-MIB::myRuleInfo = "N/A", MYSQLTRAP-MIB::myRuleExpression = "(%server.reachable% == THRESHOLD)", MYSQLTRAP-MIB::myRuleEvalExpression = "(1 == 0)", MYSQLTRAP-MIB::myRuleCopyright = "Copyright (c) 2005-2008 MySQL AB, 2008-2009 Sun Microsystems, Inc. All rights reserved.", SNMPv2-SMI::snmpModules.18.1.3.0 = 127.0.0.1, SNMPv2-SMI::snmpModules.18.1.4.0 = "public", SNMPv2-SMI::snmpModules.1.1.4.3.0 = MYSQLTRAP-MIB::monitor
I now close this previously critical (now success) event and get another success trap, which results in more pages to DBAs, etc.
trap:
d2650 UDP: [127.0.0.1]:57560 SNMPv2-SMI::mib-2.1.3.0 = 1:23:41:43.84, SNMPv2-SMI::snmpModules.1.1.4.1.0 = MYSQLTRAP-MIB::advisorTrap, MYSQLTRAP-MIB::myServerName = "test", MYSQLTRAP-MIB::myRuleAlarmLevel = "success", MYSQLTRAP-MIB::myRuleCategory = "Heat Chart", MYSQLTRAP-MIB::myRuleAlarmTime = "2009-06-08T13:14:01.731Z", MYSQLTRAP-MIB::myRuleName = "MySQL Server Not Reachable", MYSQLTRAP-MIB::myRuleDesc = "To perform useful work, it must be possible to connect to the local MySQL database server. If the MySQL Enterprise Service Agent cannot communicate with the server, it is likely the server is not running.", MYSQLTRAP-MIB::myRuleAdvice = "Investigate why the agent cannot communicate with the local MySQL database server on test. Ensure that the MySQL server is running. If necessary, restart the server. Ensure that the agent has the correct login credentials (login and password). If necessary, change the login credentials in the agent-instance.ini file associated with the test instance or create an account with the specified login credentials in the MySQL server. See the Advanced Agent Configuration section of the documentation for more information on agent configuration files., = , The = last error message reported by the server is: MySQL server has gone away.", MYSQLTRAP-MIB::myRuleCommand = "N/A", MYSQLTRAP-MIB::myRuleInfo = "N/A", MYSQLTRAP-MIB::myRuleExpression = "(%server.reachable% == THRESHOLD)", MYSQLTRAP-MIB::myRuleEvalExpression = "(1 == 0)", MYSQLTRAP-MIB::myRuleCopyright = "Copyright (c) 2005-2008 MySQL AB, 2008-2009 Sun Microsystems, Inc. All rights reserved.", SNMPv2-SMI::snmpModules.18.1.3.0 = 127.0.0.1, SNMPv2-SMI::snmpModules.18.1.4.0 = "public", SNMPv2-SMI::snmpModules.1.1.4.3.0 = MYSQLTRAP-MIB::monitor
In my testing, these are also the last three events in my trap log, so nothing else came in to change the state between those two success traps.
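To check a trap log for this pattern, a small script can walk the logged traps and flag any trap whose myRuleAlarmLevel matches the previous trap for the same myServerName and myRuleName. This is a minimal sketch, assuming one trap per log line in the snmptrapd format shown above and that quoted varbind values contain no embedded double quotes; the function names are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Matches quoted varbinds such as: MYSQLTRAP-MIB::myRuleAlarmLevel = "critical"
VARBIND = re.compile(r'MYSQLTRAP-MIB::(\w+) = "([^"]*)"')


def parse_trap(line: str) -> Dict[str, str]:
    """Return the quoted MYSQLTRAP-MIB varbinds found on one logged trap line."""
    return dict(VARBIND.findall(line))


def repeated_levels(lines: Iterable[str]) -> List[Tuple[int, str, str, str]]:
    """Flag traps whose alarm level equals the previous trap for the same server and rule."""
    last: Dict[Tuple[str, str], str] = {}
    repeats: List[Tuple[int, str, str, str]] = []
    for lineno, line in enumerate(lines, start=1):
        fields = parse_trap(line)
        level = fields.get("myRuleAlarmLevel")
        if level is None:
            continue  # not an advisor trap (e.g. a "trap:" label line)
        key = (fields.get("myServerName", ""), fields.get("myRuleName", ""))
        if last.get(key) == level:
            # Same level as the last trap for this rule: no state change happened.
            repeats.append((lineno, key[0], key[1], level))
        last[key] = level
    return repeats
```

Run against the three traps above, it would flag only the third one (the success trap sent after closing the event), since the first two represent real state changes.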
Suggested fix:
Prevent MEM from sending out this extra success trap.
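One way to express the expected behavior is a gate that remembers the last level actually sent for each (server, rule) pair and only emits a trap when the level changes; closing an event then goes through the same gate, so it cannot produce a duplicate. This is a hypothetical sketch, not MEM's actual code: the class, its methods, and the assumption that an unseen rule starts in the 'success' state (so a plain success evaluation sends nothing) are all illustrative.

```python
from typing import Callable, Dict, Tuple


class StateChangeNotifier:
    """Hypothetical sketch: send a trap only when a rule's alarm level changes."""

    def __init__(self, send_trap: Callable[[str, str, str], None]) -> None:
        self._send_trap = send_trap
        # Last alarm level actually sent, keyed by (server, rule).
        self._last_sent: Dict[Tuple[str, str], str] = {}

    def on_evaluation(self, server: str, rule: str, level: str) -> None:
        key = (server, rule)
        # Assumption: a rule with no trap history is treated as 'success'.
        if self._last_sent.get(key, "success") != level:
            self._send_trap(server, rule, level)
            self._last_sent[key] = level

    def on_close(self, server: str, rule: str, current_level: str) -> None:
        # Closing an event is not a state change; anything it would report
        # passes through the same gate, so an unchanged level is suppressed.
        self.on_evaluation(server, rule, current_level)
```

With this gate, the sequence in the test above (critical, success, close) would send exactly two traps: 'critical' and one 'success'.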