Bug #25083 Deleted data collection item definitions not removed from dc_known_items
Submitted: 14 Dec 2006 22:31
Modified: 5 Aug 2008 17:52
Reporter: Andy Bang
Status: Closed
Category: MySQL Enterprise Monitor: Server
Severity: S2 (Serious)
Version: 1.0.0
OS: Windows (Windows XP)
Assigned to: Darren Oldag
CPU Architecture: Any

[14 Dec 2006 22:31] Andy Bang
Description:
If you delete a DC item definition in an agent's items-mysql-network.xml file, the item still appears in merlin.dc_known_items on the Merlin server.  So if you include that DC item in a rule and schedule it against that agent, you do NOT get a warning that the item isn't known; instead you get an error message and stack trace in the AgentTasks log.

I'm not sure whether this is an agent bug or a server bug.  If the agent correctly reports its known items on a restart, this is a server bug.

I'm not sure about priority.  I doubt we'll be deleting DC items from the XML files, but customers who create their own custom DCs might.

How to repeat:
1) It's easiest to start from a fresh system.
2) Start one agent.
3) Review what items are known for that agent:

select i.category, i.attrib from dc_items i, dc_known_items k where i.item_id = k.item_id and k.agent_id = 1 order by category, attrib;

4) Remove an item from items-mysql-network.xml in the agent directory.
5) Restart the agent.
6) Review what items are known for that agent again.  You'll see the same list as in step #3 above.  In other words, the Merlin server still thinks the agent knows about the item you just removed.
7) Run a rule against that agent that uses the DC item you just deleted.  You will NOT get a message that the item isn't known -- this is wrong.
8) Check out the agent's log file.  You'll see the following error message:

2006-12-14 14:03:16: (critical) exception received from server: E0001: Internal Error: java.lang.NullPointerException

This message will keep repeating in the log based on the frequency at which you scheduled the rule.

9) Look at the AgentTasks log file on the Merlin server.  You'll see the following (note the "attrib is unknown" error message):

Warning 12/14/2006 2:03 PM <doc><agentId>1</agentId><agentUtc>2006-12-14T22:03:16.156Z</agentUtc><hostname>Agent1</hostname><uuid>ae45424f-adbe-452a-9abd-3bac33dc13e1</uuid><version>1.0.0</version><shutdown>false</shutdown><tasks><task><taskId>43</taskId><command>collect_data</command><utc>2006-12-14T22:03:15.156Z</utc><data><exceptions><error>attrib is unknown: table_collation</error></exceptions></data></task> </tasks></doc>
java.lang.NullPointerException
    at com.mysql.merlin.server.collect.DCService.processLastKnownCollectedValue(DCService.java:248)
    at com.mysql.merlin.server.collect.DCService.access$100(DCService.java:51)
    at com.mysql.merlin.server.collect.DCService$2.execute(DCService.java:125)
    at com.mysql.util.jdbctemplate.ActionExecutor.execute(ActionExecutor.java:56)
    at com.mysql.merlin.server.db.GeneralOperations.execute(GeneralOperations.java:92)
    at com.mysql.merlin.server.collect.DCService.collectData(DCService.java:133)
    at com.mysql.merlin.server.event.AgentBulkCollectDataEvent.performAction(AgentBulkCollectDataEvent.java:20)
    at com.mysql.merlin.server.collect.DCService.processEvent(DCService.java:111)
    at com.mysql.merlin.server.event.SynchronousEventDispatcher.postEvent(SynchronousEventDispatcher.java:5)
    at com.mysql.merlin.server.event.AbstractEventDispatcher.postEvent(AbstractEventDispatcher.java:30)
    at com.mysql.merlin.server.agent.AgentService.processRequest(AgentService.java:291)
    at com.mysql.merlin.server.agent.AgentService.heartbeat(AgentService.java:151)
    at com.mysql.merlin.server.agent.HeartBeatCommandProcessor.processRequest(HeartBeatCommandProcessor.java:72)
    at com.mysql.merlin.server.MerlinServlet.processRequest(MerlinServlet.java:120)
    at com.mysql.merlin.server.MerlinServlet.doCommon(MerlinServlet.java:92)
    at com.mysql.merlin.server.MerlinServlet.doPost(MerlinServlet.java:68)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:709)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:524)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
    at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664)
    at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
    at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
    at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
    at java.lang.Thread.run(Thread.java:595)

Suggested fix:
When a DC item is removed from one of the agent's items-xxx.xml files, it should be deleted from the dc_known_items table for that agent, too, so you can't schedule a rule that uses it.
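
To illustrate the suggested fix, here is a minimal sketch in Python against an in-memory SQLite database. It is not the Merlin server's code. The table and column names (dc_items, dc_known_items, agent_id, item_id, category, attrib) come from the query in step 3; the column types, the helper names prune_known_items and known_attribs, and the sample rows other than table_collation are assumptions made for the example.

```python
import sqlite3


def prune_known_items(conn, agent_id, reported_item_ids):
    """Delete dc_known_items rows for agent_id that the agent no longer reports.

    Hypothetical helper: returns the number of rows removed.
    """
    reported = list(reported_item_ids)
    if reported:
        placeholders = ",".join("?" for _ in reported)
        sql = (
            "DELETE FROM dc_known_items "
            f"WHERE agent_id = ? AND item_id NOT IN ({placeholders})"
        )
        cur = conn.execute(sql, [agent_id, *reported])
    else:
        # Assumption: an empty report means the agent knows no items at all.
        cur = conn.execute(
            "DELETE FROM dc_known_items WHERE agent_id = ?", (agent_id,)
        )
    conn.commit()
    return cur.rowcount


def known_attribs(conn, agent_id):
    """Same listing as the query in step 3, parameterised by agent."""
    rows = conn.execute(
        "SELECT i.category, i.attrib FROM dc_items i "
        "JOIN dc_known_items k ON i.item_id = k.item_id "
        "WHERE k.agent_id = ? ORDER BY i.category, i.attrib",
        (agent_id,),
    )
    return rows.fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    # Assumed minimal schema and sample data, for illustration only.
    conn.executescript(
        """
        CREATE TABLE dc_items (item_id INTEGER PRIMARY KEY, category TEXT, attrib TEXT);
        CREATE TABLE dc_known_items (agent_id INTEGER, item_id INTEGER);
        INSERT INTO dc_items VALUES (1, 'mysql', 'table_collation'), (2, 'mysql', 'table_rows');
        INSERT INTO dc_known_items VALUES (1, 1), (1, 2), (2, 1);
        """
    )
    # Agent 1 restarts and now reports only item 2 (table_collation was removed).
    removed = prune_known_items(conn, agent_id=1, reported_item_ids=[2])
    return removed, known_attribs(conn, 1), known_attribs(conn, 2)
```

In this sketch the delete is scoped to one agent, so another agent that still defines the item keeps its row. Whether an empty report should clear every row for an agent is a design decision the report does not address.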
[15 Dec 2006 19:54] Darren Oldag
We actually have all sorts of problems with inventory where we only add, and never delete.  The same is true here.  When an agent lists known items, the server simply takes the union of the existing items and the new items.
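
The difference between the add-only behaviour and a replacing merge can be sketched with plain sets. This is an illustrative Python sketch, not the server's implementation; the function names and the table_rows item are assumptions, and table_collation is the item from the log above.

```python
def merge_union(existing, reported):
    """Add-only behaviour: an item, once known, is never dropped."""
    return set(existing) | set(reported)


def merge_replace(existing, reported):
    """Alternative: treat the agent's latest report as authoritative."""
    return set(reported)


def stale_items(existing, reported):
    """Items the server still lists but the agent no longer reports."""
    return set(existing) - set(reported)


# Server-side state before the restart, then the agent's report after
# table_collation was removed from its XML file.
existing = {"table_collation", "table_rows"}
reported = {"table_rows"}
```

With these inputs, merge_union keeps table_collation, merge_replace drops it, and stale_items returns exactly the entries that would have to be deleted.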

It is a server bug that needs to be fixed.  Please prioritize accordingly.
[22 Feb 2007 17:07] Bill Weber
Also, see duplicate bug http://bugs.mysql.com/bug.php?id=26547
[29 Mar 2008 22:42] Darren Oldag
This was actually fixed in 1.2 (in order to handle slave emancipation when slave status items are removed), but you can choose to verify it there or in 2.0.