Description:
If you delete a DC item definition in an agent's items-mysql-network.xml file, the item still appears in merlin.dc_known_items on the Merlin server. So if you include that DC item in a rule and schedule it against that agent, you do NOT get a warning that the item isn't known; instead you get an error message and stack trace in the AgentTasks log.
I'm not sure whether this is an agent bug or a server bug. If the agent correctly reports its known items on restart, then this is a server bug.
I'm not sure about priority. I doubt we'll be deleting DC items from the XML files, but customers who create their own custom DCs might.
How to repeat:
1) It's easiest to start from a fresh system.
2) Start one agent.
3) Review what items are known for that agent:
select i.category, i.attrib from dc_items i, dc_known_items k where i.item_id = k.item_id and k.agent_id = 1 order by category, attrib;
4) Remove an item from items-mysql-network.xml in the agent directory.
5) Restart the agent.
6) Review what items are known for that agent again. You'll see the same list as in step #3 above. In other words, the Merlin server still thinks the agent knows about the item you just removed.
7) Run a rule against that agent that uses the DC item you just deleted. You will NOT get a message that the item isn't known -- this is wrong.
8) Check out the agent's log file. You'll see the following error message:
2006-12-14 14:03:16: (critical) exception received from server: E0001: Internal Error: java.lang.NullPointerException
This message repeats in the log at the frequency with which you scheduled the rule.
9) Look at the AgentTasks log file on the Merlin server. You'll see the following (note the "attrib is unknown" error message):
Warning 12/14/2006 2:03 PM <doc><agentId>1</agentId><agentUtc>2006-12-14T22:03:16.156Z</agentUtc><hostname>Agent1</hostname><uuid>ae45424f-adbe-452a-9abd-3bac33dc13e1</uuid><version>1.0.0</version><shutdown>false</shutdown><tasks><task><taskId>43</taskId><command>collect_data</command><utc>2006-12-14T22:03:15.156Z</utc><data><exceptions><error>attrib is unknown: table_collation</error></exceptions></data></task> </tasks></doc>
java.lang.NullPointerException
    at com.mysql.merlin.server.collect.DCService.processLastKnownCollectedValue(DCService.java:248)
    at com.mysql.merlin.server.collect.DCService.access$100(DCService.java:51)
    at com.mysql.merlin.server.collect.DCService$2.execute(DCService.java:125)
    at com.mysql.util.jdbctemplate.ActionExecutor.execute(ActionExecutor.java:56)
    at com.mysql.merlin.server.db.GeneralOperations.execute(GeneralOperations.java:92)
    at com.mysql.merlin.server.collect.DCService.collectData(DCService.java:133)
    at com.mysql.merlin.server.event.AgentBulkCollectDataEvent.performAction(AgentBulkCollectDataEvent.java:20)
    at com.mysql.merlin.server.collect.DCService.processEvent(DCService.java:111)
    at com.mysql.merlin.server.event.SynchronousEventDispatcher.postEvent(SynchronousEventDispatcher.java:5)
    at com.mysql.merlin.server.event.AbstractEventDispatcher.postEvent(AbstractEventDispatcher.java:30)
    at com.mysql.merlin.server.agent.AgentService.processRequest(AgentService.java:291)
    at com.mysql.merlin.server.agent.AgentService.heartbeat(AgentService.java:151)
    at com.mysql.merlin.server.agent.HeartBeatCommandProcessor.processRequest(HeartBeatCommandProcessor.java:72)
    at com.mysql.merlin.server.MerlinServlet.processRequest(MerlinServlet.java:120)
    at com.mysql.merlin.server.MerlinServlet.doCommon(MerlinServlet.java:92)
    at com.mysql.merlin.server.MerlinServlet.doPost(MerlinServlet.java:68)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:709)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:524)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
    at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664)
    at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
    at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
    at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
    at java.lang.Thread.run(Thread.java:595)
Suggested fix:
When a DC item is removed from one of the agent's items-xxx.xml files, it should also be deleted from the dc_known_items table for that agent, so that you can't schedule a rule that uses it.
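
For illustration only, here is a rough sketch of the reconciliation this fix implies, written in Python against an in-memory SQLite database rather than the real Merlin schema. The table and column names (dc_items, dc_known_items, item_id, category, attrib, agent_id) come from the query in step 3; everything else is an assumption: the simplified table definitions, the 'demo' category, the 'item_a' attribute, and the helper names make_demo_db, known_items, and reconcile_known_items are all hypothetical, and the actual server is Java and may need to handle this quite differently.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory stand-in for the two tables (assumed, simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dc_items (item_id INTEGER PRIMARY KEY, category TEXT, attrib TEXT);
        CREATE TABLE dc_known_items (agent_id INTEGER, item_id INTEGER);
        -- 'demo' and 'item_a' are made-up values; table_collation is from the log above.
        INSERT INTO dc_items VALUES (1, 'demo', 'item_a'), (2, 'demo', 'table_collation');
        INSERT INTO dc_known_items VALUES (1, 1), (1, 2), (2, 2);
        """
    )
    return conn


def known_items(conn, agent_id):
    """Same lookup as step 3, parameterized by agent."""
    return conn.execute(
        "SELECT i.category, i.attrib FROM dc_items i, dc_known_items k "
        "WHERE i.item_id = k.item_id AND k.agent_id = ? "
        "ORDER BY i.category, i.attrib",
        (agent_id,),
    ).fetchall()


def reconcile_known_items(conn, agent_id, reported):
    """Delete this agent's known-item rows that are missing from `reported`.

    `reported` is a set of (category, attrib) pairs the agent announced on
    startup. Returns the sorted list of pairs that were removed.
    """
    rows = conn.execute(
        "SELECT i.item_id, i.category, i.attrib FROM dc_items i, dc_known_items k "
        "WHERE i.item_id = k.item_id AND k.agent_id = ?",
        (agent_id,),
    ).fetchall()
    stale = [r for r in rows if (r[1], r[2]) not in reported]
    conn.executemany(
        "DELETE FROM dc_known_items WHERE agent_id = ? AND item_id = ?",
        [(agent_id, item_id) for item_id, _, _ in stale],
    )
    conn.commit()
    return sorted((category, attrib) for _, category, attrib in stale)


if __name__ == "__main__":
    conn = make_demo_db()
    # Agent 1 restarts and no longer reports table_collation.
    print(reconcile_known_items(conn, 1, {("demo", "item_a")}))
    print(known_items(conn, 1))
```

Only the restarted agent's rows are touched here; other agents that still report the item keep their dc_known_items entries. Independently of this cleanup, the server would probably still want to guard against the NullPointerException when an agent returns "attrib is unknown".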