Bug #42040 Count of monitored instances is inconsistent
Submitted: 12 Jan 2009 9:02 Modified: 14 Jan 2009 23:07
Reporter: Simon Mudd (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Enterprise Monitor: Web Severity:S3 (Non-critical)
Version:2.0.1.7125 OS:Any
Assigned to: Eric Herman CPU Architecture:Any

[12 Jan 2009 9:02] Simon Mudd
Description:
This issue is known to the developers but I don't seem to have made a ticket for it before.

The number of monitored instances is shown in merlin in 2 places:

1. as a number in the servers column for the group: All Servers
2. at the bottom of the page in the format:

Monitoring XXXX instances on XXXX hosts (XXXX hosts available).

In our case these 2 numbers do not match which is a bit silly.

How to repeat:
Go on to the Enterprise Dashboard - Monitor Tab.

Look at the number of servers shown in the group All Servers.
Compare the number at the bottom with it says Monitoring XXX instances....

These 2 numbers SHOULD be the same.

Suggested fix:
Adjust the counts so the numbers are consistent.
[14 Jan 2009 12:35] Eric Herman
Developer testing shows that there is no bug with server counting and host counting.

This particular incident is a special case resulting from some past data corruption. In the past, we directed the customer to delete a server from inventory (actually I think some values were changed from "foo" to "foo.deleted") so that this half-created, half-deleted server would not prevent the application from starting.

However, the server was still be counted in the instance count.

The culprit was found using:

SELECT *
  FROM inventory_instances
 WHERE type_id=2
   AND instance_id NOT IN (
         SELECT instance_id
           FROM group_members_v2
          WHERE group_id = 1 )

Which shows us any server which is not a member of the "all servers" group.

Then we did some queries on the instance_id and the parent_id to determine that this was, in fact the bogus server of the past.

SELECT namespace,
type_name,
attribute_name,
instance_name,
dc1.value,
FROM_UNIXTIME(MAX(dc1.end_time)/1000) AS time,
instance_id,
parent_id,
dc1.instance_attribute_id
FROM inventory_instance_attributes
JOIN inventory_attributes USING (attribute_id)
JOIN inventory_types USING (type_id)
JOIN inventory_namespaces USING (namespace_id)
JOIN inventory_instances USING (instance_id)
JOIN dc_ng_string_now dc1 USING (instance_attribute_id)
JOIN (SELECT instance_attribute_id, MAX(end_time) max_end
FROM dc_ng_string_now
GROUP BY instance_attribute_id) s
ON dc1.instance_attribute_id = s.instance_attribute_id AND dc1.end_time = s.max_end
WHERE instance_id = [PUT_INSTANCE_ID_OR_PARENT_ID_HERE]
GROUP BY instance_attribute_id
ORDER BY instance_name;

And we were able to see this was, in fact the case.

Next I advised him to (A) stop tomcat (B) delete the row in inventory_instances (C) start tomcat (D) observe that the count is now correct.

Since there is a foreign key constraint, the deletion required is:

BEGIN ;
DELETE FROM inventory_instance_attributes WHERE instance_id = [INSTANCE_ID];
DELETE FROM inventory_instances WHERE instance_id = [INSTANCE_ID];
commit;

After this, the showed correctly.
[14 Jan 2009 23:07] Bill Weber
closing since this looks like it was a special case