Bug #42086 merlin reports agent down even though it is working
Submitted: 13 Jan 2009 15:31
Modified: 6 Mar 2009 20:24
Reporter: Simon Mudd (OCA)
Status: Duplicate
Category: MySQL Enterprise Monitor: Agent
Severity: S3 (Non-critical)
Version: AG=2.0.0.7111 / SRV=2.0.1.7125
OS: Any
Assigned to: Jan Kneschke
CPU Architecture: Any

[13 Jan 2009 15:31] Simon Mudd
Description:
The merlin front-end shows a server as being down, but when I check, it is not.
If I restart the agent, the problem disappears. While that works as a workaround, there must be something wrong with either the agent or the merlin server.

How to repeat:
- Log in to merlin
- Manage servers
- look for a server crossed out.
- restart the agent
- the status goes back to normal.

Suggested fix:
Fix the agent or the merlin server so that merlin does not report a server as down while that server is reachable.
[13 Jan 2009 15:35] Simon Mudd
Perhaps I should be clearer. I think the crossed-out entry shows that the MySQL server is considered down. That is what is wrong. The box is a production server and has been up for some time.
[13 Jan 2009 15:48] Simon Mudd
# mysql -e "show status like 'uptime'"
+---------------+--------+
| Variable_name | Value  |
+---------------+--------+
| Uptime        | 350617 | 
+---------------+--------+
#

That's just over 4 days. So it looks like the checking or updating of the mysqld server connection state after a disconnection is what causes the problem somehow.
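
As a quick check of that arithmetic, the Uptime value (in seconds) can be converted to days. This is only a small sketch; the function name is made up for illustration.

```python
from datetime import timedelta


def uptime_to_timedelta(seconds: int) -> timedelta:
    """Convert a MySQL 'Uptime' status value (seconds) into a timedelta."""
    return timedelta(seconds=seconds)


# Value taken from the SHOW STATUS output above.
up = uptime_to_timedelta(350617)
print(up)       # 4 days, 1:23:37
print(up.days)  # 4
```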
[14 Jan 2009 19:21] Gary Whizin
Agent log shows some sort of connection error with mysql (Dashboard is actually correctly reporting that the agent cannot connect).

Error 111 is "connection refused" at the operating system level; MySQL Error 2003 (can't connect to MySQL server) is a very low-level client error.

Can you connect to mysql from the command line using the exact same options as specified in the agent's ini file?
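
Independently of the mysql client, a plain TCP probe can show whether the port itself refuses connections. The sketch below is an illustration only: `probe_tcp` is a hypothetical helper, and the host and port (127.0.0.1 and 3306 here) are assumptions that should be replaced with the values from the agent's ini file. On Linux, `errno.ECONNREFUSED` is the error 111 mentioned above.

```python
import errno
import socket
from typing import Optional


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> Optional[int]:
    """Try a TCP connect; return None on success, else the OS errno (may be None on timeout)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return None
    except OSError as exc:
        return exc.errno


if __name__ == "__main__":
    # Assumed host and port; take the real values from the agent's ini file.
    result = probe_tcp("127.0.0.1", 3306)
    if result is None:
        print("TCP connect succeeded")
    elif result == errno.ECONNREFUSED:
        print("connection refused (nothing listening on that port)")
    else:
        print("connect failed, errno:", result)
```

A refused connection here while client applications connect normally would suggest the agent and the applications are using different connection settings.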
[28 Jan 2009 20:37] Mark Leith
The problem here seems to be transient connection problems with the mysqld process. What we really need to know is whether there are any other connectivity problems with the mysqld process (for instance from the application, from a command line client, etc.) at the time that the agent also gets the 111 error.
[29 Jan 2009 6:43] Simon Mudd
Hi Mark,

Yes, I understand the problem is transient; as I say, it happens when the host is rebooted. The agent process starts before mysqld is started, and thus finds that it does not respond initially. Once mysqld is started, the client applications come in making requests. We don't have problems from that side.

Yet the agent consistently seems to find at startup that mysqld is down and "appears" not to try again, or at least not to update the initial state.
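
To illustrate the behaviour I would expect, here is a minimal sketch of a monitor that re-checks the server on every poll and overwrites its stored state each time, so an initial "down" result does not stick. This is not the agent's actual code; `ServerMonitor` and the `check` callable are hypothetical names used only for illustration.

```python
from typing import Callable


class ServerMonitor:
    """Keeps an up/down flag that is refreshed on every poll."""

    def __init__(self, check: Callable[[], bool]) -> None:
        self._check = check   # hypothetical callable: True if mysqld answers
        self.is_up = False    # nothing is known before the first poll

    def poll(self) -> bool:
        """Run the check again and replace the stored state with the result."""
        try:
            self.is_up = bool(self._check())
        except OSError:
            # A refused or failed connection counts as "down" for this poll only.
            self.is_up = False
        return self.is_up


# Simulated reboot: mysqld is not yet up for the first two polls.
answers = iter([False, False, True])
monitor = ServerMonitor(lambda: next(answers))
print([monitor.poll() for _ in range(3)])  # [False, False, True]
```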

We are restarting a couple of boxes this morning using 2.0.4.7139 agents, so it should be interesting to see if the agents on both machines report the same problem again. If nothing has been fixed, then I expect the same behaviour.
[6 Mar 2009 20:24] Mark Matthews
Duplicate of Bug#42581