Bug #53090 When mysqld is hung, agent does not report a problem
Submitted: 22 Apr 2010 22:39
Modified: 14 Jun 2010 9:34
Reporter: Adam Dixon
Status: Closed
Category: MySQL Enterprise Monitor: Agent
Severity: S3 (Non-critical)
Version: 2.1.1.1141, 2.2
OS: Any
Assigned to: Michael Schuster
CPU Architecture: Any

[22 Apr 2010 22:39] Adam Dixon
Description:
To test how the agent handles a mysqld hang while the rest of the system works fine, I used kill -SIGSTOP <pid> to hang the mysqld process, expecting the agent to indicate to the monitor that 'something' was wrong. This never happens. The monitor reports green/green, and nothing in the log (even at debug level) indicates a problem.

With mysqld stopped by SIGSTOP like this, trying to connect just hangs (as expected), and mysqladmin ping hangs like any other client.

How to repeat:
Start from a standard MEM setup, then run:
kill -SIGSTOP `cat /path/to/your/mysqld.pid`

Wait a few minutes. Nothing changes in the monitor, and nothing in the logs indicates a problem, even though mysqld is hung, processing nothing, and quite unhealthy.

To resume mysqld:
kill -SIGCONT `cat /path/to/your/mysqld.pid`

Suggested fix:
Some other health check should govern the status of the monitored mysqld. In this case mysqld is hung and the agent's MySQL collection queries never return, so the status stays green/green forever.
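
One possible shape for such a health check is a watchdog that runs the collection step under a deadline and reports a separate status when it does not return in time. The sketch below is illustrative only and is not the agent's actual code: check_with_deadline, the collect callable, and the status strings are all hypothetical names.

```python
import threading


def check_with_deadline(collect, deadline_seconds):
    """Run collect() in a worker thread and classify the outcome.

    Returns "ok" if collect() finished in time, "failed" if it raised,
    and "unresponsive" if it was still running when the deadline passed.
    All names here are hypothetical, for illustration only.
    """
    outcome = {}

    def run():
        try:
            collect()
            outcome["status"] = "ok"
        except Exception:
            outcome["status"] = "failed"

    # A daemon thread is used so that a collect() call that never
    # returns cannot keep the checking process alive.
    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    worker.join(deadline_seconds)
    if worker.is_alive():
        return "unresponsive"
    return outcome["status"]
```

With a check like this, a collection query stuck against a stopped mysqld would surface as "unresponsive" after the deadline instead of leaving the last known status in place.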
[23 Apr 2010 1:03] Enterprise Tools JIRA Robot
Diego Medina writes: 
Verified as described
[27 Apr 2010 15:57] Enterprise Tools JIRA Robot
Jan Kneschke writes: 
Reducing the MySQL connection's read/write timeout settings to a lower value would be one way.
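
To illustrate why short timeouts help here, the following is a rough sketch of a probe at the socket level rather than through the MySQL client library. It relies on the fact that a MySQL server sends its handshake packet first, right after the TCP connection is accepted. When mysqld is stopped, the kernel may still complete the TCP connect, but no greeting arrives, so a read timeout is what exposes the hang. The function name probe_mysqld, the default timeout, and the returned strings are assumptions made for this example, not part of the agent.

```python
import socket


def probe_mysqld(host, port, timeout=5.0):
    """Classify a MySQL endpoint as "up", "down", or "hung".

    Hypothetical helper: it only waits for the first bytes of the
    server greeting and does not authenticate or run a query.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except socket.timeout:
        # Connect did not complete in time (e.g. full accept backlog).
        return "hung"
    except OSError:
        # Refused, unreachable, and similar hard failures.
        return "down"

    with sock:
        sock.settimeout(timeout)
        try:
            # The server speaks first; 4 bytes is the packet header.
            header = sock.recv(4)
        except socket.timeout:
            return "hung"
        except OSError:
            return "down"

    # An empty read means the peer closed without greeting us.
    return "up" if header else "down"
```

In the agent itself the equivalent would presumably be the client library's connect, read, and write timeout options rather than a raw socket; the sketch only shows the effect of bounding the wait.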
[3 May 2010 18:19] Enterprise Tools JIRA Robot
Mark Matthews writes: 
Consider fixing in 3.0, as this seems too big a change in behavior to land in a maintenance release.
[7 Jun 2010 23:42] Enterprise Tools JIRA Robot
Andy Bang writes: 
In build 2.2.2.1722.
[10 Jun 2010 15:49] Enterprise Tools JIRA Robot
Diego Medina writes: 
Verified fixed on 2.2.2.1722

In the logs you now see:

2010-06-10 11:43:42: (critical) agent_mysqld.c:716: successfully connected to database at 127.0.0.1:5132 as user msandbox (with password: YES)
2010-06-10 11:46:41: (critical) agent_mysqld.c:695: agent connecting to mysql-server failed: mysql_real_connect(host = '127.0.0.1', port = 5132, socket = ''): Can't connect to MySQL server on '127.0.0.1' (4) (mysql-errno = 2003)
2010-06-10 11:48:11: (critical) last message repeated 1 times
2010-06-10 11:48:11: (critical) agent_mysqld.c:695: agent connecting to mysql-server failed: mysql_real_connect(host = '127.0.0.1', port = 5132, socket = ''): Can't connect to MySQL server on '127.0.0.1' (4) (mysql-errno = 2003)

And the dashboard shows the mysql server as down.
[14 Jun 2010 9:34] MC Brown
A note has been added to the changelog for 2.2.2: 

        When <command>mysqld</command> is in a hung state and
        unresponsive, &merlin_agent; and &merlin_server; may not
        report the server as unavailable.