Description:
If the agent starts up and then dies right away, it fails to remove its pid file, causing the initscript to think it started up correctly.
How to repeat:
Add an invalid system user to the .ini file:
shell> tail -n2 /opt/mysql/enterprise/agent/etc/mysql-monitor-agent.ini
#log-level=debug
user=foasfoassaf
Then start the agent:
shell> /etc/init.d/mysql-monitor-agent start ; echo $?
Starting MySQL Enterprise agent service... [ OK ]
0
The initscript looks for the pid file to determine whether the agent started. It finds the file, so it reports success (and returns 0).
On closer inspection, the agent did start up briefly but died right away because of the bad "user" parameter:
2010-07-21 17:58:22: (critical) MySQL Monitor Agent 2.2.1.1717 started.
2010-07-21 17:58:22: (critical) unknown user: foasfoassaf
2010-07-21 17:58:22: (critical) mysql-monitor-agent-cli.c:545: Failure from chassis_mainloop. Shutting down.
2010-07-21 17:58:22: (critical) network-io.c:312: successfully reconnected to dashboard at http://agent:pass@localhost:18080/heartbeat
The pid file is also never removed, even though no process with that pid is running (ps shows only the grep itself):
shell> cat /opt/mysql/enterprise/agent/mysql-monitor-agent.pid; echo
24064
shell> ps auxf | grep $( cat /opt/mysql/enterprise/agent/mysql-monitor-agent.pid )
root 24090 0.0 0.0 61196 716 pts/2 S+ 18:13 0:00 | \_ grep 24064
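The stale-pid check above can also be scripted. The following is only a sketch: pidfile_is_live is a hypothetical helper name, not something in the shipped initscript, and it assumes the caller is allowed to signal the agent (e.g. runs as root).

```shell
#!/bin/sh
# Hypothetical helper (not part of the shipped initscript).
# Returns 0 only if the pid file exists, holds a numeric pid,
# and a process with that pid can be signalled.
pidfile_is_live() {
    pidfile=$1
    # Missing or empty pid file: nothing is running.
    [ -s "$pidfile" ] || return 1
    pid=$(cat "$pidfile") || return 1
    # Reject anything that is not a plain decimal number.
    case $pid in
        ''|*[!0-9]*) return 1 ;;
    esac
    # Signal 0 sends nothing; it only tests whether the pid can be signalled.
    # Note: this also fails if the process exists but belongs to another
    # user and the caller is not root.
    kill -0 "$pid" 2>/dev/null
}
```

Run against the pid file from this report (for example: pidfile_is_live /opt/mysql/enterprise/agent/mysql-monitor-agent.pid || echo "stale pid file"), it would be expected to flag the file as stale, since pid 24064 is no longer running.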
Suggested fix:
1) Remove the pid file whenever the agent fails during startup.
2) Only create the pid file *after* the .ini options file has been parsed and validated. This helps the initscripts know the true state of the agent.
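The ordering in the two points above could look roughly like this. This is an illustration in shell, not the agent's actual startup code: run_with_pidfile is a made-up wrapper name, it only checks that the user exists (it does not switch to that user), and it stays in the foreground until the command exits.

```shell
#!/bin/sh
# Illustrative wrapper only; the name and behaviour are assumptions.
# Usage: run_with_pidfile PIDFILE RUN_AS_USER COMMAND [ARGS...]
run_with_pidfile() {
    pidfile=$1
    run_as=$2
    shift 2
    # Point 2: validate the configured user before any pid file exists.
    if ! id "$run_as" >/dev/null 2>&1; then
        echo "unknown user: $run_as" >&2
        return 1
    fi
    # Start the command and record its pid only after validation passed.
    "$@" &
    child=$!
    echo "$child" > "$pidfile"
    # Block until the command exits, for whatever reason.
    wait "$child"
    rc=$?
    # Point 1: never leave the pid file behind once the process is gone.
    rm -f "$pidfile"
    return $rc
}
```

With the invalid user from this report (run_with_pidfile /tmp/agent.pid foasfoassaf sleep 60), the wrapper should return 1 without ever creating the pid file, so a check based on the file's existence would no longer report a false success.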