Bug #45471 retry connecting to a lost MEM mysqld
Submitted: 12 Jun 2009 15:11 Modified: 28 Jul 2009 13:38
Reporter: Lig Isler-Turmelle
Status: Closed
Category: MySQL Enterprise Monitor: Server Severity: S4 (Feature request)
Version: 2.1 OS: Any
Assigned to: Sloan Childers CPU Architecture: Any
Tags: windmill

[12 Jun 2009 15:11] Lig Isler-Turmelle
Description:
Currently, if the MEM mysqld backend is down (and no proxy is being used), Tomcat will keep retrying to reach mysqld for a while (up to 50 attempts or 180 seconds, whichever comes first).

We think it should keep trying for longer. For example, suppose we have a network outage (say, a failed firewall). This breaks connectivity between the MEM server and its mysqld, which shuts Tomcat (and hence ALL front-end reporting) down.

We would like Tomcat to keep trying rather than shut down. Even if, after the first "rush" to reconnect, it then only tries once every 30 seconds or so... being sure to log the problem, of course.

How to repeat:
Shut down MEM's mysqld, leaving the rest up and running.

Suggested fix:
Let MEM have more time to reconnect to the database.
[12 Jun 2009 15:31] Simon Mudd
The point of continuing to retry is to make the system more resilient to an unexpected failure. If mysql goes away, the DBA then only has to worry about getting it up again and does not have to check that merlin is still running. (He would expect it to do as well as possible under the circumstances, and to recover once the database is reachable again.)

The suggestion of a delayed retry interval is meant to keep Tomcat from generating a "connect storm", which is something you want to avoid, particularly if the database does not reside on the same host.
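
The "rush, then back off" idea could be sketched roughly as follows. This is only an illustration of the request, not MEM code: the class name, the method name, and the burst size and delays are all assumptions chosen for the example (only the "once every 30 seconds or so" figure comes from the report).

```java
// Hypothetical sketch: none of these names or constants come from MEM itself.
final class ReconnectSchedule {
    // Assumed values, for illustration only.
    static final int BURST_ATTEMPTS = 50;         // fast attempts made right after the connection is lost
    static final long BURST_DELAY_MS = 1_000L;    // pause between the fast attempts
    static final long STEADY_DELAY_MS = 30_000L;  // "once every 30 seconds or so" afterwards

    /** Returns how long to wait before the given attempt (1-based). */
    static long delayBeforeAttemptMs(int attempt) {
        if (attempt < 1) {
            throw new IllegalArgumentException("attempt must be >= 1");
        }
        if (attempt == 1) {
            return 0L; // the first reconnect attempt is made immediately
        }
        // Short pauses during the initial rush, long pauses after it,
        // so a remote mysqld is not hit by a connect storm.
        return attempt <= BURST_ATTEMPTS ? BURST_DELAY_MS : STEADY_DELAY_MS;
    }
}
```

A caller would sleep for `ReconnectSchedule.delayBeforeAttemptMs(n)` before attempt `n` and log each failure, so the outage stays visible while the retry rate drops.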
[12 Jun 2009 19:03] Mark Matthews
We'll make the timeout(s) configurable. It can't be "forever", because the application will spool data, eventually run out of resources, and then die; however, different deployments will of course have different constraints.

Note that the retry code currently runs every time a connection is pulled from the connection pool, so this isn't just a startup condition. If mysqld is gone and the retries fail, the calling thread (agent or UI) will get an exception and be able to retry. Making the timeout configurable will keep the application from keeling over with out-of-memory or thread errors, while remaining flexible for those who can trade memory for longer timeouts.
[15 Jun 2009 19:45] Gary Whizin
Could make it user-configurable, but that would require some tricky testing to see how long the DB could be unreachable before hitting some tipping point (it would not be safe to retry indefinitely). Will consider implementing & documenting as a "user beware!" setting.
[15 Jun 2009 21:55] Sloan Childers
BUG#45471 retry connecting to a lost MEM mysqld
- make the db connect retries and timeout configurable via config.properties
- mysql.max_connect_retries
- mysql.max_connect_timeout_msec
- the default retry count is currently 50 (unchanged)
- the default retry timeout is currently 180 seconds (unchanged)
- the way this works is whichever runs out first... number of retries OR number of msecs attempting retries
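
The "whichever runs out first" rule in the note above might look roughly like this. It is a minimal sketch under stated assumptions, not the actual patch: `BoundedRetry`, `connectWithRetry`, and the `pauseMsec` parameter are hypothetical names, and the real code's pause between attempts is not described in this report.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: class, method, and parameter names are illustrative only.
final class BoundedRetry {
    /**
     * Calls connect until it succeeds, or until maxRetries attempts have failed,
     * or until maxTimeoutMsec has elapsed since the first attempt, whichever
     * comes first. At least one attempt is always made.
     */
    static <T> T connectWithRetry(Callable<T> connect, int maxRetries,
                                  long maxTimeoutMsec, long pauseMsec) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxTimeoutMsec);
        for (int attempt = 1; ; attempt++) {
            Exception failure;
            try {
                return connect.call();
            } catch (Exception e) {
                failure = e; // remember why this attempt failed
            }
            boolean outOfRetries = attempt >= maxRetries;
            boolean outOfTime = System.nanoTime() - deadline >= 0;
            if (outOfRetries || outOfTime) {
                // The calling thread gets an exception and can decide what to do next.
                throw new IllegalStateException(
                        "giving up after " + attempt + " connection attempt(s)", failure);
            }
            try {
                Thread.sleep(pauseMsec);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting to retry", ie);
            }
        }
    }
}
```

With the defaults from the note, a call would pass `50` for `maxRetries` and `180000` for `maxTimeoutMsec`; the time budget is checked with `System.nanoTime()` so that wall-clock adjustments do not shorten or lengthen it.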
[18 Jun 2009 2:32] Keith Russell
Patch installed in versions >= 2.1.0.1063.
[28 Jul 2009 13:38] Tony Bedford
An entry has been added to the 2.1.0 changelog:

If the Service Manager lost connection to the repository server, it would shut down after 50 attempts to reconnect or if it was unable to reconnect within 180 seconds. This behavior has now been made configurable through parameters in the config.properties file. The parameters are:

* mysql.max_connect_retries - default is 50.

* mysql.max_connect_timeout_msec - default is 180 seconds.
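
For illustration, the two parameters could be read from properties text along these lines. This is a sketch, not the Service Manager's own loader: `RetryConfig` and `parse` are hypothetical names, and the fallback of `180000` assumes the timeout value is written in milliseconds, as the `_msec` suffix and the 180-second default suggest.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch: only the two property names come from the changelog entry.
final class RetryConfig {
    final int maxConnectRetries;
    final long maxConnectTimeoutMsec;

    private RetryConfig(int maxConnectRetries, long maxConnectTimeoutMsec) {
        this.maxConnectRetries = maxConnectRetries;
        this.maxConnectTimeoutMsec = maxConnectTimeoutMsec;
    }

    /** Parses config.properties-style text, falling back to the documented defaults. */
    static RetryConfig parse(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Defaults: 50 retries, 180 seconds (assumed to be expressed as 180000 msec).
        int retries = Integer.parseInt(
                props.getProperty("mysql.max_connect_retries", "50").trim());
        long timeoutMsec = Long.parseLong(
                props.getProperty("mysql.max_connect_timeout_msec", "180000").trim());
        return new RetryConfig(retries, timeoutMsec);
    }
}
```

A deployment that can trade memory for a longer outage window might set, for example, `mysql.max_connect_retries=200` and `mysql.max_connect_timeout_msec=600000`; those particular values are made up for the example and are not recommendations from this report.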