Bug #5020 autoReconnect / autoReconnectForPools broken since 3.0.11
Submitted: 12 Aug 2004 15:59 Modified: 2 Aug 2006 8:42
Reporter: Klaus Halfmann Email Updates:
Status: Closed Impact on me:
None 
Category:Connector / J Documentation Severity:S1 (Critical)
Version:3.0.14-production OS:Windows (Windows XP SP2)
Assigned to: Mark Matthews CPU Architecture:Any

[12 Aug 2004 15:59] Klaus Halfmann
Description:
Our Systems run 24 Hours a Day, but at night nobody connects, 
so the Socket Connections time out. At morning the first attempts
to do anything (e.g. login) fail until we are able to close() and
create..() the Connections again. 

This did not happen until after 3.0.11

How to repeat:
Connection reconnectableConn = DriverManager.getConnection(
        "jdbc:mysql:///test?autoReconnect=true");

rs = reconnectableConn.createStatement().executeQuery("SELECT 1");
assertTrue  (rs.next());
assertEquals(1, rs.getInt(1));

restartMySQL();  // Actually: NET STOP MySQL; NET START MySQL;

rs = reconnectableConn.createStatement().executeQuery("SELECT 1");
assertTrue  (rs.next());
assertEquals(1, rs.getInt(1));

...

java.sql.SQLException: Communication link failure: 
java.net.SocketException, underlying cause: 
Software caused connection abort: recv failed

** BEGIN NESTED EXCEPTION ** 

java.net.SocketException
MESSAGE: Software caused connection abort: recv failed

STACKTRACE:

java.net.SocketException: Software caused connection abort: recv failed
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.read(SocketInputStream.java:129)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:183)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:222)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:277)
	at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:1385)
	at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:1532)
	...
** END NESTED EXCEPTION **
	at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:1707)
	at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:1923)
	at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1163)
	at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:1272)
	at com.mysql.jdbc.MysqlIO.sqlQuery(MysqlIO.java:1218)
	at com.mysql.jdbc.Connection.execSQL(Connection.java:2233)
	at com.mysql.jdbc.Connection.execSQL(Connection.java:2181)
	at com.mysql.jdbc.Statement.executeQuery(Statement.java:1159)
	 

Suggested fix:
Connections should properly reconnect when the Socket times out
or the connection is lost some other way.

Please contact me for a complete TestSuite based on your Regression Tests.
[12 Aug 2004 16:06] Klaus Halfmann
I Uploaded the files for the Tescases, no need to contact me any more.
(Well contact me for a solution ;-)
[12 Aug 2004 16:46] Mark Matthews
Note: Autoreconnect functionality will be depcreated and eventually removed in future releases. 

The reason this isn't working for your particular case is that the methodolgy for autoreconnect was changed to be safer after 3.0.11, and is related to autoCommit state, which will also cause the current 'in-flight' transaction to fail (if you attempt your transaction again _after_ the failure, the driver will reconnect). Please see the docs for the explanation on how to correctly use this feature in the 'Troubleshooting' section.

In any case, there is no 100% safe way that a JDBC driver can re-connect automatically if a TCP/IP connection dies without risking corruption of the database 'state' (even _with_ transactional semantics), which is why this feature will eventually be removed.

The JDBC spec does not specify that a connection is alive no matter what happens to the underlying network for this very reason. 

Clients of JDBC drivers are responsible for dealing with network failures, as only the application itself (really the developer of the application) 'knows' what the 'correct' response to a transaction failing due to the network going down is. 'Wait_timeout' expiring on the server is basically a 'forced' network failure by the server. You can correct this in a non-robust way by setting 'wait_timeout' higher, however, you as a developer should be handling SQL exceptions in your code and taking appropriate recovery actions, not just passing them up the call stack.

Connection errors aways have a SQLState class of '08'. If you detect this, you can get another connection and retry the transaction (if it is appropriate).

If this is impractical for whatever reason, configure your connection pool to test for connection 'liveness' and discard connections that are idle longer than whatever you've set 'wait_timeout' to on the server (all modern connection pools can do this, but the configuration of them depends on the pool).
[1 Jun 2005 13:01] Kirk Wylie
If the argument is that there might be a transaction going, then a pretty simple case can be made:
- If the connection is in autoCommit mode, it's safe to autoReconnect.
- If the connection is not in autoCommit mode, but there is no open transaction, it's safe to autoReconnect (since the connection may be pooled).
- If the connection is not in autoCommit mode, and there is an open transaction, then throw an exception.

This would resolve the concerns which you have with non-autoCommit mode AND preserve application functionality.

However, I find it a little disturbing that this was changed and there's no notice in the changelogs that would indicate that this behavior has changed (I just checked since I was having the same problem). Since this change was likely to break many applications, it probably should have been added to the CHANGES file.
[7 Feb 2006 16:48] Mark Matthews
> If the argument is that there might be a transaction going, then a pretty simple
> case can be made:
> - If the connection is in autoCommit mode, it's safe to autoReconnect.

No, it's not. You don't know what happened to the in-flight statement.

> - If the connection is not in autoCommit mode, but there is no open transaction,
> it's safe to autoReconnect (since the connection may be pooled).

It's not always possible to determine if there's an open transaction.

> - If the connection is not in autoCommit mode, and there is an open transaction,
> then throw an exception.
> 
> This would resolve the concerns which you have with non-autoCommit mode AND
> preserve application functionality.
> 
> However, I find it a little disturbing that this was changed and there's no
> notice in the changelogs that would indicate that this behavior has changed (I
> just checked since I was having the same problem). Since this change was likely
> to break many applications, it probably should have been added to the CHANGES
> file.

The functionality wasn't removed in 3.0 or 3.1, so if it's not working it's not a bug, however we weren't able to reproduce the issue, and we have test coverage on the autoReconnect functionality. Each change in the _behavior_ of this functionality is in the CHANGES file as far as I'm aware , but it appears there's a TYPO, which we'll correct, where we state that in _3.0.2_ this behavior changed, and it requires autoCommit(FALSE), but it should be autoCommit(TRUE).

In any case, with the information that a JDBC _driver_ has available to it, it is really difficult (and almost impossible) to do auto-reconnect safely, because it doesn't have full visibility into the lifecycle of calls into it, or the intent of your application. 

If your application really needs a non-stale connection, then your best option is to configure your pool correctly, i.e. don't let connections sit idle longer than 'wait_timeout' on the MySQL server, and have the pool test connections before handing them out.
[29 Jul 2006 5:42] MC Brown
Changed category to Connector/J.
[2 Aug 2006 8:42] MC Brown
A note on the potential issues with autoReconnect, and advice not to use it, has
been added to the Connector/J documentation.