MySQL Bugs: #75615: Incorrect implementation of Connection.setNetworkTimeout()

Bug #75615	Incorrect implementation of Connection.setNetworkTimeout()
Submitted:	24 Jan 2015 5:51	Modified:	15 May 2018 23:44
Reporter:	Brett Wooldridge	Email Updates:
Status:	Analyzing	Impact on me:	None
Category:	Connector / J	Severity:	S2 (Serious)
Version:		OS:	Any
Assigned to:	Filipe Silva	CPU Architecture:	Any

Description:
The existing implementation of Connection.setNetworkTimeout() presents a race condtion to users of the code.

Omitting some boiler-plate, the current code looks something like this:

public void setNetworkTimeout(Executor executor, final int milliseconds) throws SQLException {
   ...
   checkClosed();
   final MysqlIO mysqlIo = this.io;

   executor.execute(new Runnable() {
      public void run() {
         try {
            setSocketTimeout(milliseconds); // for re-connects
            mysqlIo.setSocketTimeout(milliseconds);
         } catch (SQLException e) {
            throw new RuntimeException(e);
         }
      });
   }
}

If the user has the following code...

connection.setNetworkTimeout(executor, 5000);
connection.close();

...Connector/J ends up throwing a runtime exception from MysqlIO.setSocketTimeout().  The implementation of MysqlIO.setSocketTimeout() looks something like this:

protected void setSocketTimeout(int milliseconds) throws SQLException {
   try {
      this.mysqlConnection.setSoTimeout(milliseconds);
   } catch (SocketException e) {
      ...
   }
}

So, why is the exception being thrown?  The Connection.setNetworkTimeout() implementaton is incorrectly (mistakenly) using the provided Executor to set the timeout.  If the user provides a ThreadPoolExecutor to Connection.setNetworkTimeout(executor, timeout), the actual setting of the timeout will occur "at some point in the future" -- or not at all if the executor queue is full and there is a DiscardPolicy.

When a user calls Connection.setNetworkTimeout(), the setting of the timeout is put into the Executor and control is returned to the user thread.  If the user then calls connection.close(), it is highly likely that the mysqlConnection member variable of MysqlIO will become null before the Executor actually fires.  When it does run, because mysqlConnection has become null, a NullPointerException is thrown in the Executor thread.

The intent of the Executor is to provide pools that require a separate thread to monitor the connection a means to implement such asynchronous monitoring.  It is not intended to be used by implementations that are capable of setting a socket-level timeout (as MySQL is capable of).

Examining other driver implementations one can see that the Executor is rarely used at all, because most drivers are capable of implementating timeouts at the socket-level, without the need to monitor the connection from a separate thread to implement timeouts.

MariaDB (line 1350):
https://fossies.org/linux/mariadb-java-client/src/main/java/org/mariadb/jdbc/MySQLConnecti...

PostgreSQL NG (line 1209):
https://github.com/impossibl/pgjdbc-ng/blob/develop/src/main/java/com/impossibl/postgres/j...

The result of the MySQL implementation is that calling setNetworkTimeout() followed by any other statement (Connection.close(), Statement.execute(), ResultSet.next(), etc.) has no guarantee as to when the network timeout will actually be applied.

The JavaDoc is extremely vague as to the Executor parameter, saying only "The Executor implementation which will be used by setNetworkTimeout".  However, to interpret this to mean that the actual setting of a socket timeout should be done on the Executor thread is clearly incorrect in that it provides non-deterministic behavior to clients.  Clients should expect that activity performed on the Connection after the setNetworkTimeout() call will be subject to that network timeout, which is not possible of there is no guarantee as to when the network timeout will go into effect (possibly immediately, possibly in several seconds, minutes, or never).

How to repeat:
Simply create a test where you call:

Executor executor = Executors.newCachedThreadPool();
connection.setNetworkTimeout(executor, 5000);
connection.close();

And observe that a NullPointerException is emitted to stdout similar to this:

java.lang.NullPointerException
at com.mysql.jdbc.MysqlIO.setSocketTimeout(MysqlIO.java:4890)
at com.mysql.jdbc.ConnectionImpl$12.run(ConnectionImpl.java:5540)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Suggested fix:
Connection.setNetworkTimeout() should ignore the provided Executor, as it is inapplicable to how MySQL manages network timeouts.  An implementation that simply does:

checkClosed();
try {
   setSocketTimeout(milliseconds); // for re-connects
   io.setSocketTimeout(milliseconds);
} catch (SQLException e) {
   throw new RuntimeException(e);
}

is correct.

The API docs leave much to be desired around this feature. As I remember the EG meeting where this was discussed, the assumption being made in this bug report isn't entirely correct either. Some clarification from the original requester of this feature is needed, before we assume what the intent was, because honestly it's not clear from reading the specification, which means that part of the specification needs work. I'll ask the next time I speak with them.

Regardless, there shouldn't be a race, so that's a bug. 

Note that you can (and have for many years) been able to set the network timeout via properties in the URL using "socketTimeout=..." and "connectTimeout=...".

Thanks for the fast feedback Mark.  A short-term fix would be to simply add a null check guard in MysqlIO.setNetworkTimeout().  But as long as you are in the code, obviously we'd prefer to see the behavior of ConnectionImple.setNetworkTimeout(executor, timeout) changed.

Re: socketTimeout and connectionTimeout, while these are "solutions" in some sense, the socketTimeout applies across the board to all activity on the socket, not just a single Connection instance.

We are actually opening this bug as part of the HikariCP connection pool project, so we have little control over how the user configures the underlying driver.  Our use-case is that during certain pool activities, such as retiring a connection, we want to guard against events like network partitioning hanging a thread indefinitely during connection.close().  The most effective way to do this is via setNetworkTimeout().

Our workaround for now is that we detect that the underlying Connection is a MySQL connection, and we use an Executor implementation that runs immediately on the calling thread, like so:

private static class SynchronousExecutor implements Executor {
   public void execute(Runnable command) {
      try {
         command.run();
      }
      catch (Throwable t) {
         LOGGER.warn("Exception executing {}", command.toString(), t);
      }
   }
}

This ensures there is no race between a call to setNetworkTimeout() and a subsequent close().  Obviously, as a general JDBC connection pool provider we would prefer not to have code specific to any particular driver, but as it stands this is our best option.  We will track this issue, and look forward to being able to remove the workaround.

Any progress on this bug.  Just checking as it has been 6 months.

Any progress on this bug?  We've passed the one year mark on an issue acknowledged as serious.  Stating that the JDBC specification is ambigous is no excuse as the use-case it addresses is clear from the JavaDoc:

| Note: This method is intended to address a rare but serious condition where network partitions can cause 
| threads issuing JDBC calls to hang uninterruptedly in socket reads, until the OS TCP-TIMEOUT (typically 10
| minutes).
| ...
| This method can be invoked more than once, such as to set a limit for an area of JDBC code, and to reset to
| the default on exit from this area. Invocation of this method has no impact on already outstanding requests.

As implemented by Connector/J setNetworkTimeout() is useless for all intents and purposes.  Intuitively, a user calling:

conn.setNetworkTimeout(1000);
PreparedStatement ps = conn.prepareStatement(...);

expects that the timeout is immediately applied, and that subsequent connection use will honor the specified timeout.  The statement "Invocation of this method has no impact on already outstanding requests" implies that the timeout will apply to as-yet unexecuted requests, i.e. subsequent requests.

As referenced in an additional comment above, no other database drivers whose source we examined used the specified Executor in any way.  The Executor itself is for extremely esoteric cases where setting a timeout at the socket-level is not possible.

Given that the current implementation does not otherwise use the executor, a dirty workaround is to pass an executor that runs the task inline:

    connection.setNetworkTimeout(Runnable::run, 123);

This is the same as directExecutor() from Guava: http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/util/concurrent/M...)

That is exactly what we are doing. Of course it means that the HikariCP connection pool has to have code to specifically detect and handle MySQL.  We would obviously prefer not having code specific to any one database.  How about MySQL actually fix the implementation?

How's this coming Filipe Silva? Would it help if I provided a patch with Brett's suggested fix?

Hi Johnathan,

Thank you for the kind offer but let it be for now, I'll try to fit it in our plans as soon as possible.

Please accept our apologies for the very long delay. It hasn't been set with a higher priority as there are a few workarounds that help dealing with the issue.

Hi Johnathan, Brett,

I'm changing setNetworkTimeout() so that it uses the executor if one is provided or sets the socket timeout directly if null is sent in place of the executor. Does it work for you?

Thank you,

Filipe,

No offense, but this doesn't really help.  It means that developers must still special-case handling for MySQL; passing null instead of the Executor.

Not only that, the JavaDoc for setNetworkTimeout() states that it should throw a SQLException:

SQLException - if a database access error occurs, this method is called on a closed connection, THE EXECUTOR IS NULL, or the value specified for seconds is less than 0.

Accepting null as a "legitmate" value would violate the API contract as documented.

I understand why MySQL would not simply ignore the Executor, as other drivers do.  The Executor is there primarily for use "in environments where no network is present"; for example embedded databases.  However, MySQL is TCP-based, and therefore can leverage the timeout support provided by the Socket class, so the use of an Executor is not necesary.

Opps, can't edit comments in bugzilla, so:

I DON'T understand why MySQL would not simply ignore the Executor, as other drivers do.  The Executor is there primarily for use "in environments where no network is present"; for example embedded databases.  However, MySQL is TCP-based, and therefore can leverage the timeout support provided by the Socket class, so the use of an Executor is not necesary.

Hi Brett,

Something here doesn't add up. Please help me understand your point of view:

1st, The main intent of this method and the included executor is to handle a rare condition related to network partitions and threads waiting on socket timeouts. Why do you say that the Executor is there primarily for use "in environments where no network is present"?

2nd, I agree that accepting null as the executor argument is a violation of the API contract, but ignoring an explicit argument is so too - "executor - The Executor implementation which will be used by setNetworkTimeout.", not "should" or "can". So, why forcing the user to use something that only serves to confuse him? What's the point of forcing an object that is to be ignored? Accepting a null value would be my attempt to make it easier for you since you don't want to use the executor. We are still discussing this, though, so don't expect anything yet.

3rd, how exactly are you doing special code for this driver when you know that others ignore this executor? What kind of executor do you use on the other cases then?

4th, there is a race condition alright. And that is being fixed. With this race condition fixed you can use whatever executor you want, if you are ok with the thread scheduling delays. If not, you can just use the special executor that runs the code immediately or send null, eventually. Why doesn't this work for you?

5th, to keep complying with the API contracts we should not allow a null executor and we would have to use the provided executor. It's up to the user to decide what kind of executor he needs and provide it. The fact that other drivers ignore the executor doesn't mean that this is the right way to do it. So, I still fail to understand why using the provided executor is not an option for you and why you need specific code for connecting to MySQL (other than to avoid the current race condition we have, obviously).

Thank you,

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

Here is the thread with a definitive answer, from the Oracle JDBC specification mailing list, provided directly by Douglas Surber of Oracle:

http://mail.openjdk.java.net/pipermail/jdbc-spec-discuss/2017-November/000236.html

In summary:

> 1) Are the effects of setNetworkTimeout() intended to be immediate?

The network timeout should apply to all subsequent database interactions. It does not apply to pending interactions. It is vendor dependent whether the timeout applies to the entire JDBC API method call or to each network round trip (if there are more than one) to implement the method call.

> 2) What is the role/purpose of the Executor parameter passed to
> setNetworkTimeout()?

When the timeout expires the thread that is hung will be kicked loose with an exception. This thread may not clean up the connection so it is closed properly. The Executor provides a thread that the implementation can use to do this clean up. If the hung thread does the full clean up as it throws then the Executor is not needed. 

———————
So, I don’t really have an opinion on MySQL’s implementation, as long as the effect of setNetworkTimeout() is immediate (not applied asynchronously) in compliance with the spec.