Bug #18367 Thread Deadlocks in Connector/J 5.0
Submitted: 20 Mar 2006 23:09 Modified: 5 May 2006 3:00
Reporter: Jim Redman
Status: No Feedback
Category:Connector/J Severity:S1 (Critical)
Version:5.0/5.0.1 OS:Linux (Linux)
Assigned to: Target Version:

[20 Mar 2006 23:09] Jim Redman
Description:
_MANY_ Thread deadlocks in Connector/J 5.0.

How to repeat:
I suspect that just about any seriously multi-threaded application will hang.

Suggested fix:
There are some obvious problems in the code, related to the fact that there are _three_
mutex's in play when a query is executed.  This is just asking to lock up, and it surely
does.

There seems to be some fundamental code design philosophy that says that more
synchronization will prevent lockups, whereas the reverse is true.  Synchronization is
jewelers screwdriver, not a sledgehammer.  Only small critical sections should be
sync'd.

1) Just about every method in Connection is synchronized.  This doesn't make sense,
especially when the method is just a "get" method, get:

	public synchronized String getCatalog() throws SQLException {
		return this.database;
	}
(this was the source of one of our current lockups)
it's not clear that it makes sense on many of the methods.

2) The Connection mutex.  This is a great idea, so how about you drop all the
"synchronized" in Connection and use the mutex WHERE IT'S NEEDED.  The mutex is private
to the class, and user code synchronizing on the Connection (which we do to avoid
problems with the 3.1 series) will not be able to impact your code.

3) Synchronization in Prepared statement and derivatives has the same problem as
Connection eg:
public synchronized void clearWarnings() throws SQLException {
		this.warningChain = null;
	}
[20 Mar 2006 23:54] Jim Redman
One thing that is pretty much certain to cause a deadlock is synchronizing _any_ method is
Statement that also calls:

		synchronized (this.connection.getMutex()) {

Jim
[22 Mar 2006 20:59] Mark Matthews
I don't agree with your statement that only small critical sections should be
synchronized, check any "best practices" document for Java that shows that over
synchronization is safer, and has little to no overhead for uncontended access.

The methods in question are all synchronized because they are not primitives that can be
changed atomically.

In any case, even though it's allowed by the JDBC specificiation, it is not recommended
to use a JDBC connection from multiple threads, since almost no vendor actually supports
more than one concurrent operation on the physical database connection at any one time.

Do you have a compelling reason to do so that might make it obvious why/where we should
change our current behavior?
[23 Mar 2006 16:32] Jim Redman
You must go in the direction that you feel is best.  There's no real issue with the
performance of synchronization, but it will surely cause deadlocks as you have it
implemented.

If you're going to say that Connector/J is not thread safe, which is certainly your
choice, then the synchronizations are neither helpful nor harmful.  I do hope you will
document it prominently so that other don't suffer with these problems.

The code as it stands deadlocks right and left, I've documented here why it deadlocks and
the general approach and specific solutions to make Connector/J thread safe. You didn't
quote a reference for the "best practices" document, but you can go that theoretical way
and have a single threaded connector, or you can follow what I said,  and very quickly
have a thread-safe solution, code is attached.

Your call.

In answer to your specific question as to why we do this.  We build a drag-and-drop,
Java-based, toolkit for manufacturing and factory automation that is heavily
multi-threaded.  It makes extensive use of databases for everything from historical data,
alarming to workorders  and MES interactions.  In the threading model we share a database
Connection across threads, but use a Statement only within a specific thread, with the
exception that we close the Statements when we close the Connection which could be in any
thread.

We close the Connection(s) on any error.  The environment may be unstable, for example,
you could have a wireless network next to a welder that will (sometimes, often,
intermittently, etc.) generate enough interference to bring down the wireless network for
anywhere from a few ms to seconds.  If this happens during a database transaction, we'll
close all the Connections and Statements.  Attempts will be made to reform the
Connections and Statements whenever they don't exist or have been closed.  If the
interruption is short enough the re-open could occur before the close is complete, or it
could be seconds or minutes later.

As you might realize, we have some experience with synchronization, deadlocks and making
code thread-safe.

The environment is probably the harshed that Connector/J will see.  But you can either be
thread-safe, in which case it will work in this environment, or not.  If the code
deadlocks, in this system, it will deadlock in others, just not as often.   You can
restrict what objects are thread-safe, for example jTDS states that the Connection is
thread-safe, but Statements (except for "cancel") are not (incidently they say that
because the don't want to add synchronization and so possible deadlocks.)
[23 Mar 2006 16:33] Jim Redman
Removed synchronization on most methods, eliminates thread deadlocks.

Attachment: NoSync.tar.gz (application/x-gzip, text), 76.54 KiB.

[3 Apr 2006 23:07] Mark Matthews
We're going to tackle this when we fix BUG#18719 in Connector/J 5.0, so I'm consolidating
all thread deadlock reports into this one bug, as they're all related.
[4 Apr 2006 23:47] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/4463
[5 Apr 2006 3:00] Mark Matthews
Please test propsed fixes in this nightly build:

http://downloads.mysql.com/snapshots/mysql-connector-java-5.0/mysql-connector-java-5.0-nig...
[6 May 2006 1:01] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".