Bug #114523 Deadlock when closing connection
Submitted: 1 Apr 9:48    Modified: 9 May 11:59
Reporter: Luke Morfill
Status: Verified
Category: Connector / J    Severity: S2 (Serious)
Version: 8.3.0    OS: Any
CPU Architecture: Any

[1 Apr 9:48] Luke Morfill
When closing a connection using this Java connector, the library can get into a deadlock: both the thread trying to close the connection and one of the connector threads lock up. The deadlock occurs when one of the connector threads is reading whilst another thread tries to close the connection.

As far as I can tell, I've only been able to reproduce this with the replication flag.

Here is what the deadlock looks like:

"pool-3-thread-1@2589" daemon prio=5 tid=0x1d nid=NA waiting for monitor entry
  java.lang.Thread.State: BLOCKED
	 blocks main@1
	 waiting for main@1 to release lock on <0xaa2> (a com.mysql.cj.jdbc.ha.ReplicationConnectionProxy)
	  at com.mysql.cj.jdbc.result.ResultSetImpl.checkColumnBounds(ResultSetImpl.java:498)
	  at com.mysql.cj.jdbc.result.ResultSetImpl.getString(ResultSetImpl.java:875)
	  at jdk.internal.reflect.GeneratedMethodAccessor6.invoke(Unknown Source:-1)
	  at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	  at java.lang.reflect.Method.invoke(Method.java:568)
	  at com.mysql.cj.jdbc.ha.MultiHostConnectionProxy$JdbcInterfaceProxy.invoke(MultiHostConnectionProxy.java:110)
	  - locked <0xaa3> (a com.mysql.cj.jdbc.ha.LoadBalancedConnectionProxy)
	  at jdk.proxy1.$Proxy4.getString(Unknown Source:-1)
	  at com.rubikcubeman.mysql.MySQLDeadlockTest.lambda$main$1(MySQLDeadlockTest.java:33)

"main@1" prio=5 tid=0x1 nid=NA waiting for monitor entry
  java.lang.Thread.State: BLOCKED
	 blocks pool-3-thread-1@2589
	 waiting for pool-3-thread-1@2589 to release lock on <0xaa3> (a com.mysql.cj.jdbc.ha.LoadBalancedConnectionProxy)
	  at com.mysql.cj.jdbc.ha.MultiHostConnectionProxy.invoke(MultiHostConnectionProxy.java:489)
	  at jdk.proxy1.$Proxy0.close(Unknown Source:-1)
	  at com.mysql.cj.jdbc.ha.ReplicationConnectionProxy.doClose(ReplicationConnectionProxy.java:291)
	  at com.mysql.cj.jdbc.ha.MultiHostConnectionProxy.invoke(MultiHostConnectionProxy.java:491)
	  - locked <0xaa2> (a com.mysql.cj.jdbc.ha.ReplicationConnectionProxy)
	  at jdk.proxy1.$Proxy1.close(Unknown Source:-1)
	  at com.rubikcubeman.mysql.MySQLDeadlockTest.main(MySQLDeadlockTest.java:50)

How to repeat:
You need to have a thread reading from a ResultSet while having another thread close the connection.

I mocked up a bit of code which can reproduce the issue: https://github.com/rubik-cube-man/mysql-connector-deadlock/tree/main

If you wish to use my code to replicate the issue, you can follow the README to get it running. It may not work on the first execution; however, after a couple of runs, you're likely to have the program fail to terminate.
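Independently of the repo above, the shape of the race can be sketched with plain Java locks. This is only an illustration of the lock-ordering conflict visible in the thread dump, not the actual Connector/J internals; `tryLock` with a timeout is used so the demo reports the conflict and terminates instead of hanging like the real deadlock does.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Stand-ins for the two proxy monitors seen in the thread dump above;
// these are illustrative locks, not the actual Connector/J objects.
public class DeadlockShapeDemo {
    static final ReentrantLock replicationProxy = new ReentrantLock();
    static final ReentrantLock loadBalancedProxy = new ReentrantLock();

    public static void main(String[] args) throws InterruptedException {
        // "reader" mirrors ResultSet.getString(): load-balanced lock first, replication lock second.
        Thread reader = new Thread(() -> lockBoth("reader", loadBalancedProxy, replicationProxy));
        // "closer" mirrors Connection.close(): replication lock first, load-balanced lock second.
        Thread closer = new Thread(() -> lockBoth("closer", replicationProxy, loadBalancedProxy));
        reader.start();
        closer.start();
        reader.join();
        closer.join();
    }

    static void lockBoth(String name, ReentrantLock first, ReentrantLock second) {
        first.lock();
        try {
            Thread.sleep(100); // widen the race window so both threads hold their first lock
            // A real monitor wait would block forever here; tryLock lets the demo report and exit.
            if (second.tryLock(500, TimeUnit.MILLISECONDS)) {
                second.unlock();
                System.out.println(name + ": acquired both locks");
            } else {
                System.out.println(name + ": stuck waiting for the other thread's lock (deadlock shape)");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }
}
```

With the 100 ms sleep, both threads normally end up holding their first lock and timing out on the second, which is exactly the hold-and-wait cycle from the thread dump.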

Suggested fix:
It appears that the order in which locks are acquired differs depending on whether you're reading from a ResultSet or closing the connection. I would suggest making the lock-acquisition order consistent across both code paths.
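As a sketch of that suggestion (again with stand-in locks, not the actual Connector/J fields), routing every operation that needs both monitors through a single helper that always acquires them in the same global order makes the hold-and-wait cycle impossible:

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch of the suggested fix: every code path that needs both proxy
// monitors acquires them in one fixed global order. The lock names are
// illustrative stand-ins, not the actual Connector/J fields.
public class ConsistentOrderDemo {
    static final ReentrantLock replicationProxy = new ReentrantLock();  // always taken first
    static final ReentrantLock loadBalancedProxy = new ReentrantLock(); // always taken second

    static void withBothLocks(Runnable body) {
        replicationProxy.lock();
        try {
            loadBalancedProxy.lock();
            try {
                body.run();
            } finally {
                loadBalancedProxy.unlock();
            }
        } finally {
            replicationProxy.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // The "reader" and the "closer" still contend, but can no longer deadlock:
        // whichever thread wins replicationProxy is guaranteed to get loadBalancedProxy too.
        Thread reader = new Thread(() -> withBothLocks(() -> System.out.println("reader: read completed")));
        Thread closer = new Thread(() -> withBothLocks(() -> System.out.println("closer: close completed")));
        reader.start();
        closer.start();
        reader.join();
        closer.join();
        System.out.println("no deadlock: both operations finished");
    }
}
```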
[1 Apr 17:49] Luke Morfill
It would seem that today we saw some of our major applications crash as well due to a very similar issue. Here are the deadlocks that we saw in our application. The deadlock is slightly different to the one above, as it involves different methods.

We also use HikariCP in our project; however, I don't think this affects the deadlock at all.

"Custom Pool Thread #8321":
	at com.mysql.cj.jdbc.ha.ReplicationConnectionProxy.getCurrentConnection(ReplicationConnectionProxy.java:517)
	- waiting to lock <0x0000000628194a10> (a com.mysql.cj.jdbc.ha.ReplicationConnectionProxy)
	at com.mysql.cj.jdbc.ha.ReplicationMySQLConnection.getCurrentConnection(ReplicationMySQLConnection.java:59)
	- locked <0x0000000628194a68> (a com.mysql.cj.jdbc.ha.ReplicationMySQLConnection)
	at com.mysql.cj.jdbc.ha.ReplicationMySQLConnection.getActiveMySQLConnection(ReplicationMySQLConnection.java:54)
	at com.mysql.cj.jdbc.ha.MultiHostMySQLConnection.getConnectionMutex(MultiHostMySQLConnection.java:646)
	at com.mysql.cj.jdbc.result.ResultSetImpl.realClose(ResultSetImpl.java:1821)
	at com.mysql.cj.jdbc.result.ResultSetImpl.close(ResultSetImpl.java:560)
	at com.mysql.cj.jdbc.StatementImpl.realClose(StatementImpl.java:1789)
	at com.mysql.cj.jdbc.StatementImpl.close(StatementImpl.java:429)
	at jdk.internal.reflect.GeneratedMethodAccessor31.invoke(Unknown Source)
	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@17.0.10/DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(java.base@17.0.10/Method.java:568)
	at com.mysql.cj.jdbc.ha.MultiHostConnectionProxy$JdbcInterfaceProxy.invoke(MultiHostConnectionProxy.java:110)
	- locked <0x0000000628194a88> (a com.mysql.cj.jdbc.ha.LoadBalancedConnectionProxy)
	at jdk.proxy2.$Proxy29.close(jdk.proxy2/Unknown Source)
	at com.zaxxer.hikari.pool.ProxyStatement.close(ProxyStatement.java:75)
        ... Our code which I can't share unfortunately.

	at com.mysql.cj.jdbc.ha.ReplicationMySQLConnection.getCurrentConnection(ReplicationMySQLConnection.java:59)
	- waiting to lock <0x0000000628194a68> (a com.mysql.cj.jdbc.ha.ReplicationMySQLConnection)
	at com.mysql.cj.jdbc.ha.ReplicationMySQLConnection.getActiveMySQLConnection(ReplicationMySQLConnection.java:54)
	at com.mysql.cj.jdbc.ha.MultiHostMySQLConnection.setNetworkTimeout(MultiHostMySQLConnection.java:636)
	at jdk.internal.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@17.0.10/DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(java.base@17.0.10/Method.java:568)
	at com.mysql.cj.jdbc.ha.ReplicationConnectionProxy.invokeMore(ReplicationConnectionProxy.java:331)
	at com.mysql.cj.jdbc.ha.MultiHostConnectionProxy.invoke(MultiHostConnectionProxy.java:518)
	- locked <0x0000000628194a10> (a com.mysql.cj.jdbc.ha.ReplicationConnectionProxy)
	at jdk.proxy2.$Proxy20.setNetworkTimeout(jdk.proxy2/Unknown Source)
	at com.zaxxer.hikari.pool.PoolBase.setNetworkTimeout(PoolBase.java:561)
	at com.zaxxer.hikari.pool.PoolBase.quietlyCloseConnection(PoolBase.java:136)
	at com.zaxxer.hikari.pool.HikariPool.lambda$closeConnection$1(HikariPool.java:444)
	at com.zaxxer.hikari.pool.HikariPool$$Lambda$2069/0x00007f8e50a2f828.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@17.0.10/ThreadPoolExecutor.java:1136)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@17.0.10/ThreadPoolExecutor.java:635)
	at java.lang.Thread.run(java.base@17.0.10/Thread.java:840)
[2 Apr 10:27] Luke Morfill
Bumping to Serious as we're seeing applications/threads drop like flies.

We're investigating reverting to an older version for the time being.
[9 May 11:26] Filipe Silva
Hi Luke,

Thank you for your interest in MySQL Connector/J and for this report.

I'm setting this report as "Verified"; however, mind that, although we try to keep Connector/J as thread safe as possible, it's not advisable to share MySQL connections between threads. I urge you to use a connection pool, have each thread get its own connection, and release it as soon as it's no longer needed.

I would also recommend moving away from Connector/J replication connections in favor of MySQL Router and InnoDB Cluster, which are much more efficient at providing high availability and read/write splitting.
[9 May 11:59] Luke Morfill
Hello, thank you for your response and your suggestions! I will see what I can do to implement your ideas.

One thing to note is that we use HikariCP in production to manage our connections. We don't pass connections between threads; I just wrote an example of the deadlock using a Hikari data source. Could this be a bug in Hikari, as it appears to close all the connections from the thread that attempts to close the pool?
[11 May 1:03] Filipe Silva
Hmm... that might be a valid trigger for this deadlock. No, I don't think it's a bug in HikariCP, but using jdbc:mysql:replication connections in a connection pool definitely sounds quite risky.

We shall try to fix this. However, given that there are better alternatives, please don't keep your expectations too high.