Bug #22643 DBCP can't validate ReplicationConnections
Submitted: 24 Sep 2006 18:44 Modified: 12 Oct 2007 13:45
Reporter: Clive Cox Email Updates:
Status: Closed Impact on me:
None 
Category:Connector / J Severity:S1 (Critical)
Version:3.1.13, 5.0.x OS:Linux (linux)
Assigned to: CPU Architecture:Any

[24 Sep 2006 18:44] Clive Cox
Description:
ConnectorJ 5.0.3
MySQL 5.0.21

The ReplicationConnection has 2 private connections 1 for slave and 1 for master.
DBCP validates connections via a configurable SQL query.
However, this validation is only done on ReplicationDriver's currentConnection.
If the other becomes bad, say via wait_timeout exceeded DBCP will still happily use this connection.
Worse, it will continue to use even though SQL errors are returned when the client tries to use the bad connection as to get to the bad connection the ReplicationConnection will call swapConnections which only sets the currentConnection as the last thing it does. So any errors occurring during swapping of connections will still leave the old connection as the current connection and this connection will be okay, so DBCP will still think everything is fine.

An example stack trace I get is:
       at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2592)
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2909)
        at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1600)
        at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:1695)
        at com.mysql.jdbc.Connection.execSQL(Connection.java:2998)
        at com.mysql.jdbc.Connection.setCatalog(Connection.java:5028)
        at com.mysql.jdbc.ReplicationConnection.swapConnections(ReplicationConnection.java:457)
        at com.mysql.jdbc.ReplicationConnection.switchToSlavesConnection(ReplicationConnection.java:432)
        at com.mysql.jdbc.ReplicationConnection.setReadOnly(ReplicationConnection.java:383)
        at org.apache.commons.dbcp.DelegatingConnection.setReadOnly(DelegatingConnection.java:239)

How to repeat:
Haven't got an easy test case yet.
But simply use a connection in a manner that only uses the slave or master only until the other passes the wait_timeout should do it.

Suggested fix:
No easy fix I can think of.

Better (in this case) would be to set currentConnection at start of swapConnections so at least DBCP could spot the problem with testOnReturn=true.
However, this might not be acceptable semantics.
[24 Sep 2006 18:59] Clive Cox
The stack trace shown is actually from connector/J 3.1.13
The problem should exist in 5.0.3 as swapConnections looks the same.
[25 Sep 2006 8:26] Clive Cox
Test showing bug

Attachment: TestDBCPWithRepDriver.java (application/text, text), 4.22 KiB.

[25 Sep 2006 8:32] Clive Cox
My Test is with a slight modification of the BasicDataSourceExample taken from the DBCP website http://svn.apache.org/viewvc/jakarta/commons/proper/dbcp/trunk/doc/BasicDataSourceExample....

It sets the pool to size 1 and runs some SQL every 5 seconds for 80 seconds with the connection setReadOnly(false).

It then runs an SQL query with setReadOnly(true).

IMPORTANT: set wait_timeout in the mysql server to 70

The args[0] URI I was using is:
"jdbc:mysql://localhost:3306,localhost:3306/mydb?characterEncoding=utf8&useServerPrepStmts=true&logger=com.mysql.jdbc.log.StandardLogger&roundRobinLoadBalance=true&transformedBitIsBoolean=true"

So I just connect to the same db for both master and slave.

the args[1] SQL was just "select 1" 

I get a stack trace:
Now have run for 85
com.mysql.jdbc.CommunicationsException: Communications link failure due to underlying exception: 

** BEGIN NESTED EXCEPTION ** 

java.net.SocketException
MESSAGE: Software caused connection abort: recv failed

STACKTRACE:

java.net.SocketException: Software caused connection abort: recv failed
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.read(SocketInputStream.java:129)
	at com.mysql.jdbc.util.ReadAheadInputStream.fill(ReadAheadInputStream.java:113)
	at com.mysql.jdbc.util.ReadAheadInputStream.readFromUnderlyingStreamIfNecessary(ReadAheadInputStream.java:160)
	at com.mysql.jdbc.util.ReadAheadInputStream.read(ReadAheadInputStream.java:188)
	at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:1910)
	at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2304)
	at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2803)
	at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1573)
	at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:1665)
	at com.mysql.jdbc.Connection.execSQL(Connection.java:3118)
	at com.mysql.jdbc.Connection.setCatalog(Connection.java:5286)
	at com.mysql.jdbc.ReplicationConnection.swapConnections(ReplicationConnection.java:493)
	at com.mysql.jdbc.ReplicationConnection.switchToSlavesConnection(ReplicationConnection.java:472)
	at com.mysql.jdbc.ReplicationConnection.setReadOnly(ReplicationConnection.java:419)
	at org.apache.commons.dbcp.DelegatingConnection.setReadOnly(DelegatingConnection.java:239)
	at net.playtxt.jdbc.TestDBCPWithRepDriver.main(TestDBCPWithRepDriver.java:75)

** END NESTED EXCEPTION **

Last packet sent to the server was 31 ms ago.
	at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2515)
	at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2803)
	at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1573)
	at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:1665)
	at com.mysql.jdbc.Connection.execSQL(Connection.java:3118)
	at com.mysql.jdbc.Connection.setCatalog(Connection.java:5286)
	at com.mysql.jdbc.ReplicationConnection.swapConnections(ReplicationConnection.java:493)
	at com.mysql.jdbc.ReplicationConnection.switchToSlavesConnection(ReplicationConnection.java:472)Creating connection.
Exception:Communications link failure due to underlying exception: 

** BEGIN NESTED EXCEPTION ** 

java.net.SocketException
MESSAGE: Software caused connection abort: recv failed

STACKTRACE:

java.net.SocketException: Software caused connection abort: recv failed
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.read(SocketInputStream.java:129)
	at com.mysql.jdbc.util.ReadAheadInputStream.fill(ReadAheadInputStream.java:113)
	at com.mysql.jdbc.util.ReadAheadInputStream.readFromUnderlyingStreamIfNecessary(ReadAheadInputStream.java:160)
	at com.mysql.jdbc.util.ReadAheadInputStream.read(ReadAheadInputStream.java:188)
	at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:1910)
	at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2304)
	at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2803)
	at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1573)
	at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:1665)
	at com.mysql.jdbc.Connection.execSQL(Connection.java:3118)
	at com.mysql.jdbc.Connection.setCatalog(Connection.java:5286)
	at com.mysql.jdbc.ReplicationConnection.swapConnections(ReplicationConnection.java:493)
	at com.mysql.jdbc.ReplicationConnection.switchToSlavesConnection(ReplicationConnection.java:472)
	at com.mysql.jdbc.ReplicationConnection.setReadOnly(ReplicationConnection.java:419)
	at org.apache.commons.dbcp.DelegatingConnection.setReadOnly(DelegatingConnection.java:239)
	at net.playtxt.jdbc.TestDBCPWithRepDriver.main(TestDBCPWithRepDriver.java:75)

** END NESTED EXCEPTION **

Last packet sent to the server was 31 ms ago.

	at com.mysql.jdbc.ReplicationConnection.setReadOnly(ReplicationConnection.java:419)
	at org.apache.commons.dbcp.DelegatingConnection.setReadOnly(DelegatingConnection.java:239)
	at net.playtxt.jdbc.TestDBCPWithRepDriver.main(TestDBCPWithRepDriver.java:75)
[25 Sep 2006 8:36] Clive Cox
Forgot to say you will need to set databse username and password for your
database and you need:

Also, as the code say:
To compile this example, you'll want:
//  * commons-pool.jar
//  * commons-dbcp.jar
//  * j2ee.jar (for the javax.sql classes)
// in your classpath.
//
// To run this example, you'll want:
//  * commons-collections.jar
//  * commons-pool.jar
//  * commons-dbcp.jar
//  * j2ee.jar (for the javax.sql classes)
//  * the classes for your (underlying) JDBC driver
// in your classpath.
[22 Oct 2006 11:14] Valeriy Kravchuk
Thank you for a problem report. Please, try to repeat with the latest versions of Connector/J, 3.1.14 and 5.0.4, and inform about the results.
[1 Nov 2006 11:38] Clive Cox
Yes the bug still occurs in 5.0.4.

If you look at my comments you will see that this is not a bug that will easily disappear without a fundamental change to how the ReplicationDriver works and/or collaboration with DBCP.

A simple inspection of the code will make clear that keeping 2 connections open in the ReplicationDriver will mean 1 gets timed-out if not used and DBCP will only test the active connection as it has no idea there is another "hidden" connection.
[9 Nov 2006 12:18] Tonci Grgin
Hi Clive and thanks for your report.
Verified as described with provided test case.
[9 Nov 2006 12:19] Tonci Grgin
Stack trace

Attachment: 22643STR.zip (application/x-zip-compressed, text), 1.01 KiB.

[7 Dec 2006 21:18] Mark Matthews
We're looking into ways of fixing this without having to change DBCP.
[6 Sep 2007 14:57] Fabian Panthen
Is there any idea of when a fix for this can be expected?
[6 Sep 2007 15:04] Fabian Panthen
We experience the same issue with c3pO
[10 Oct 2007 17:56] Matt Cohen
Is there any idea when a fix for this will be released?
[12 Oct 2007 13:45] Mark Matthews
Fixed in 5.0.9, 5.1.5. You have to use a specially-constructed validation query that starts with "/* ping */", here's the entry from the changelog:

Specifying a "validation query" in your connection pool 
      that starts with "/* ping */" _exactly_ will cause the driver to 
      instead send a ping to the server and return a fake result set (much 
      lighter weight), and when using a ReplicationConnection or a LoadBalancedConnection, 
      will send the ping across all active connections.
[16 Oct 2007 22:28] Matt Cohen
Thank you very much. :)