Bug #33861 load balance bestresponsetime blacklists need global state
Submitted: 14 Jan 2008 13:22
Modified: 16 Oct 2008 13:53
Reporter: Domas Mituzas
Status: Closed
Category: Connector / J
Severity: S4 (Feature request)
Version: 5.1-nightly-20080114
OS: Any
Assigned to: Todd Farmer
CPU Architecture: Any
Tags: again
Triage: D2 (Serious)

[14 Jan 2008 13:22] Domas Mituzas
Description:
With the settings failOverReadOnly=false&loadBalanceStrategy=bestResponseTime&autoReconnectForPools=true&connectTimeout=1000

the bad host is tried on every connection fetch, as if no internal blacklist existed:

$ ./run51 al
1550ms loadbalance cn = jdbc:mysql://localhost:3306/
1109ms loadbalance cn = jdbc:mysql://localhost:3306/
1105ms loadbalance cn = jdbc:mysql://localhost:3306/
1102ms loadbalance cn = jdbc:mysql://localhost:3306/
1108ms loadbalance cn = jdbc:mysql://localhost:3306/
1100ms loadbalance cn = jdbc:mysql://localhost:3306/
1120ms loadbalance cn = jdbc:mysql://localhost:3306/
1098ms loadbalance cn = jdbc:mysql://localhost:3306/
1095ms loadbalance cn = jdbc:mysql://localhost:3306/
1095ms loadbalance cn = jdbc:mysql://localhost:3306/
1095ms loadbalance cn = jdbc:mysql://localhost:3306/
1095ms loadbalance cn = jdbc:mysql://localhost:3306/
1099ms loadbalance cn = jdbc:mysql://localhost:3306/
1100ms loadbalance cn = jdbc:mysql://localhost:3306/
1093ms loadbalance cn = jdbc:mysql://localhost:3306/
1091ms loadbalance cn = jdbc:mysql://localhost:3306/
1098ms loadbalance cn = jdbc:mysql://localhost:3306/

How to repeat:
Note that failing connections on the local network are cut short by the OS detecting 'host is down'.
To test a bad host, put it outside the local LAN IP range, or use a firewall to deny packets.

                        // needs java.sql.Connection, java.sql.DriverManager, java.util.Properties
                        int i = 0; // loop counter, not declared in the original snippet
                        while(i++<30) {
                                Properties info = new Properties();
                                long startTime = System.currentTimeMillis();
                                // 192.168.5.5 is the unreachable host; each fetch pays the 1000 ms connectTimeout
                                Connection cn = DriverManager.getConnection("jdbc:mysql:loadbalance://192.168.5.5:3306,localhost:3306/test?failOverReadOnly=false&loadBalanceStrategy=bestResponseTime&autoReconnectForPools=true&connectTimeout=1000");
                                long stopTime = System.currentTimeMillis();
                                System.out.println((stopTime-startTime)+"ms loadbalance cn = " + cn.getMetaData().getURL());
                        }

Suggested fix:
Make better use of the connection blacklist.
[14 Jan 2008 17:02] Mark Matthews
This behavior is as intended: the concept is that you leave these *physical* connections in a pool for quite some time. Therefore, to avoid a bottleneck on connection "handout" from the pool, each physical connection has its own copy of the blacklist; the copies are not shared. Does the user need to create physical connections often? If so, we'll have to do some thinking about how we can do this without creating a bottleneck when we have to access the "shared state" blacklist.
[15 Feb 2008 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[12 Mar 2008 15:33] Domas Mituzas
I'm reverifying this to reclassify it as:
load balance bestresponsetime blacklists need global state

Newly created connections need to consult previously recorded global state for slow or unreachable hosts.
[15 Oct 2008 21:30] Todd Farmer
Fixed and pushed.
[16 Oct 2008 13:53] Tony Bedford
An entry was added to the 5.1.7 changelog:

The loadBalance bestResponseTime blacklists did not have a global state.
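
For illustration only, the kind of global state requested in this report could look roughly like the sketch below. It is not the Connector/J fix, whose code does not appear in this thread. The class name GlobalBlacklist, its methods, and the expiry timeout are all hypothetical. A ConcurrentHashMap is used so that lookups do not take a shared lock, which is one possible answer to the "bottleneck" concern raised above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical process-wide blacklist; NOT Connector/J's actual implementation. */
public class GlobalBlacklist {
    // host:port -> time (ms) at which the blacklist entry expires
    private static final ConcurrentMap<String, Long> EXPIRY =
            new ConcurrentHashMap<String, Long>();

    /** Records a host as bad for the next timeoutMs milliseconds. */
    public static void add(String host, long nowMs, long timeoutMs) {
        EXPIRY.put(host, nowMs + timeoutMs);
    }

    /** Returns true while the host's entry has not expired; drops stale entries. */
    public static boolean isBlacklisted(String host, long nowMs) {
        Long expiresAt = EXPIRY.get(host);
        if (expiresAt == null) {
            return false;
        }
        if (nowMs >= expiresAt) {
            // Remove only the entry we inspected, in case it was refreshed meanwhile.
            EXPIRY.remove(host, expiresAt);
            return false;
        }
        return true;
    }

    /** Filters the configured hosts down to those worth trying. */
    public static List<String> candidates(List<String> hosts, long nowMs) {
        List<String> live = new ArrayList<String>();
        for (String h : hosts) {
            if (!isBlacklisted(h, nowMs)) {
                live.add(h);
            }
        }
        // If every host is blacklisted, fall back to trying all of them.
        return live.isEmpty() ? new ArrayList<String>(hosts) : live;
    }
}
```

In this sketch, a connection attempt that hits the connect timeout would call add(...) for that host, and every new load-balanced connection would call candidates(...) before choosing a host, so only the first fetch pays the timeout. The current time is passed in explicitly to keep the class easy to test; real code would more likely read the clock itself.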