Bug #63135 Blacklist ceases to work after 2 failures.
Submitted: 7 Nov 2011 21:55 Modified: 16 Dec 2011 7:29
Reporter: James G
Status: Closed
Category: Connector / J Severity: S2 (Serious)
Version: 5.1.18 OS: Any
Assigned to: CPU Architecture: Any
Tags: blacklist

[7 Nov 2011 21:55] James G
Description:
I use the replication extension to connect my web app to 1 master and 2 readonly servers.  I took both of the readonly servers down and then brought one back up.  Even though I've set the loadBalanceBlacklistTimeout option to 30 seconds, the connector still randomly selects either of the readonly servers, so overall performance is slow because about half of the attempts go to the readonly server that is still down.

How to repeat:
My url is in the form:
jdbc:mysql:replication://master,readonly1,readonly2
Connection properties are:
useEncoding=true;characterEncoding=UTF-8;useOldAliasMetadataBehavior=true;loadBalanceBlacklistTimeout=30000;connectTimeout=2000
1.  Start app running
2.  Stop both readonly db servers
3.  Start one readonly db server
4.  Notice that about 50% of attempts still try the down readonly server
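
The setup above could be expressed in Java roughly as follows. This is only a sketch: the class and method names (ReplicationRepro, reportedProperties, timeReadOnlyConnect) are hypothetical, the property names are copied verbatim from the list above, and the timing method assumes Connector/J is on the classpath and that the three hosts resolve.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;

class ReplicationRepro {
    // URL exactly as given in the report.
    static final String URL = "jdbc:mysql:replication://master,readonly1,readonly2";

    // Connection properties copied verbatim from the report.
    static Properties reportedProperties() {
        Properties p = new Properties();
        p.setProperty("useEncoding", "true");
        p.setProperty("characterEncoding", "UTF-8");
        p.setProperty("useOldAliasMetadataBehavior", "true");
        p.setProperty("loadBalanceBlacklistTimeout", "30000"); // 30 seconds, in ms
        p.setProperty("connectTimeout", "2000");               // 2 seconds, in ms
        return p;
    }

    // Times one connect plus a read-only query, in milliseconds.
    // Requires the driver and the servers; user and password are placeholders.
    static long timeReadOnlyConnect(String user, String password) throws SQLException {
        Properties p = reportedProperties();
        p.setProperty("user", user);
        p.setProperty("password", password);
        long start = System.nanoTime();
        try (Connection c = DriverManager.getConnection(URL, p)) {
            c.setReadOnly(true); // ask for a readonly server rather than the master
            try (Statement st = c.createStatement()) {
                st.execute("SELECT 1");
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

Calling timeReadOnlyConnect in a loop after step 3 should show whether attempts keep stalling for about connectTimeout on the down server.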

Suggested fix:
While debugging the connector, I found that in com.mysql.jdbc.LoadBalancingConnectionProxy.getGlobalBlacklist() the comparison "keys.size() == this.hostList.size()" is always true, even after one of the blacklist entries has passed its timeout.  The blacklist can never shrink at this point because the expiry code sits below this check.  I think that moving the if block "keys.size() == this.hostList.size()" to below the timeout-expired block would solve this problem and allow the working host to be removed from the blacklist.
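
To illustrate the ordering problem, here is a minimal, self-contained sketch. It is not the Connector/J source: the class BlacklistSketch and its methods are hypothetical stand-ins, and time is passed in explicitly so the behavior is deterministic. It only models the two orderings discussed above (size check before expiry versus expiry before size check).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

class BlacklistSketch {
    // host -> time (ms) at which its blacklist entry expires
    private final Map<String, Long> blacklist = new HashMap<>();
    private final List<String> hostList;

    BlacklistSketch(List<String> hostList) { this.hostList = hostList; }

    void blacklist(String host, long expiresAt) { blacklist.put(host, expiresAt); }

    // Ordering described in the report: the "everything is blacklisted"
    // check returns early, so the expiry loop is never reached.
    Map<String, Long> viewCheckFirst(long now) {
        Map<String, Long> copy = new HashMap<>(blacklist);
        copy.keySet().retainAll(hostList);
        if (copy.size() == hostList.size()) {
            return new HashMap<>(); // empty view: callers try every host
        }
        expire(copy, now);
        return copy;
    }

    // Suggested ordering: expire stale entries first, then do the size check.
    Map<String, Long> viewExpireFirst(long now) {
        Map<String, Long> copy = new HashMap<>(blacklist);
        copy.keySet().retainAll(hostList);
        expire(copy, now);
        if (copy.size() == hostList.size()) {
            return new HashMap<>();
        }
        return copy;
    }

    // Removes entries whose expiry time has passed from both the copy
    // and the underlying blacklist.
    private void expire(Map<String, Long> copy, long now) {
        Iterator<Map.Entry<String, Long>> it = copy.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (e.getValue() < now) {
                blacklist.remove(e.getKey());
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        List<String> hosts = Arrays.asList("readonly1", "readonly2");

        BlacklistSketch buggy = new BlacklistSketch(hosts);
        buggy.blacklist("readonly1", 30_000L);
        buggy.blacklist("readonly2", 30_000L);
        buggy.viewCheckFirst(40_000L);          // stale entries are kept
        buggy.blacklist("readonly2", 70_000L);  // readonly2 fails again
        System.out.println(buggy.viewCheckFirst(41_000L)); // {}

        BlacklistSketch fixed = new BlacklistSketch(hosts);
        fixed.blacklist("readonly1", 30_000L);
        fixed.blacklist("readonly2", 30_000L);
        fixed.viewExpireFirst(40_000L);         // both stale entries removed
        fixed.blacklist("readonly2", 70_000L);  // readonly2 fails again
        System.out.println(fixed.viewExpireFirst(41_000L)); // {readonly2=70000}
    }
}
```

With the check-first ordering the view stays empty once every host has been blacklisted, so the down server keeps being selected; with the expire-first ordering only the host that failed again remains excluded.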
[8 Nov 2011 16:12] Todd Farmer
Thanks for your bug report and useful analysis of the root cause.  Fixed in r1104.
[16 Dec 2011 7:29] Philip Olson
Fixed as of 5.1.18:

+        The loadBalanceBlacklistTimeout option was
+        not functioning properly. Working connections were not being
+        removed from the blacklist.