Bug #96309 MultiHost in loadbalance may lead to a TPS reduction during a quick switch
Submitted: 24 Jul 2019 8:46 Modified: 2 Oct 20:14
Reporter: qingshun luo Email Updates:
Status: Closed Impact on me:
None 
Category:Connector / J Severity:S5 (Performance)
Version:5-1-31,even 5-1-46 OS:Any
Assigned to: CPU Architecture:Any
Tags: loadbalance

[24 Jul 2019 8:46] qingshun luo
Description:
During use of LoadbalanceProxyConnection with multi-host, a quick switch may cause normal sever still kept in blacklist during loadBalanceBlacklistTimeout.

As to TPS reduction situation, consider a simple example:

Two hosts, located at two seperately machine marked as A and B serve for jdbc loadbalance.
Machine A shutdown, witch is added into blacklist. When machine A recovers, machine B shutdown in no time.
Now machine A can serve normally, but it's still kept in blacklist with long delay until loadBalanceBlacklistTimeout reaches, leading to low TPS.

Test result describe:

```
loadBalanceBlacklistTimeout=60000

Time          Network Action             MySQL Cluster Status        TPS&Client Status               Blacklist   
...           -                          mysql#A=Ok,mysql#B=Ok       TPS=5, Normal                   None
time0         Drop mysql#A               mysql#A=Drop,mysql#B=Ok     TPS=5, Normal                   mysql#A
time0+5s      Recover mysql#A            mysql#A=Ok,mysql#B=Ok       TPS=5, Normal                   mysql#A
time0+10s     Drop mysql#B               mysql#A=Ok,mysql#B=Drop     TPS=0 or 1, Abnormal            mysql#A,mysql#B
...
time0+59s     Drop mysql#B               mysql#A=Ok,mysql#B=Drop     TPS=0 or 1, Abnormal            mysql#A,mysql#B
time0+60s~    Drop mysql#B               mysql#A=Ok,mysql#B=Drop     TPS=5, Normal                   mysql#B
```

From [time0+10s] to  [time0+59s], client status is abnormal, mysql#A kept in blacklist can serve normally indeed.

How to repeat:
1. Test code:
    
    ```
    private static String driver = "com.mysql.jdbc.Driver";
    private static String url = "jdbc:mysql:loadbalance://172.21.34.87:40991,172.21.34.87:40992/schema_test_jdbc?useSSL=false&connectTimeout=3000&socketTimeout=3000&loadBalanceBlacklistTimeout=60000";
    private static String user = "test";
    private static String password = "test";

    private static SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    private static Connection c;

    static void executeSimpleTransaction(String name, Connection c){
        try { 
            c.setAutoCommit(false); 
            Statement s = c.createStatement();
            String row_res;
            ResultSet rs = s.executeQuery("SELECT SLEEP(0.2)"); 
            while(rs.next()){
                row_res = rs.getString("SLEEP(0.2)");
            }
            rs.close();
            c.commit(); 
        } catch (SQLException e) {
            System.out.println(name + " execute Query fail at " + sdf.format(new Date()));
            e.printStackTrace(); 
        } 
    } 

     static Connection getNewConnection() throws SQLException, ClassNotFoundException {
        Class.forName("com.mysql.jdbc.Driver");
        return DriverManager.getConnection(url, user, password);
     }
     
     public static void main(String[] args) throws Exception {
        long tx_count = 0;
        long pre_tx_count = 0;

        long current_time = new Date().getTime();
        long previous_time = current_time;

        int re_connect = -1;
        int i = 0;

        String thd_name="TEST-JDBC ";
        while(true) {
            try {
                current_time = new Date().getTime();
                if (current_time - previous_time >= 3000) {
                    System.out.println(thd_name + sdf.format(new Date()) + " transaction TPS:" + (tx_count-pre_tx_count)/((current_time - previous_time)/1000)); /* Print current TPS */
                    pre_tx_count = tx_count;
                    previous_time = current_time;
                }
                if (re_connect != 0) {
                    c = getNewConnection();
                    i = i + 1;
                    re_connect = 0;
                }
                executeSimpleTransaction(thd_name, c);
                tx_count = tx_count+1;
            } catch(SQLException e) {
                System.out.println(thd_name + " connect fail at " + sdf.format(new Date()));
                e.printStackTrace();
            } catch (Exception e) {
                System.out.println(thd_name + " throw exception at " + sdf.format(new Date()));
                e.printStackTrace();
            }
        }
     }
     ```
     
3. Simulate step:

In test program machine, simulate server machine shutdown with root user:

root@dev3498:~> iptables -F
root@dev3498:~> date; iptables -A OUTPUT -p tcp -d 172.21.34.87 --dport 40991 -j DROP
Wed Jul 24 13:47:48 CST 2019
root@dev3498:~> date; iptables -D OUTPUT -p tcp -d 172.21.34.87 --dport 40991 -j DROP  
Wed Jul 24 13:47:52 CST 2019
root@dev3498:~> date; iptables -A OUTPUT -p tcp -d 172.21.34.87 --dport 40992 -j DROP 
Wed Jul 24 13:47:55 CST 2019

Then a significant response latency and TPS reduction will be observed, sustaining about 60s, which depends on loadBalanceBlacklistTimeout setting.

4. Corresponding log info:

TEST-JDBC 2019-07-24 13:47:45 transaction TPS:5
TEST-JDBC  execute Query fail at 2019-07-24 13:47:52
com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Communications link failure during commit(). Transaction resolution unknown.
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
        at com.mysql.jdbc.Util.getInstance(Util.java:408)
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:918)
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:897)
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:886)
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:860)
        at com.mysql.jdbc.ConnectionImpl.commit(ConnectionImpl.java:1559)
        at com.mysql.jdbc.MultiHostMySQLConnection.commit(MultiHostMySQLConnection.java:129)
        at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at com.mysql.jdbc.LoadBalancedConnectionProxy.invokeMore(LoadBalancedConnectionProxy.java:495)
        at com.mysql.jdbc.MultiHostConnectionProxy.invoke(MultiHostConnectionProxy.java:457)
        at com.sun.proxy.$Proxy0.commit(Unknown Source)
        at com.companyname.bank.App.executeSimpleTransaction(App.java:149)
        at com.companyname.bank.App.main(App.java:121)
TEST-JDBC 2019-07-24 13:47:52 transaction TPS:2
TEST-JDBC 2019-07-24 13:47:55 transaction TPS:5
TEST-JDBC  execute Query fail at 2019-07-24 01:48:01
com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Communications link failure during commit(). Transaction resolution unknown.
......
TEST-JDBC 2019-07-24 13:48:01 transaction TPS:0
TEST-JDBC 2019-07-24 13:48:05 transaction TPS:1
TEST-JDBC 2019-07-24 13:48:08 transaction TPS:0
TEST-JDBC 2019-07-24 13:48:13 transaction TPS:2
TEST-JDBC 2019-07-24 13:48:17 transaction TPS:1
TEST-JDBC 2019-07-24 13:48:20 transaction TPS:1
TEST-JDBC 2019-07-24 13:48:23 transaction TPS:0
TEST-JDBC 2019-07-24 13:48:27 transaction TPS:0
TEST-JDBC 2019-07-24 13:48:30 transaction TPS:1
TEST-JDBC 2019-07-24 13:48:34 transaction TPS:1
TEST-JDBC 2019-07-24 13:48:37 transaction TPS:0
TEST-JDBC 2019-07-24 13:48:40 transaction TPS:0
TEST-JDBC 2019-07-24 13:48:44 transaction TPS:0
TEST-JDBC 2019-07-24 13:48:47 transaction TPS:1
TEST-JDBC 2019-07-24 13:48:50 transaction TPS:0
TEST-JDBC 2019-07-24 13:48:53 transaction TPS:5

Suggested fix:
When host serves normally, remove host from Blacklist.

```
diff --git a/src/com/mysql/jdbc/LoadBalancedConnectionProxy.java b/src/com/mysql/jdbc/LoadBalancedConnectionProxy.java
index 80ec7696..84cd130f 100644
--- a/src/com/mysql/jdbc/LoadBalancedConnectionProxy.java
+++ b/src/com/mysql/jdbc/LoadBalancedConnectionProxy.java
@@ -371,6 +371,8 @@ public class LoadBalancedConnectionProxy extends MultiHostConnectionProxy implem
     public synchronized ConnectionImpl createConnectionForHost(String hostPortSpec) throws SQLException {
         ConnectionImpl conn = super.createConnectionForHost(hostPortSpec);
 
+        this.removeFromGlobalBlacklist(hostPortSpec);
+
         this.liveConnections.put(hostPortSpec, conn);
         this.connectionsToHostsMap.put(conn, hostPortSpec);
 
@@ -639,6 +641,20 @@ public class LoadBalancedConnectionProxy extends MultiHostConnectionProxy implem
         addToGlobalBlacklist(host, System.currentTimeMillis() + this.globalBlacklistTimeout);
     }
 
+    /**
+     * Removes a host from the blacklist.
+     * 
+     * @param host
+     *            The host need remove from the blacklist.
+     */
+    public void removeFromGlobalBlacklist(String host) {
+        if (isGlobalBlacklistEnabled()) {
+            synchronized (globalBlacklist) {
+                globalBlacklist.remove(host);
+            }
+        }
+    }
+
     /**
      * Checks if host blacklist management was enabled.
      */
```
[13 Sep 2019 8:03] Alexander Soklakov
Hi qingshun,

Did you try Connector/J 8.0?
[17 Sep 2019 1:46] qingshun luo
Test result with Connector/J 8.0.17

Attachment: test-result.log (application/octet-stream, text), 9.60 KiB.

[17 Sep 2019 1:47] qingshun luo
Thank you for the report.
I try to test with Connector/J 8.0.17, similar result is obtained.
[31 Jul 19:06] Filipe Silva
Hi qingshun luo,

Sorry for the late reply.

Thank you very much for this report. This is happening exactly as you reported.

And thank you too for the suggested fix.
[2 Oct 20:14] Daniel So
Posted by developer:
 
Added the following entry to the Connector/J 8.0.22 changelog: 

"In a load balancing setup, if the connection parameter loadBalanceBlacklistTimeout was set, a server that was once unavailable remained in the blocklist even after a connection to it has been reestablished, and this affected the system's performance. With this fix, the server is removed from the blocklist as soon as it becomes available again."
[2 Oct 20:14] Daniel So
Posted by developer:
 
Added the following entry to the Connector/J 8.0.22 changelog: 

"In a load balancing setup, if the connection parameter loadBalanceBlacklistTimeout was set, a server that was once unavailable remained in the blocklist even after a connection to it has been reestablished, and this affected the system's performance. With this fix, the server is removed from the blocklist as soon as it becomes available again."