Description:
It seems to me that mysql-proxy tries to set the state of an unavailable backend server back to UP every 10 seconds. If the next connection to that backend then fails, the backend is marked as down again for another 10 seconds.
Suppose the backend has just been marked as UP after 10 seconds of unavailability.
Then the next client connection is made.
If the connection failure is detected at network-mysqld-proxy.c:3855 (v0.6.1), the connection is retried with another backend server.
But if it is detected at network-mysqld-proxy.c:3707, the connection is not retried and the failure is reported to the client.
I think mysql-proxy should retry the connection even when the failure is detected at line 3707.
How to repeat:
Run mysql-proxy with two backends: one available, one nonexistent or unavailable.
/usr/local/sbin/mysql-proxy --admin-address=:4046 --proxy-address=:4045 --proxy-backend-addresses=:3307 --proxy-backend-addresses=:3306
Then connect to the proxy port :4045 several times in parallel.
Sooner or later you will get a connection failure, even though the second backend at :3306 is available.
This is actually a simplified test case; I ran into the error under different conditions. I wrote a Lua script that tries the backend servers in a fixed order, always starting from the first server (the default round-robin doesn't suit me).
Suggested fix:
--- src/network-mysqld-proxy.c.orig Fri Feb 29 19:37:57 2008
+++ src/network-mysqld-proxy.c Sat Dec 8 02:11:10 2007
@@ -3707,6 +3707,20 @@
g_critical("%s.%d: connect(%s) failed: %s",
__FILE__, __LINE__,
con->server->addr.str, strerror(so_error));
+
+ if (st->backend->state != BACKEND_STATE_DOWN) {
+ g_critical("%s.%d: marking %s as down",
+ __FILE__, __LINE__, con->server->addr.str);
+
+ st->backend->state = BACKEND_STATE_DOWN;
+ g_get_current_time(&(st->backend->state_since));
+
+ network_socket_free(con->server);
+ con->server = NULL;
+
+ return RET_ERROR_RETRY;
+ }
+
return RET_ERROR;
}