MySQL Bugs: #4143: Under certain circumstances, MySQL client hangs when the server dies

Bug #4143	Under certain circumstances, MySQL client hangs when the server dies
Submitted:	15 Jun 2004 12:41	Modified:	21 Aug 2006 17:33
Reporter:	[ name withheld ]	Email Updates:
Status:	Duplicate	Impact on me:	None
Category:	MySQL Server	Severity:	S4 (Feature request)
Version:	4.0.20	OS:	Linux (Linux 2.4)
Assigned to:	Assigned Account	CPU Architecture:	Any

Description:
The context:

We are running a proprietary, single-purpose HTTP server using a MySQL database as its
back-end. The server is designed to be capable of limited operation even with the database
being down or unreachable. The server is using an Apache-like preforked model of process
management with each worker process maintaining a separate persistent connection to the
MySQL database. The database and the server run on physically different machines.

Last week, the machine running the database went down due to a hardware problem.

What we expected to happen:

The HTTP server would continue operating, reporting DB operation failures to the log and
resuming flawless operation when the database came up.

What actually happened:

The HTTP server stopped responding. It's worker processes hanged while waiting for read
socket operations. Even when the server came up again, the worker processes were not able to
resume the operation.

Diagnostics:

It appears that the MySQL client library does not allow the client to set timeouts on socket
read and write (unlike the connect_timeout, which is configurable through the mysql_options
function). These timeouts seem to be implemented on lower level, yet hardcoded in MySQL
client:

In include/mysql.h:

#define CLIENT_NET_READ_TIMEOUT 365*24*3600 /* Timeout on read */
#define CLIENT_NET_WRITE_TIMEOUT 365*24*3600 /* Timeout on write */

If the remote server does not explicitely tear down the connection, the client hangs for a long
time.

How to repeat:
When the connection between any program using the MySQL client library and server is
established, one can block traffic between the server and client e.g. using iptables -j drop
between the client and server (both directions), thus simulating the problem. The client then
hangs.

Suggested fix:
MySQL client library should be modified to allow the application to set socket read and write
timeouts, in a manner analogical to connect_timeout setting.

Appart from fixing our problem this will provide the client with greater control and failover
capability when MySQL and network traffic problems are encountered.

I closed this bug report becasue it's a feature request and not a real
bug in MySQL. The main problem is that keep-alive on Linux is not
configurable; If this should work, this probelm wouldn't exists.

The defines in mysql.h is only used for connections on the mysqld side
and not usable for the client. Here follows a short description of how things
works:

- The mysqld server has an alarm thread that is used to request
timeouts. We use this to break connections that have been hanging too
long on a read or write. We can't use this method in libmysql as the
library doesn't have an alarm thread and we can't use alarms as the
application may use these.

- Detecting failed connects in client library is easy as we know that
he connect time will be short and we don't do connects that often.
(This is in sharp contrast to a read from the server that may take
'any' amount of time).
For this we do a combination of select/poll that is quite slow but
ok for connect.

- Changing the communication layer to use select/poll is likely to slow
down the library notable. Even if we do this, we can't detect if the
server goes down during a read.
We could have a much shorter timeout on write and this would allow us to
detect some cases.

I will add a request in our 'to-do' database about this and see if we
could add a variable to have timeouts and only use 'slower code' in
the case where the user have requested timeouts on read/write.

If you want to look at this yourself, the code is in libmysql/net.c. The connection code is in libmysql/libmysql.c

Regards,
Monty

You can set timeouts in client library with mysql_options() (MYSQL_OPT_READ_TIMEOUT and MYSQL_OPT_WRITE_TIMEOUT), however they work only on Windows.

This bug is a duplicate of bug#9678.

Note that the above comment on 21 Aug 2006 19:33 by Tomash Brechko is no longer valid.

The MYSQL_OPT_READ_TIMEOUT and MYSQL_OPT_WRITE_TIMEOUT options to mysql_options() work for all TCP/IP connections since 4.1.22/5.0.25/5.1.12.

From the manual page at http://dev.mysql.com/doc/refman/5.1/en/mysql-options.html:

MYSQL_OPT_READ_TIMEOUT (argument type: unsigned int *)

The timeout in seconds for attempts to read from the server. Each attempt uses this timeout value and there are retries if necessary, so the total effective timeout value is three times the option value. You can set the value so that a lost connection can be detected earlier than the TCP/IP Close_Wait_Timeout value of 10 minutes. This option works only for TCP/IP connections and, prior to MySQL 5.1.12, only for Windows.

MYSQL_OPT_WRITE_TIMEOUT (argument type: unsigned int *)

The timeout in seconds for attempts to write to the server. Each attempt uses this timeout value and there are net_retry_count retries if necessary, so the total effective timeout value is net_retry_count times the option value. This option works only for TCP/IP connections and, prior to MySQL 5.1.12, only for Windows.