Bug #52633 | When building with -DNO_ALARM on Linux, wait_timeout is effective 10x longer | ||
---|---|---|---|
Submitted: | 6 Apr 2010 20:49 | Modified: | 30 Jun 2011 22:15 |
Reporter: | Ryan Mack | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Documentation | Severity: | S2 (Serious) |
Version: | 5.1.45, 5.1 bzr | OS: | Linux (CentOS 5) |
Assigned to: | Paul DuBois | CPU Architecture: | Any |
Tags: | Contribution |
[6 Apr 2010 20:49]
Ryan Mack
[7 Apr 2010 9:45]
Sveta Smirnova
Thank you for the report. Verified as described, although in my case it is 11x
[9 Apr 2010 17:17]
Omer Barnir
This is the intended behavior: must fail once then it starts the retry countdown. (that's why 11x on the last comment)
[9 Apr 2010 17:28]
Ryan Mack
I'm unclear why this isn't a bug. When compiling the current code with -DNO_ALARM if I set wait_timeout to 10 seconds, the connections are kept open for 110 seconds. If we are relying on the timeouts to disconnect idle connections after the specified time, I don't think it makes sense to retry the read when it times out. An alternative fix to the one I provided would be to scale the timeout value down by a factor of 11. Thank you, Ryan
[19 Apr 2010 20:14]
Ryan Mack
Reopening because I still think this is a bug as described in my last comment. Thank you, Ryan
[23 Apr 2010 0:05]
Mark Callaghan
The existing behavior is already confusing. Retrying the read multiple times means the per-read timeout is a multiple of the timeout specified by the user. http://mysqlha.blogspot.com/2010/03/can-protocol-be-gpl.html This confusing behavior (retry 3 times, timeout is 3X what you set) is documented and apparently here to stay. Retrying 11 times is not documented. It is also silly. We should fix this now as MySQL 5.5 builds with -DNO_ALARM by default.
[23 Apr 2010 8:05]
Sveta Smirnova
Set bug status to "Documentation", because I got explanation why server behaves in such a way.
[23 Apr 2010 14:57]
Ryan Mack
Would you mind explaining why? Thanks, Ryan
[23 Apr 2010 16:17]
Mark Callaghan
This problem might have been fixed in 5.5.4-m3. In 5.5.4, vio_should_retry() is defined as: my_bool vio_should_retry(Vio * vio) { int en = socket_errno; /* man 2 read write EAGAIN or EWOULDBLOCK when a socket is a non-blocking mode means that the read/write would block. man 7 socket EAGAIN or EWOULDBLOCK when a socket is in a blocking mode means that the corresponding receiving or sending timeout was reached. */ return en == SOCKET_EINTR || (!vio_is_blocking(vio) && (en == SOCKET_EAGAIN || en == SOCKET_EWOULDBLOCK)); } And in 5.1.45 it is my_bool vio_should_retry(Vio * vio __attribute__((unused))) { int en = socket_errno; return (en == SOCKET_EAGAIN || en == SOCKET_EINTR || en == SOCKET_EWOULDBLOCK); } ------------------ This has a big effect on the start of loop in my_real_read. For the code below, when the client-side read timeout expires in 5.5.4-m3, vio_should_retry() returns 0 when there is a read timeout and vio_read() returned -1. But in MySQL 5.1.45, vio_should_retry() returns -1 on a timeout and we then run into the code that Ryan changed to fix things. Maybe we need to fix vio_should_retry in 5.1? pos = net->buff + net->where_b; /* net->packet -4 */ for (i=0 ; i < 2 ; i++) { while (remain > 0) { /* First read is done with non blocking mode */ if ((long) (length= vio_read(net->vio, pos, remain)) <= 0L) { my_bool interrupted = vio_should_retry(net->vio);
[26 Apr 2010 20:20]
Omer Barnir
Issue is fixed in 5.5. Will not be fixed in 5.1 as can have side effects
[30 Jun 2011 22:15]
Paul DuBois
Thank you for your bug report. This issue has been addressed in the documentation. The updated documentation will appear on our website shortly, and will be included in the next release of the relevant products. Added this explanation to the net_retry_count description for the 5.0/5.1 manuals (5.5 and up are not affected) and referred to it from the net_read_timeout and net_write_timeout descriptions: On Linux, the NO_ALARM build flag (-DNO_ALARM) modifies how the binary treats both net_read_timeout and net_write_timeout. With this flag enabled, neither timer cancels the current statement until after the failing connection has been waited on an additional net_retry_count times. This means that the effective timeout value becomes (timeout setting) * (net_retry_count+1).