Bug #51244 | wait_timeout fails on OpenSolaris | ||
---|---|---|---|
Submitted: | 17 Feb 2010 13:43 | Modified: | 7 Dec 2011 3:25 |
Reporter: | Alexander Nozdrin | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: General | Severity: | S1 (Critical) |
Version: | M3 (Celosia) | OS: | Any |
Assigned to: | Davi Arnaut | CPU Architecture: | Any |
Tags: | experimental, pb2, test failure |
[17 Feb 2010 13:43]
Alexander Nozdrin
[3 Mar 2010 21:08]
Sveta Smirnova
Thank you for the report. Verified as described using pushbuild logs. Not repeatable on support machine.
[22 Mar 2010 12:24]
Daniel Fischer
It's not a pushbuild bug, but there are bugs in almost everything else contributing to the test failure. This is what happens, as far as I understand it: - During configure, we test whether socket timeouts work by creating an INET domain STREAM socket and trying to set SO_SNDTIMEO and SO_RCVTIMEO on it. If either of these calls to setsockopt() fails, timeouts are understood as being unsupported. - Opensolaris 2009.06 (aka snv111b), which pushbuild runs, supports these timeouts on this type of socket, and hence, configure will set up the build so that socket timeouts are used instead of the more expensive alarm() based mechanism. - Opensolaris up to at least snv101b (2008.11, tested in VBox) and Solaris 10 and below did not support these timeouts, and hence the alarm() based mechanism was used. According to Sveta, the support machine she mentioned above also runs snv101b. This explains why the test case doesn't fail there. (The test can still fail under very high load if it takes more than 30 seconds for the timeout to be noticed, and this has happened in the past as far as I know, but it doesn't fail consistently as it does on snv111b.) - The main.wait_timeout test case tests timeouts on a UNIX domain STREAM socket, which we technically don't reliably know to work because there is no build-time test for this situation and no guarantee that socket timeouts are supported on all socket types when they are supported on one type. - A modified test case testing TCP socket timeouts (by replacing "localhost" with "127.0.0.1" as a quick hack) passes on snv111b. - Setting timeouts on UNIX domain sockets with code similar to what we use in the configure test on one of the snv machines pushbuild uses was accepted by the OS and no error was returned. - Interestingly, retrieving the timeout values directly after setting them resulted in random garbage with a high number of zeroed-out results, but never the 2 second value that the test tries to set. So we have bugs in Opensolaris (broken UNIX domain socket timeouts), our test suite (testing timeouts on wrong socket type), our configure script (only testing TCP timeouts and assuming that other socket types will be the same), and mysqld's general idea of how to implement socket timeouts (not differentiating between socket types). There are multiple possible fixes. The most correct one, but unfortunately the one that is the most effort, is probably differentiating correctly between TCP sockets and UNIX domain sockets, testing timeout capabilities for both, and using socket timeouts only on sockets where they are known to be supported. The test also would have to verify that the timeout has been set using getsockopt() to work around the OpenSolaris bug. Technically, Opensolaris is just the first platform we found where the assumption that socket timeouts will work on UNIX domain sockets if they work on TCP sockets is wrong, and the same could happen on other platforms. I'm setting the category to "Server: General" because this fix would mostly touch the server source. If this is not possible, the test in configure could be adapted so that both socket types are tested. Since mysqld does not differentiate between socket types, working timeouts on both types that we use would have to be made a requirement. This is not a very good idea because there are performance implications of using the more portable alarm() based mechanism. On the other hand, the only system that would be affected (as far as I can tell) is Opensolaris and we used alarm() there until very recently anyway because timeouts weren't supported until the last release. Alternatively, we may just not care about timeouts on UNIX domain sockets. They are much less important because, without an unstable network between two sockets and both processes running on the same machine, most conditions where a client becomes unavailable will actually be recognised by the server in a timely manner. In this case, the test case should be changed to test timeouts on TCP connections instead and it should be documented that they don't work on specific platforms (snv111b plus others we may find in the future). There is no workaround that can be applied on the pushbuild side.
[26 Apr 2010 14:49]
Alexander Nozdrin
Making experimental on Solaris due to this bug.
[17 Jan 2011 9:23]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/128908 3255 Davi Arnaut 2011-01-17 Bug#54790: Use of non-blocking mode for sockets limits performance Bug#51244: wait_timeout fails on OpenSolaris The problem was that a optimization for the case when the server uses alarms for timeouts could cause a slowdown when socket timeouts are used instead. In case alarms are used for timeouts, a non-blocking read is attempted first in order to avoid the cost of setting up a alarm. If this non-blocking read fails, the socket mode is changed to blocking and a alarm is armed. If socket timeout is used, there is no point in attempting a non-blocking read first as the timeout will be automatically armed by the OS. Yet the server would attempt a non-blocking read first and later switch the socket to blocking mode. This could inadvertently impact performance as switching the blocking mode of a socket requires at least two calls into the kernel on Linux, apart from problems inherited by the scalability of fcntl(2). The solution is to remove alarm based timeouts from the protocol layer and push timeout handling down to the virtual I/O layer. This approach allows the handling of socket timeouts on a platform-specific basis. The blocking mode of the socket is no longer exported and VIO read and write operations either complete or fail with a error or timeout. On Linux, the MSG_DONTWAIT flag is used to enable non-blocking send and receive operations. If the operation would block, poll() is used to wait for readiness or until a timeout occurs. This strategy avoids the need to set the socket timeout and blocking mode twice per query. On Windows, as before, the timeout is set on a per-socket fashion. In all remaining operating systems, the socket is set to non-blocking mode and poll() is used to wait for readiness or until a timeout occurs. In order to cleanup the code after the removal of alarm based timeouts, the low level packet reading loop is unrolled into two specific sequences: reading the packet header and the payload. This make error handling easier down the road. In conclusion, benchmarks have shown that these changes do not introduce any performance hits and actually slightly improves the server throughput for higher numbers of threads. Incompatible changes: A timeout is now always applied to a individual receive or send I/O operation. In contrast, a alarm based timeout was applied to an entire send or receive packet operation. Building and running MySQL on POSIX systems now requires support for poll() and O_NONBLOCK. These should be available in any modern POSIX system. O Windows, the default value for MYSQL_OPT_CONNECT_TIMEOUT is no longer 20 seconds. The default value now is no timeout (infinite), the same as in all other platforms. @ include/my_global.h A socket timeout is now signalled with a ETIMEDOUT error, which in some cases is emulated (that is, set explicitly). Add a wrapper for ECONNRESET so that it can be used to map SSL-specific errors. @ include/violite.h Introduce a vio_io_wait method which can be used to wait for I/O events on a VIO. The supported I/O events are read and write. The notification of error conditions is implicit. Extend vio_reset, which was SSL-specific, to be able to rebind a new socket to an already initialized Vio object. Remove vio_is_blocking. The blocking mode is no longer exported. Also, remove fcntl_mode as the blocking mode is no longer buffered. It is now a product of the timeout. Add prototype for a wrapper vio_timeout() function, which invokes the underlying timeout method of the VIO. Add to VIO two member variables to hold the read and write timeout values. Rename vio_was_interrupted to vio_was_timeout. Remove vio_poll_read, which is now superseded vio_io_wait. Introduce vio_connect to avoid code duplication. @ mysql-test/t/ssl.test Add test case to ensure that timeouts work over SSL. @ mysql-test/t/wait_timeout.test Add test case for Bug#54790. @ sql-common/client.c Look into last_errno to check whether the socket was interrupted due to a timeout. Add a couple of wrappers to retrieve the connect_timeout option value. Use vio_io_wait instead of vio_poll_read. The result of a failed socket operation is in socket_errno, which is WSAGetLastError on Windows. Replace my_connect() and wait_for_data() with vio_connect() to avoid code duplication. The polling inside wait_for_data() is somewhat equivalent to wait_for_io(). Also, this can be considered a bug fix as wait_for_data() would incorrectly use POLLIN (there is data to read) to wait (poll) for the connection to be established. This problem probably never surfaced because the OS network layer timeout is usually lower then our default. @ sql/net_serv.cc Remove net_data_is_ready, it is no longer necessary as its functionality is now implemented by vio_io_wait. Remove the use of signal based alarms for timeouts. Instead, rely on the underlying VIO layer to provide socket timeouts and control the blocking mode. The error handling is changed to use vio_was_timeout() to recognize a timeout occurrence and vio_should_retry() to detect interrupted I/O operations. The loop in the packet reading path (my_real_path) is unrolled into two steps: reading the packet header and payload. Each step is now documented and should be easier to understand. net_clear() no longer drains the socket buffer. Unread data on the socket constitutes a protocol violation and shouldn't be skipped. The previous behavior was race prone anyway as it only drained data on the socket buffer, missing any data in transit. Instead, net_clear() now just checks if the endpoint is still connected. Remove use of the MYSQL_INSTANCE_MANAGER macro. @ tests/mysql_client_test.c Add test case to ensure that the MYSQL_OPT_READ_TIMEOUT option works as expected. @ vio/vio.c Rename no_poll_read to no_io_wait. The stub method is used for the transport types named pipe and shared memory. Remove hPipe parameter to vio_init, it is now set in the vio_new_win32pipe function. Initialize timeout values to -1 (infinite) in vio_init. Improve vio_reset to also reinitialize timeouts and assert that rebinding is only supported for socket-based transport types. Move event object creation to its own initialization function and perform error checking. Remove caching of the socket status flags, including blocking mode. It is no longer necessary. Add a vio_timeout wrapper which adjusts the timeout value from seconds to milliseconds and invokes the underlying VIO-specific timeout handler. It also helps the underlying handler detect the previous state of the timeouts (activated or deactivated). @ vio/viopipe.c Simplify the pipe transport code and adjust to handle the new timeout interface. @ vio/vioshm.c Move shared memory related code over from viosocket.c @ vio/viosocket.c Use vio_ssl_errno to retrieve the error number if the VIO type is SSL. Add the vio_socket_io_wait() helper which wraps vio_io_wait() in order to avoid code duplication in vio_write()/vio_read(). The function selects a appropriate timeout (based on the passed event) and translates the return value. Re-implement vio_read()/vio_write() as simple event loops. The loop consists of retrying the I/O operation until it succeeds or fails with a non-recoverable error. If it fails with error set to EAGAIN or EWOULDBLOCK, vio_io_wait() is used to wait for the socket to become ready. On Linux, the MSG_DONTWAIT flag is used to get the same effect as if the socket is in non-blocking mode. Add vio_set_blocking() which tweaks the blocking mode of a socket. Add vio_win32_timeout() which sets the timeout of a socket on Windows. Add vio_socket_timeout() which implements the logic around when a socket should be switched to non-blocking mode. Except on Windows, the socket is put into non-blocking mode if a timeout is set. vio_should_retry now indicates whether a I/O operation was interrupted by a temporary failure (interrupted by a signal) and should be retried later. vio_was_timeout indicates whether a I/O operation timed out. Remove socket_poll_read, now superseded by vio_io_wait. Implement vio_io_wait() using select() on Windows. Otherwise, use poll(). vio_io_wait() will loop until either a timeout occurs, or the socket becomes ready. In the presence of spurious wake-ups, the wait is resumed if possible. Move code related to the transport types named pipe and shared memory to their own files. Introduce vio_connect to avoid code duplication. The function implements a timed connect by setting the socket to non-blocking and polling until the connection is either established or fails. @ vio/viossl.c Re-implement vio_ssl_read() and vio_ssl_write() as simple event loops. Use vio_socket_io_wait() to handle polling, the SSL transport can only be used with sockets. Obtain the error status from SSL. Map the obtained error code to a platform equivalent one.
[7 Dec 2011 3:25]
Paul DuBois
Noted in 5.6.3 changelog. Changelog entry same as for Bug#54790.