Bug #51244 wait_timeout fails on OpenSolaris
Submitted: 17 Feb 2010 13:43 Modified: 7 Dec 2011 3:25
Reporter: Alexander Nozdrin Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: General Severity:S1 (Critical)
Version:M3 (Celosia) OS:Any
Assigned to: Davi Arnaut CPU Architecture:Any
Tags: experimental, pb2, test failure

[17 Feb 2010 13:43] Alexander Nozdrin
Description:
main.wait_timeout [ fail ]

CURRENT_TEST: main.wait_timeout
mysqltest: At line 58: query 'Failed to detect that client has been aborted' 

How to repeat:
Check daily-next-mr.
[3 Mar 2010 21:08] Sveta Smirnova
Thank you for the report.

Verified as described using pushbuild logs. Not repeatable on support machine.
[22 Mar 2010 12:24] Daniel Fischer
It's not a pushbuild bug, but there are bugs in almost everything else contributing to the test failure. This is what happens, as far as I understand it:

- During configure, we test whether socket timeouts work by creating an INET domain STREAM socket and trying to set SO_SNDTIMEO and SO_RCVTIMEO on it. If either of these calls to setsockopt() fails, timeouts are understood as being unsupported.
 
- Opensolaris 2009.06 (aka snv111b), which pushbuild runs, supports these timeouts on this type of socket, and hence, configure will set up the build so that socket timeouts are used instead of the more expensive alarm() based mechanism.

- Opensolaris up to at least snv101b (2008.11, tested in VBox) and Solaris 10 and below did not support these timeouts, and hence the alarm() based mechanism was used. According to Sveta, the support machine she mentioned above also runs snv101b. This explains why the test case doesn't fail there. (The test can still fail under very high load if it takes more than 30 seconds for the timeout to be noticed, and this has happened in the past as far as I know, but it doesn't fail consistently as it does on snv111b.)

- The main.wait_timeout test case tests timeouts on a UNIX domain STREAM socket, which we technically don't reliably know to work because there is no build-time test for this situation and no guarantee that socket timeouts are supported on all socket types when they are supported on one type. 

- A modified test case testing TCP socket timeouts (by replacing "localhost" with "127.0.0.1" as a quick hack) passes on snv111b.

- Setting timeouts on UNIX domain sockets with code similar to what we use in the configure test  on one of the snv machines pushbuild uses was accepted by the OS and no error was returned.

- Interestingly, retrieving the timeout values directly after setting them resulted in random garbage with a high number of zeroed-out results, but never the 2 second value that the test tries to set.

So we have bugs in Opensolaris (broken UNIX domain socket timeouts), our test suite (testing timeouts on wrong socket type), our configure script (only testing TCP timeouts and assuming that other socket types will be the same), and mysqld's general idea of how to implement socket timeouts (not differentiating between socket types).

There are multiple possible fixes. The most correct one, but unfortunately the one that is the most effort, is probably differentiating correctly between TCP sockets and UNIX domain sockets, testing timeout capabilities for both, and using socket timeouts only on sockets where they are known to be supported. The test also would have to verify that the timeout has been set using getsockopt() to work around the OpenSolaris bug. Technically, Opensolaris is just the first platform we found where the assumption that socket timeouts will work on UNIX domain sockets if they work on TCP sockets is wrong, and the same could happen on other platforms. I'm setting the category to "Server: General" because this fix would mostly touch the server source.

If this is not possible, the test in configure could be adapted so that both socket types are tested. Since mysqld does not differentiate between socket types, working timeouts on both types that we use would have to be made a requirement. This is not a very good idea because there are performance implications of using the more portable alarm() based mechanism. On the other hand, the only system that would be affected (as far as I can tell) is Opensolaris and we used alarm() there until very recently anyway because timeouts weren't supported until the last release.

Alternatively, we may just not care about timeouts on UNIX domain sockets. They are much less important because, without an unstable network between two sockets and both processes running on the same machine, most conditions where a client becomes unavailable will actually be recognised by the server in a timely manner. In this case, the test case should be changed to test timeouts on TCP connections instead and it should be documented that they don't work on specific platforms (snv111b plus others we may find in the future).

There is no workaround that can be applied on the pushbuild side.
[26 Apr 2010 14:49] Alexander Nozdrin
Making experimental on Solaris due to this bug.
[17 Jan 2011 9:23] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/128908

3255 Davi Arnaut	2011-01-17
      Bug#54790: Use of non-blocking mode for sockets limits performance
      Bug#51244: wait_timeout fails on OpenSolaris
      
      The problem was that a optimization for the case when the server
      uses alarms for timeouts could cause a slowdown when socket
      timeouts are used instead. In case alarms are used for timeouts,
      a non-blocking read is attempted first in order to avoid the
      cost of setting up a alarm. If this non-blocking read fails,
      the socket mode is changed to blocking and a alarm is armed.
      
      If socket timeout is used, there is no point in attempting a
      non-blocking read first as the timeout will be automatically
      armed by the OS. Yet the server would attempt a non-blocking
      read first and later switch the socket to blocking mode. This
      could inadvertently impact performance as switching the blocking
      mode of a socket requires at least two calls into the kernel
      on Linux, apart from problems inherited by the scalability
      of fcntl(2).
      
      The solution is to remove alarm based timeouts from the
      protocol layer and push timeout handling down to the virtual
      I/O layer. This approach allows the handling of socket timeouts
      on a platform-specific basis. The blocking mode of the socket
      is no longer exported and VIO read and write operations either
      complete or fail with a error or timeout.
      
      On Linux, the MSG_DONTWAIT flag is used to enable non-blocking
      send and receive operations. If the operation would block,
      poll() is used to wait for readiness or until a timeout occurs.
      This strategy avoids the need to set the socket timeout and
      blocking mode twice per query.
      
      On Windows, as before, the timeout is set on a per-socket
      fashion. In all remaining operating systems, the socket is set
      to non-blocking mode and poll() is used to wait for readiness
      or until a timeout occurs.
      
      In order to cleanup the code after the removal of alarm based
      timeouts, the low level packet reading loop is unrolled into
      two specific sequences: reading the packet header and the
      payload. This make error handling easier down the road.
      
      In conclusion, benchmarks have shown that these changes do not
      introduce any performance hits and actually slightly improves
      the server throughput for higher numbers of threads.
      
      Incompatible changes:
      
      A timeout is now always applied to a individual receive or
      send I/O operation. In contrast, a alarm based timeout was
      applied to an entire send or receive packet operation.
      
      Building and running MySQL on POSIX systems now requires
      support for poll() and O_NONBLOCK. These should be available
      in any modern POSIX system.
      
      O Windows, the default value for MYSQL_OPT_CONNECT_TIMEOUT
      is no longer 20 seconds. The default value now is no timeout
      (infinite), the same as in all other platforms.
     @ include/my_global.h
        A socket timeout is now signalled with a ETIMEDOUT error,
        which in some cases is emulated (that is, set explicitly).
        
        Add a wrapper for ECONNRESET so that it can be used to map
        SSL-specific errors.
     @ include/violite.h
        Introduce a vio_io_wait method which can be used to wait
        for I/O events on a VIO. The supported I/O events are read
        and write. The notification of error conditions is implicit.
        
        Extend vio_reset, which was SSL-specific, to be able to
        rebind a new socket to an already initialized Vio object.
        
        Remove vio_is_blocking. The blocking mode is no longer
        exported. Also, remove fcntl_mode as the blocking mode is
        no longer buffered. It is now a product of the timeout.
        
        Add prototype for a wrapper vio_timeout() function, which
        invokes the underlying timeout method of the VIO. Add to
        VIO two member variables to hold the read and write timeout
        values.
        
        Rename vio_was_interrupted to vio_was_timeout.
        
        Remove vio_poll_read, which is now superseded vio_io_wait.
        
        Introduce vio_connect to avoid code duplication.
     @ mysql-test/t/ssl.test
        Add test case to ensure that timeouts work over SSL.
     @ mysql-test/t/wait_timeout.test
        Add test case for Bug#54790.
     @ sql-common/client.c
        Look into last_errno to check whether the socket was
        interrupted due to a timeout.
        
        Add a couple of wrappers to retrieve the connect_timeout
        option value.
        
        Use vio_io_wait instead of vio_poll_read.
        
        The result of a failed socket operation is in socket_errno,
        which is WSAGetLastError on Windows.
        
        Replace my_connect() and wait_for_data() with vio_connect()
        to avoid code duplication. The polling inside wait_for_data()
        is somewhat equivalent to wait_for_io(). Also, this can be
        considered a bug fix as wait_for_data() would incorrectly
        use POLLIN (there is data to read) to wait (poll) for the
        connection to be established. This problem probably never
        surfaced because the OS network layer timeout is usually
        lower then our default.
     @ sql/net_serv.cc
        Remove net_data_is_ready, it is no longer necessary as its
        functionality is now implemented by vio_io_wait.
        
        Remove the use of signal based alarms for timeouts. Instead,
        rely on the underlying VIO layer to provide socket timeouts
        and control the blocking mode. The error handling is changed
        to use vio_was_timeout() to recognize a timeout occurrence
        and vio_should_retry() to detect interrupted I/O operations.
        
        The loop in the packet reading path (my_real_path) is unrolled
        into two steps: reading the packet header and payload. Each
        step is now documented and should be easier to understand.
        
        net_clear() no longer drains the socket buffer. Unread data
        on the socket constitutes a protocol violation and shouldn't
        be skipped. The previous behavior was race prone anyway
        as it only drained data on the socket buffer, missing any
        data in transit. Instead, net_clear() now just checks if
        the endpoint is still connected.
        
        Remove use of the MYSQL_INSTANCE_MANAGER macro.
     @ tests/mysql_client_test.c
        Add test case to ensure that the MYSQL_OPT_READ_TIMEOUT option
        works as expected.
     @ vio/vio.c
        Rename no_poll_read to no_io_wait. The stub method is used
        for the transport types named pipe and shared memory.
        
        Remove hPipe parameter to vio_init, it is now set in the
        vio_new_win32pipe function.
        
        Initialize timeout values to -1 (infinite) in vio_init.
        
        Improve vio_reset to also reinitialize timeouts and assert
        that rebinding is only supported for socket-based transport
        types.
        
        Move event object creation to its own initialization function
        and perform error checking.
        
        Remove caching of the socket status flags, including blocking
        mode. It is no longer necessary.
        
        Add a vio_timeout wrapper which adjusts the timeout value
        from seconds to milliseconds and invokes the underlying
        VIO-specific timeout handler. It also helps the underlying
        handler detect the previous state of the timeouts (activated
        or deactivated).
     @ vio/viopipe.c
        Simplify the pipe transport code and adjust to handle the
        new timeout interface.
     @ vio/vioshm.c
        Move shared memory related code over from viosocket.c
     @ vio/viosocket.c
        Use vio_ssl_errno to retrieve the error number if the VIO
        type is SSL.
        
        Add the vio_socket_io_wait() helper which wraps vio_io_wait()
        in order to avoid code duplication in vio_write()/vio_read().
        The function selects a appropriate timeout (based on the passed
        event) and translates the return value.
        
        Re-implement vio_read()/vio_write() as simple event loops. The
        loop consists of retrying the I/O operation until it succeeds
        or fails with a non-recoverable error. If it fails with error
        set to EAGAIN or EWOULDBLOCK, vio_io_wait() is used to wait
        for the socket to become ready. On Linux, the MSG_DONTWAIT
        flag is used to get the same effect as if the socket is in
        non-blocking mode.
        
        Add vio_set_blocking() which tweaks the blocking mode of
        a socket.
        
        Add vio_win32_timeout() which sets the timeout of a socket
        on Windows.
        
        Add vio_socket_timeout() which implements the logic around
        when a socket should be switched to non-blocking mode. Except
        on Windows, the socket is put into non-blocking mode if a
        timeout is set.
        
        vio_should_retry now indicates whether a I/O operation was
        interrupted by a temporary failure (interrupted by a signal)
        and should be retried later. vio_was_timeout indicates
        whether a I/O operation timed out.
        
        Remove socket_poll_read, now superseded by vio_io_wait.
        
        Implement vio_io_wait() using select() on Windows. Otherwise,
        use poll(). vio_io_wait() will loop until either a timeout
        occurs, or the socket becomes ready. In the presence of
        spurious wake-ups, the wait is resumed if possible.
        
        Move code related to the transport types named pipe and
        shared memory to their own files.
        
        Introduce vio_connect to avoid code duplication. The
        function implements a timed connect by setting the socket
        to non-blocking and polling until the connection is either
        established or fails.
     @ vio/viossl.c
        Re-implement vio_ssl_read() and vio_ssl_write() as simple
        event loops. Use vio_socket_io_wait() to handle polling,
        the SSL transport can only be used with sockets.
        
        Obtain the error status from SSL. Map the obtained error
        code to a platform equivalent one.
[7 Dec 2011 3:25] Paul DuBois
Noted in 5.6.3 changelog.

Changelog entry same as for Bug#54790.