Bug #54790 Use of non-blocking mode for sockets limits performance
Submitted: 25 Jun 2010 3:58 Modified: 3 Jun 2011 2:10
Reporter: Mark Callaghan Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: General Severity:S3 (Non-critical)
Version:5.1.47 OS:Any
Assigned to: Davi Arnaut CPU Architecture:Any
Tags: blocking, net_serv, performance, sysbench
Triage: Triaged: D3 (Medium) / R2 (Low)

[25 Jun 2010 3:58] Mark Callaghan
Description:
See http://www.facebook.com/notes/mysqlfacebook/using-pmp-to-double-mysql-throughput-part-1/40...

Use of non-blocking mode for sockets (see my_net_init in net_serv.cc) limits throughput on Linux servers with 16 cores.

How to repeat:
All details are in the facebook note

Suggested fix:
Stop using non-blocking mode
[25 Jun 2010 3:59] Mark Callaghan
This is a performance issue.
[25 Jun 2010 19:01] Davi Arnaut
I find it a bit hard to believe that the non-blocking mode is the culprit here, I would bet the problem is around some misuse of non-blocking i/o pattern. This should be investigated thoroughly, especially given that long term we would like to always be non-blocking in order to better handle kill.
[25 Jun 2010 19:05] Davi Arnaut
Also, FWIW, lock_kernel is known as the "big kernel lock" and there is some recent work in the kernel to fix it. Nonetheless, if mysql keeps calling fcntl to tweak the blocking, we know who the real culprit is.
[25 Jun 2010 19:09] Davi Arnaut
Not really a feature request either if it calls fcntl frequently, its not how the non-blocking i/o pattern is supposed to work.
[25 Jun 2010 19:12] Mark Callaghan
It would help if you added instrumentation to count number of reads and number of times socket blocking mode had to be changed after non-blocking read failed.
[25 Jun 2010 19:14] Davi Arnaut
Incidentally, I have a work in progress patch that might solve this if the problem is due to intensive switching of the blocking mode. To make kill reliable, it removes all the signal/alarm stuff and uses non-blocking mode and poll to monitor the user socket and a notification fd.
[25 Jun 2010 19:15] Davi Arnaut
If using PMP, a variant could be quite easy. IIRC, strace can provide count of specific calls..
[25 Jun 2010 19:21] Mark Callaghan
We build with  -DNO_ALARM -DSIGNAL_WITH_VIO_CLOSE. The alarm code really needs to be disabled on servers that don't require it. You have many more options for doing this once connection pool is used by default. Alas, I can't wait for that in a GA release.
[25 Jun 2010 19:21] Davi Arnaut
Indeed, counters there could be quite helpful in debugging network issues. We should probably add some statistics to the vio layer so the server layer can expose it.
[26 Jun 2010 11:07] Sveta Smirnova
Results in my environment with build BUILD/compile-pentium64-debug-max which does not show any improvement if patch used..

QPS

num of threads    1              2              4              8              16            32            64
without patch       1028.75   1799.52   2805.87   4413.71   4460.98   4142.71   3346.61   
with patch            614.81     1493.58   2765.10   4495.48   4523.66   4138.18   3321.12
[26 Jun 2010 12:28] Mark Callaghan
Sveta, how many cores does your server have? I reproduce this on a 16 core server.
[26 Jun 2010 12:30] Mark Callaghan
...and I ran sysbench and mysqld on separate servers. Note that your results degrade after 32, so you can reproduce part of my results.
[26 Jun 2010 12:46] Sveta Smirnova
Mark,

this is 4 cores-box. I will try on 16-cores box, thought if use compile options similar to what you use results with patch are better:

                1         2         4         8         16        32        64
without patch   1315.22   2341.90   4604.90   5855.23   5817.66   5744.56   5690.91
with patch      1292.31   2341.73   4605.39   5889.03   5857.50   5876.27   5724.56
[26 Jun 2010 13:13] Mark Callaghan
I am repeating all of my tests using a db server with 8 cores. After that I might pin mysqld to 4 cores, but I bet that the benefit of the patch decreases greatly on servers with fewer cores.
[28 Jun 2010 8:57] Sveta Smirnova
Thank you for the feedback.

I tested it on 16-cores machine and got results:

                1         2         4         8         16        32        64
without patch   1348.68   2361.22   4082.03   5246.22   5969.74   4881.80   3247.25
(clean run)
with patch      1338.86   2498.24   4770.70   6418.95   6172.78   4893.08   3241.13

So improvement exists. Bug set to "Verified".
[28 Jun 2010 19:48] Davi Arnaut
A possible low risk fix for 5.1: use recv with MSG_DONTWAIT instead of switching the non-blocking mode.
[29 Jun 2010 22:23] Mark Callaghan
Results for facebook patch with MySQL 5.1.47.

First column uses a 3-part name to describe the mysqld host.
* part 1 is CPU
* part 2 is storage -- always 'tmpfs' in this case
* part 3 is binary

For CPU:
* x8 is 8-core xeon (intel 5420)
* n8 is 8-core nehalem with hyperthreading off (intel 5520)
* n16 is nehalem above with hyperthrading on (intel 5520)

All binaries are MySQL 5.1.47 with the facebook patch. All use the innodb_fast_checksum option. These are listed in order so that each binary gets all of the changes for the binaries listed above it and then one more patch:
* "-base" is only the changes listed above
* "-bufra" is BUF_READ_AHEAD_AREA changed to be the constant "64"
* "-margin" is changes to make buf_flush_free_margin less expensive
* "-fcntl" is change to not use socket in non-blocking mode

Results are QPS for 1, 2, 4, 8, 16, 32, 64, 128 and 256 concurrent connections

x8-tmpfs-base: 919,1974,3887,9900,22792,34944,38520,38777,36876
x8-tmpfs-bufra: 974,2061,3886,9680,23488,48903,51953,51419,48423
x8-tmpfs-margin: 915,2008,3888,9448,24738,77720,73758,70169,67583
x8-tmpfs-fcntl: 864,1999,4013,9281,21366,73939,118858,113052,87093

n8-tmpfs-base: 1378,2742,5834,11917,25039,38713,43546,41952,38505
n8-tmpfs-bufra: 1378,2835,5489,13480,25843,59438,66626,61059,54523
n8-tmpfs-margin: 1351,2730,5397,12538,27379,74807,96857,86876,76166
n8-tmpfs-fcntl: 1435,2583,6112,12860,29970,77612,99092,90253,77524

n16-tmpfs-base: 919,1974,3887,9900,22792,34944,38520,38777,36876
n16-tmpfs-bufra: 974,2061,3886,9680,23488,48903,51953,51419,48423
n16-tmpfs-margin: 915,2008,3888,9448,24738,77720,73758,70169,67583
n16-tmpfs-fcntl: 864,1999,4013,9281,21366,73939,118858,113052,87093
[2 Jul 2010 3:46] Mark Callaghan
my_net_init calls vio_blocking to put the socket in non-blocking mode. It ignores the return value from vio_blocking. my_real_read calls vio_blocking in a loop until it returns a value >= 0. Why doesn't my_net_init check the return value?
[2 Jul 2010 4:12] Mark Callaghan
This is wrong. EINTR means interrupted. EGAIN and EWOULDBLOCK mean that the call timed out when SO_SNDTIMEO/SO_RCVTIMEO are used. Those are different things and shouldn't be confused.

my_bool
vio_was_interrupted(Vio *vio __attribute__((unused)))
{
  int en= socket_errno;
  return (en == SOCKET_EAGAIN || en == SOCKET_EINTR ||
          en == SOCKET_EWOULDBLOCK || en == SOCKET_ETIMEDOUT);
}
[2 Jul 2010 13:14] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/112760

3456 Davi Arnaut	2010-07-02
      Bug#54790: Use of non-blocking mode for sockets limits performance
      
      The problem was that a optimization for the case when the
      server uses alarms for timeouts could cause a slowdown when
      socket timeouts are used instead.
      
      In case alarms are used for timeouts, a non-blocking read
      is attempted first in order to avoid the cost of setting up
      a alarm. If the non-blocking read fails, the socket mode is
      changed to blocking and a alarm is armed.
      
      If socket timeout is used, there is no point in attempting a
      non-blocking read first as the timeout will be automatically
      armed by the OS. Yet the server would attempt a non-blocking
      read first and later switch the socket to blocking mode. This
      could inadvertently impact performance as switching the blocking
      mode of a socket requires at least two calls into the kernel,
      apart from problems inherited by the scalability of fcntl.
      
      The solution is to not attempt a non-blocking read first if
      built-in socket timeouts are being used. Also, in this case,
      spurious interruptions with a errno value of EINTR should
      be ignored.
     @ sql/net_serv.cc
        Do not set socket to non-blocking if not using alarms.
        
        Assert that the socket mode is blocking if using std
        socket timeouts. Also, ignore spurious interruptions
        in this mode.
[2 Jul 2010 13:16] Davi Arnaut
Low risk fix for 5.1. Patch for trunk will remove the use of alarms.
[2 Jul 2010 13:24] Davi Arnaut
Meh, what a mess is the error handling of vio_read. Need to take into account the case Mark mentioned, detect a timeout when read fails for the no-alarm case.
[2 Jul 2010 13:54] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/112763

3456 Davi Arnaut	2010-07-02
      Bug#54790: Use of non-blocking mode for sockets limits performance
      
      The problem was that a optimization for the case when the
      server uses alarms for timeouts could cause a slowdown when
      socket timeouts are used instead.
      
      In case alarms are used for timeouts, a non-blocking read
      is attempted first in order to avoid the cost of setting up
      a alarm. If the non-blocking read fails, the socket mode is
      changed to blocking and a alarm is armed.
      
      If socket timeout is used, there is no point in attempting a
      non-blocking read first as the timeout will be automatically
      armed by the OS. Yet the server would attempt a non-blocking
      read first and later switch the socket to blocking mode. This
      could inadvertently impact performance as switching the blocking
      mode of a socket requires at least two calls into the kernel,
      apart from problems inherited by the scalability of fcntl.
      
      The solution is to not attempt a non-blocking read first if
      built-in socket timeouts are being used. Also, in this case,
      spurious interruptions with a errno value of EINTR should
      be ignored.
     @ sql/net_serv.cc
        Do not set socket to non-blocking if not using alarms.
        
        Assert that the socket mode is blocking if using std
        socket timeouts. Also, ignore spurious interruptions
        in this mode.
[2 Jul 2010 14:35] Mark Callaghan
I have few more requests.

There are tabs in this code. Normally that isn't horrible, but with all of the ifdefs I frequently read the preprocessed output to figure out what the code really does and tabs don't make that output hard to read.

This comment is wrong as the code puts the socket in blocking mode:

#if !defined(__WIN__) || defined(MYSQL_SERVER)
         /*
           We got an error that there was no data on the socket. We now set up
           an alarm to not 'read forever', change the socket to non blocking
           mode and try again
         */
         if ((interrupted || length == 0) && !thr_alarm_in_use(&alarmed))
         {
           if (!thr_alarm(&alarmed,net->read_timeout,&alarm_buff)) /*
Don't wait too long */
           {
             my_bool old_mode;
             while (vio_blocking(net->vio, TRUE, &old_mode) < 0)
             {
[2 Jul 2010 14:36] Mark Callaghan
One more question -- are there two checks for EINTR in my_real_read?

This is the first:

        if (thr_alarm_in_use(&alarmed) && !thr_got_alarm(&alarmed) &&
             interrupted)
         {                                     /* Probably in MIT threads */
           if (retry_count++ < net->retry_count)
             continue;
#ifdef EXTRA_DEBUG
           fprintf(stderr, "%s: read looped with error %d, aborting thread\n",
                   my_progname,vio_errno(net->vio));
#endif /* EXTRA_DEBUG */
         }

And this might be the second.:

#if defined(THREAD_SAFE_CLIENT) && !defined(MYSQL_SERVER)
         if (vio_errno(net->vio) == SOCKET_EINTR)
         {
           DBUG_PRINT("warning",("Interrupted read. Retrying..."));
           continue;
         }
#endif
[2 Jul 2010 17:08] Mark Callaghan
Why does net_real_write use: !defined(NO_ALARM) && !defined(__WIN__)
But my_real_read uses:
  !defined(NO_ALARM) && (!defined(__WIN__) || defined(MYSQL_SERVER)) 

from net_real_write:

    if ((long) (length= vio_write(net->vio,pos,(size_t) (end-pos))) <= 0)
    {
      my_bool interrupted = vio_should_retry(net->vio);
#if !defined(NO_ALARM) && !defined(__WIN__)
      if ((interrupted || length == 0) && !thr_alarm_in_use(&alarmed))

from my_real_read:

       /* First read is done with non blocking mode */
        if ((long) (length= vio_read(net->vio, pos, remain)) <= 0L)
        {
          my_bool interrupted = vio_should_retry(net->vio);

          DBUG_PRINT("info",("vio_read returned %ld  errno: %d",
                             (long) length, vio_errno(net->vio)));

#if !defined(NO_ALARM) && (!defined(__WIN__) || defined(MYSQL_SERVER))
[6 Jul 2010 19:50] Davi Arnaut
> Why does net_real_write use: !defined(NO_ALARM) && !defined(__WIN__)
> But my_real_read uses:
>  !defined(NO_ALARM) && (!defined(__WIN__) || defined(MYSQL_SERVER)) 

I don't know, probably a bug.
[28 Jul 2010 18:17] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/114559

3158 Davi Arnaut	2010-07-28
      Bug#54790: Use of non-blocking mode for sockets limits performance
      
      A pre-requisite patch for unifying the handling of socket timeouts.
      This patch removes alarm based timeouts from the network layer
      and transitions the code to a pure socket based timeout approach.
      
      The handling of network I/O errors is split and documented, the
      server and client libraries have different approaches to treating
      interruptions.
      
      Also, the low level packet reading loop is unrolled into two specific
      sequences: reading the packet header and the payload. This make
      error handling easier down the road.
     @ include/my_global.h
        A socket timeout is signalled with EAGAIN/EWOULDBLOCK. See socket(7).
     @ sql/net_serv.cc
        Remove the use of signal based alarms for timeouts. Instead, rely
        on socket timeouts and blocking I/O. The error handling is changed
        to recognize EAGAIN/EWOULDBLOCK as a timeout and EINTR as a forced
        interruption depending on how the server was built.
        
        Furthermore, the loop in the packet reading path (my_real_path) is
        unrolled into two steps: reading the packet header and payload.
        Each step is now documented and should be easier to understand.
     @ vio/viosocket.c
        In some cases, I/O operations should only be retried on EINTR.
        Socket timeout is indicated with SOCKET_ETIMEDOUT. Method should
        be renamed to not cause confusion with other types of interruption.
[28 Jul 2010 18:30] Ryan Mack
From our office IRC session:

ryandmack:  i <3 this patch:  http://lists.mysql.com/commits/114559

mcallaghan: just got email for it, sounds like it might be my all time favorite patch

domas: :)
[3 Oct 2010 5:28] Shane Bester
Kostja, is this patch reviewed and safe to apply to 5.1 yet ?
[3 Oct 2010 12:18] Davi Arnaut
No, the correct status is in progress. Socket based timeouts do not work on HP-UX and some versions of Solaris. We are moving things to a poll() based timeout, but we need to make KILL reliable first (Bug#37780).
[16 Nov 2010 10:10] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/124009

3186 Davi Arnaut	2010-11-16
      Bug#54790: Use of non-blocking mode for sockets limits performance
      
      The problem was that a optimization for the case when the
      server uses alarms for timeouts could cause a slowdown when
      socket timeouts are used instead. In case alarms are used for
      timeouts, a non-blocking read is attempted first in order to
      avoid the cost of setting up a alarm. If this non-blocking
      read fails, the socket mode is changed to blocking and a
      alarm is armed.
      
      If socket timeout is used, there is no point in attempting a
      non-blocking read first as the timeout will be automatically
      armed by the OS. Yet the server would attempt a non-blocking
      read first and later switch the socket to blocking mode. This
      could inadvertently impact performance as switching the
      blocking mode of a socket requires at least two calls into
      the kernel on Linux, apart from problems inherited by the
      scalability of fcntl.
      
      The solution is to remove alarm based timeouts from the
      protocol layer and push timeout handling down to the
      virtual I/O layer. This approach allows the handling of
      socket timeouts on a platform-specific basis. The blocking
      mode of the socket is no longer exported and VIO read and
      write operations will either complete or fail with a error
      or timeout.
      
      On Linux, the MSG_DONTWAIT flag is used to enable non-blocking
      send and receive operations. If the operation would block,
      poll() is used to wait for readiness or until a timeout
      occurs. This strategy avoids the need to set the socket
      timeout and blocking mode twice per query.
      
      On Windows, as before, the timeout is set on a per-socket
      fashion. In all remaining operating systems, the socket is
      set to non-blocking mode and poll() is used to wait for
      readiness or until a timeout occurs.
      
      In order to cleanup the code after the removal of alarm based
      timeouts, the low level packet reading loop is unrolled into
      two specific sequences: reading the packet header and the
      payload. This make error handling easier down the road.
     @ include/my_global.h
        A socket timeout is now signalled with a ETIMEDOUT error,
        which in some cases is emulated (that is, set explicitly).
        
        Add a wrapper for ECONNRESET so that it can be used to map
        SSL-specific errors.
     @ include/violite.h
        Introduce a vio_io_wait method which can be used to wait
        for I/O events on a VIO. The supported I/O events are read
        and write. The notification of error conditions is implicit.
        
        Extend vio_reset, which was SSL-specific, to be able to
        rebind a new socket to an already initialized Vio object.
        
        Remove vio_is_blocking. The blocking mode is no longer
        exported. Also, remove fcntl_mode as the blocking mode is
        no longer buffered. It is now a product of the timeout.
        
        Add prototype for a wrapper vio_timeout() function, which
        invokes the underlying timeout method of the VIO. Add to
        VIO two member variables to hold the read and write timeout
        values.
        
        Rename vio_was_interrupted to vio_was_timeout.
        
        Remove vio_poll_read, which is now superseded vio_io_wait.
        
        Introduce vio_connect to avoid code duplication.
     @ mysql-test/t/ssl.test
        Add test case to ensure that timeouts work over SSL.
     @ mysql-test/t/wait_timeout.test
        Add test case for Bug#54790.
     @ sql-common/client.c
        Look into last_errno to check whether the socket was
        interrupted due to a timeout.
        
        Use vio_io_wait instead of vio_poll_read.
        
        The result of a failed VIO socket operation is in
        socket_errno, which is WSAGetLastError on Windows.
        
        Replace my_connect() and wait_for_data() with vio_connect()
        to avoid code duplication. The polling inside wait_for_data()
        is somewhat equivalent to wait_for_io(). Also, this can be
        considered a bug fix as wait_for_data() would incorrectly
        use POLLIN (there is data to read) to wait (poll) for the
        connection to be established. This problem probably never
        surfaced because the OS network layer timeout is usually
        lower then our default.
     @ sql/net_serv.cc
        Remove net_data_is_ready, it is no longer necessary as its
        functionality is now implemented by vio_io_wait.
        
        Remove the use of signal based alarms for timeouts. Instead,
        rely on the underlying VIO layer to provide socket timeouts
        and control the blocking mode. The error handling is changed
        to use vio_was_timeout() to recognize a timeout occurrence
        and vio_should_retry() to detect interrupted I/O operations.
        
        The loop in the packet reading path (my_real_path) is unrolled
        into two steps: reading the packet header and payload. Each
        step is now documented and should be easier to understand.
        
        net_clear() no longer drains the socket buffer. Unread data
        on the socket constitutes a protocol violation and shouldn't
        be skipped. The previous behavior was race prone anyway
        as it only drained data on the socket buffer, missing any
        data in transit. Instead, net_clear() now just checks if
        the endpoint is still connected.
        
        Remove use of the MYSQL_INSTANCE_MANAGER macro.
     @ vio/vio.c
        Rename no_poll_read to no_io_wait. The stub method is used
        for the transport types named pipe and shared memory.
        
        Remove hPipe parameter to vio_init, it is now set in the
        vio_new_win32pipe function.
        
        Initialize timeout values to -1 (infinite) in vio_init.
        
        Improve vio_reset to also reinitialize timeouts and assert
        that rebinding is only supported for socket-based transport
        types.
        
        Move event object creation to its own initialization function
        and perform error checking.
        
        Remove caching of the socket status flags, including blocking
        mode. It is no longer necessary.
        
        Add a vio_timeout wrapper which adjusts the timeout value
        from seconds to milliseconds and invokes the underlying
        VIO-specific timeout handler. It also helps the underlying
        handler detect the previous state of the timeouts (activated
        or deactivated).
     @ vio/viopipe.c
        Simplify the pipe transport code and adjust to handle the
        new timeout interface.
     @ vio/vioshm.c
        Move shared memory related code over from viosocket.c
     @ vio/viosocket.c
        Use vio_ssl_errno to retrieve the error number if the VIO
        type is SSL.
        
        Add the vio_socket_io_wait() helper which wraps vio_io_wait()
        in order to avoid code duplication in vio_write()/vio_read().
        The function selects a appropriate timeout (based on the passed
        event) and translates the return value.
        
        Re-implement vio_read()/vio_write() as simple event loops. The
        loop consists of retrying the I/O operation until it succeeds
        or fails with a non-recoverable error. If it fails with error
        set to EAGAIN or EWOULDBLOCK, vio_io_wait() is used to wait
        for the socket to become ready. On Linux, the MSG_DONTWAIT
        flag is used to get the same effect as if the socket is in
        non-blocking mode.
        
        Add vio_set_blocking() which tweaks the blocking mode of
        a socket.
        
        Add vio_win32_timeout() which sets the timeout of a socket
        on Windows.
        
        Add vio_socket_timeout() which implements the logic around
        when a socket should be switched to non-blocking mode. Except
        on Windows, the socket is put into non-blocking mode if a
        timeout is set.
        
        vio_should_retry now indicates whether a I/O operation was
        interrupted by a temporary failure (interrupted by a signal)
        and should be retried later. vio_was_timeout indicates
        whether a I/O operation timed out.
        
        Remove socket_poll_read, now superseded by vio_io_wait.
        
        Implement vio_io_wait() using select() on Windows. Otherwise,
        use poll(). vio_io_wait() will loop until either a timeout
        occurs, or the socket becomes ready. In the presence of
        spurious wake-ups, the wait is resumed if possible.
        
        Move code related to the transport types named pipe and
        shared memory to their own files.
        
        Introduce vio_connect to avoid code duplication. The
        function implements a timed connect by setting the socket
        to non-blocking and polling until the connection is either
        established or fails.
     @ vio/viossl.c
        Re-implement vio_ssl_read() and vio_ssl_write() as simple
        event loops. Use vio_socket_io_wait() to handle polling,
        the SSL transport can only be used with sockets.
        
        Obtain the error status from SSL. Map the obtained error
        code to a platform equivalent one.
[16 Nov 2010 10:15] Davi Arnaut
The above patch is also a prerequisite for Bug#37780, which will implement I/O cancelation in the VIO layer.
[17 Jan 2011 9:23] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/128908

3255 Davi Arnaut	2011-01-17
      Bug#54790: Use of non-blocking mode for sockets limits performance
      Bug#51244: wait_timeout fails on OpenSolaris
      
      The problem was that a optimization for the case when the server
      uses alarms for timeouts could cause a slowdown when socket
      timeouts are used instead. In case alarms are used for timeouts,
      a non-blocking read is attempted first in order to avoid the
      cost of setting up a alarm. If this non-blocking read fails,
      the socket mode is changed to blocking and a alarm is armed.
      
      If socket timeout is used, there is no point in attempting a
      non-blocking read first as the timeout will be automatically
      armed by the OS. Yet the server would attempt a non-blocking
      read first and later switch the socket to blocking mode. This
      could inadvertently impact performance as switching the blocking
      mode of a socket requires at least two calls into the kernel
      on Linux, apart from problems inherited by the scalability
      of fcntl(2).
      
      The solution is to remove alarm based timeouts from the
      protocol layer and push timeout handling down to the virtual
      I/O layer. This approach allows the handling of socket timeouts
      on a platform-specific basis. The blocking mode of the socket
      is no longer exported and VIO read and write operations either
      complete or fail with a error or timeout.
      
      On Linux, the MSG_DONTWAIT flag is used to enable non-blocking
      send and receive operations. If the operation would block,
      poll() is used to wait for readiness or until a timeout occurs.
      This strategy avoids the need to set the socket timeout and
      blocking mode twice per query.
      
      On Windows, as before, the timeout is set on a per-socket
      fashion. In all remaining operating systems, the socket is set
      to non-blocking mode and poll() is used to wait for readiness
      or until a timeout occurs.
      
      In order to cleanup the code after the removal of alarm based
      timeouts, the low level packet reading loop is unrolled into
      two specific sequences: reading the packet header and the
      payload. This make error handling easier down the road.
      
      In conclusion, benchmarks have shown that these changes do not
      introduce any performance hits and actually slightly improves
      the server throughput for higher numbers of threads.
      
      Incompatible changes:
      
      A timeout is now always applied to a individual receive or
      send I/O operation. In contrast, a alarm based timeout was
      applied to an entire send or receive packet operation.
      
      Building and running MySQL on POSIX systems now requires
      support for poll() and O_NONBLOCK. These should be available
      in any modern POSIX system.
      
      O Windows, the default value for MYSQL_OPT_CONNECT_TIMEOUT
      is no longer 20 seconds. The default value now is no timeout
      (infinite), the same as in all other platforms.
     @ include/my_global.h
        A socket timeout is now signalled with a ETIMEDOUT error,
        which in some cases is emulated (that is, set explicitly).
        
        Add a wrapper for ECONNRESET so that it can be used to map
        SSL-specific errors.
     @ include/violite.h
        Introduce a vio_io_wait method which can be used to wait
        for I/O events on a VIO. The supported I/O events are read
        and write. The notification of error conditions is implicit.
        
        Extend vio_reset, which was SSL-specific, to be able to
        rebind a new socket to an already initialized Vio object.
        
        Remove vio_is_blocking. The blocking mode is no longer
        exported. Also, remove fcntl_mode as the blocking mode is
        no longer buffered. It is now a product of the timeout.
        
        Add prototype for a wrapper vio_timeout() function, which
        invokes the underlying timeout method of the VIO. Add to
        VIO two member variables to hold the read and write timeout
        values.
        
        Rename vio_was_interrupted to vio_was_timeout.
        
        Remove vio_poll_read, which is now superseded vio_io_wait.
        
        Introduce vio_connect to avoid code duplication.
     @ mysql-test/t/ssl.test
        Add test case to ensure that timeouts work over SSL.
     @ mysql-test/t/wait_timeout.test
        Add test case for Bug#54790.
     @ sql-common/client.c
        Look into last_errno to check whether the socket was
        interrupted due to a timeout.
        
        Add a couple of wrappers to retrieve the connect_timeout
        option value.
        
        Use vio_io_wait instead of vio_poll_read.
        
        The result of a failed socket operation is in socket_errno,
        which is WSAGetLastError on Windows.
        
        Replace my_connect() and wait_for_data() with vio_connect()
        to avoid code duplication. The polling inside wait_for_data()
        is somewhat equivalent to wait_for_io(). Also, this can be
        considered a bug fix as wait_for_data() would incorrectly
        use POLLIN (there is data to read) to wait (poll) for the
        connection to be established. This problem probably never
        surfaced because the OS network layer timeout is usually
        lower then our default.
     @ sql/net_serv.cc
        Remove net_data_is_ready, it is no longer necessary as its
        functionality is now implemented by vio_io_wait.
        
        Remove the use of signal based alarms for timeouts. Instead,
        rely on the underlying VIO layer to provide socket timeouts
        and control the blocking mode. The error handling is changed
        to use vio_was_timeout() to recognize a timeout occurrence
        and vio_should_retry() to detect interrupted I/O operations.
        
        The loop in the packet reading path (my_real_path) is unrolled
        into two steps: reading the packet header and payload. Each
        step is now documented and should be easier to understand.
        
        net_clear() no longer drains the socket buffer. Unread data
        on the socket constitutes a protocol violation and shouldn't
        be skipped. The previous behavior was race prone anyway
        as it only drained data on the socket buffer, missing any
        data in transit. Instead, net_clear() now just checks if
        the endpoint is still connected.
        
        Remove use of the MYSQL_INSTANCE_MANAGER macro.
     @ tests/mysql_client_test.c
        Add test case to ensure that the MYSQL_OPT_READ_TIMEOUT option
        works as expected.
     @ vio/vio.c
        Rename no_poll_read to no_io_wait. The stub method is used
        for the transport types named pipe and shared memory.
        
        Remove hPipe parameter to vio_init, it is now set in the
        vio_new_win32pipe function.
        
        Initialize timeout values to -1 (infinite) in vio_init.
        
        Improve vio_reset to also reinitialize timeouts and assert
        that rebinding is only supported for socket-based transport
        types.
        
        Move event object creation to its own initialization function
        and perform error checking.
        
        Remove caching of the socket status flags, including blocking
        mode. It is no longer necessary.
        
        Add a vio_timeout wrapper which adjusts the timeout value
        from seconds to milliseconds and invokes the underlying
        VIO-specific timeout handler. It also helps the underlying
        handler detect the previous state of the timeouts (activated
        or deactivated).
     @ vio/viopipe.c
        Simplify the pipe transport code and adjust to handle the
        new timeout interface.
     @ vio/vioshm.c
        Move shared memory related code over from viosocket.c
     @ vio/viosocket.c
        Use vio_ssl_errno to retrieve the error number if the VIO
        type is SSL.
        
        Add the vio_socket_io_wait() helper which wraps vio_io_wait()
        in order to avoid code duplication in vio_write()/vio_read().
        The function selects a appropriate timeout (based on the passed
        event) and translates the return value.
        
        Re-implement vio_read()/vio_write() as simple event loops. The
        loop consists of retrying the I/O operation until it succeeds
        or fails with a non-recoverable error. If it fails with error
        set to EAGAIN or EWOULDBLOCK, vio_io_wait() is used to wait
        for the socket to become ready. On Linux, the MSG_DONTWAIT
        flag is used to get the same effect as if the socket is in
        non-blocking mode.
        
        Add vio_set_blocking() which tweaks the blocking mode of
        a socket.
        
        Add vio_win32_timeout() which sets the timeout of a socket
        on Windows.
        
        Add vio_socket_timeout() which implements the logic around
        when a socket should be switched to non-blocking mode. Except
        on Windows, the socket is put into non-blocking mode if a
        timeout is set.
        
        vio_should_retry now indicates whether a I/O operation was
        interrupted by a temporary failure (interrupted by a signal)
        and should be retried later. vio_was_timeout indicates
        whether a I/O operation timed out.
        
        Remove socket_poll_read, now superseded by vio_io_wait.
        
        Implement vio_io_wait() using select() on Windows. Otherwise,
        use poll(). vio_io_wait() will loop until either a timeout
        occurs, or the socket becomes ready. In the presence of
        spurious wake-ups, the wait is resumed if possible.
        
        Move code related to the transport types named pipe and
        shared memory to their own files.
        
        Introduce vio_connect to avoid code duplication. The
        function implements a timed connect by setting the socket
        to non-blocking and polling until the connection is either
        established or fails.
     @ vio/viossl.c
        Re-implement vio_ssl_read() and vio_ssl_write() as simple
        event loops. Use vio_socket_io_wait() to handle polling,
        the SSL transport can only be used with sockets.
        
        Obtain the error status from SSL. Map the obtained error
        code to a platform equivalent one.
[3 Jun 2011 2:10] Paul Dubois
Noted in 5.6.3 changelog.

For socket I/O, an optimization for the case when the server used
alarms for timeouts could cause a slowdown when socket timeouts were
used instead. 

CHANGESET - http://lists.mysql.com/commits/138536
[3 Jun 2011 15:43] Davi Arnaut
CHANGESET - http://lists.mysql.com/commits/138456
[3 Jun 2011 15:48] Davi Arnaut
This fix also addressed the issue reported on Bug#36225.