Bug #37267 connect() EINPROGRESS failures mishandled in client library
Submitted: 8 Jun 2008 1:41 Modified: 4 May 2010 13:16
Reporter: Kurtis Rader Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: C API (client library) Severity:S3 (Non-critical)
Version:5.0.51b, 5.0, 5.1, 6.0 bzr OS:Linux
Assigned to: Ingo Strüwing CPU Architecture:Any
Tags: non-blocking connect einprogress poll

[8 Jun 2008 1:41] Kurtis Rader
Description:
The my_connect() and wait_for_data() functions do not correctly handle non-blocking connect() calls that occur in the window between the bind() and listen() calls on the destination end-point. When that race occurs the connect() calls returns -1 with errno == EINPROGRESS. The wait_for_data() function then polls the socket for readability. Unfortunately the socket is in an error state so pollfd.revents has POLLERR, POLLHUP and POLLIN set. Since wait_for_data() only looks for POLLIN it reports success and the subsequent read() on the socket fails with errno == ECONNREFUSED.

How to repeat:
Use SSH to tunnel the connection. Wrap the ssh call with a script similar to this one:

    #!/bin/sh
    sleep 1
    exec /usr/bin/ssh "$@"

Then check whether ssh is ready to tunnel the MySQL connection by rapidly calling connect() on the local ssh end-point. When it returns success immediately call the MySQL connection function. With this setup I can hit the race greater than 99% of the time on my workstation. Without the artificial ssh startup delay and tight polling this race happens infrequently since ssh typically reaches the listen() call on the local end-point of the tunnel before the MySQL connect is attempted. But on a heavily loaded system the failed connections happen often enough (around 0.5% of the time) to be annoying.

Suggested fix:
There are a couple of problems. First is the use of pollfd.events == POLLIN. The correct flag is POLLOUT. The select() version of wait_for_data() gets this correct by passing NULL for the readfds parameter, and a bitmask containing the socket fd for the writefds parameter.

The other problem is that this is a very low probability race condition which can be correctly handled by simply closing the socket and going back to the top of my_connect() and issuing the connect() again. But note that to be 100% correct you need to catch ECONNREFUSED on the second thru n'th attempts because that will be returned until the target calls listen(). Alternatively, it may be adequate to insert a short sleep in the expectation that by the time the second connect() is performed the end-point will have transitioned to the LISTEN state.
[11 Jun 2008 12:03] Susanne Ebrecht
Many thanks for writing a bug report.

Please, provide your C code.
[11 Jun 2008 18:31] Kurtis Rader
The original problem was observed using the Python bindings and involves confidential code. But based on strace output the following program simulates the the behavior of the MySQL client library when compiled with "#define BROKEN 1". When BROKEN is set to zero the suggested fix is used. Simply save as test.c, "make test", "./test". When recreating the problem you should see "read: Connection refused" printed which indicates the read() call returned -1 with errno == ECONNREFUSED. Note the poll() results in that case which show POLLIN is set but so is POLLHUP and POLLERR.

#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <memory.h>
#include <errno.h>
#include <sys/fcntl.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/poll.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define TIMEOUT_SEC 5

#define BROKEN 1

int client() {
        int fd, flags, res;
        in_addr_t ip_addr;
        struct sockaddr_in sock_addr;
        struct pollfd ufds;

        memset(&sock_addr, sizeof(sock_addr), 0);
        sock_addr.sin_family = AF_INET;
        sock_addr.sin_port = htons(65123);
        ip_addr = inet_addr("127.0.0.1");
        memcpy(&sock_addr.sin_addr, &ip_addr, sizeof(ip_addr));

        usleep(100000);
        fd = socket(AF_INET, SOCK_STREAM, 0);
        printf("client socket fd=%d\n", fd);
connect_again:
        flags = fcntl(fd, F_GETFL, 0);
        fcntl(fd, F_SETFL, flags | O_NONBLOCK);
        printf("client fd set nonblocking\n");
        res = connect(fd, (struct sockaddr *)&sock_addr, sizeof(sock_addr));
        if (res == 0) {
                printf("connect succeeded\n");
                return;
        }
        if (errno == ECONNREFUSED) {
                usleep(100000);
                goto connect_again;
        }
        if (errno != EINPROGRESS) {
                perror("connect");
                return;
        }
        printf("client fd set blocking\n");
        fcntl(fd, F_SETFL, flags & ~O_NONBLOCK);

        ufds.fd = fd;
#if BROKEN
        ufds.events = POLLIN | POLLPRI;
#else
        ufds.events = POLLOUT;
#endif
        printf("client connect in progress -- polling\n");
        res = poll(&ufds, 1, TIMEOUT_SEC * 1000);
        printf("poll() res=%d  revents=0x%x  "
               "POLLHUP=0x%x POLLERR=0x%x POLLIN=0x%x  POLLOUT=0x%x\n",
                res, ufds.revents,
                ufds.revents & POLLHUP, ufds.revents & POLLERR,
                ufds.revents & POLLIN, ufds.revents & POLLOUT);
        if (res == 0) {
                printf("client poll() timed out\n");
                return;
        }
#if BROKEN
        if (!(ufds.revents & POLLIN)) return;
#else
        if (ufds.revents & POLLHUP) goto connect_again;
        if (!(ufds.revents & POLLOUT)) {
                printf("client poll() did not return POLLOUT as expected\n");
                return;
        }
#endif
        printf("client calling read\n");
        res = read(fd, &flags, 1);
        if (res == -1) {
                perror("read");
                return;
        }
        printf("client read=%d\n", res);
        printf("client returning\n");
}

int server() {
        int fd, new_fd, opt;
        socklen_t addrlen;
        in_addr_t ip_addr;
        struct sockaddr_in sock_addr;

        memset(&sock_addr, sizeof(sock_addr), 0);
        sock_addr.sin_family = AF_INET;
        sock_addr.sin_port = htons(65123);
        ip_addr = inet_addr("127.0.0.1");
        memcpy(&sock_addr.sin_addr, &ip_addr, sizeof(ip_addr));

        fd = socket(AF_INET, SOCK_STREAM, 0);
        printf("server socket fd=%d\n", fd);
        opt = 1;
        setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));
        if (bind(fd, (struct sockaddr *)&sock_addr, sizeof(sock_addr)) != 0) {
                perror("bind");
                return;
        }
        printf("server bound -- sleeping for 1 second\n");
        sleep(1);
        printf("server calling listen\n");
        if (listen(fd, 1)) {
                perror("listen");
                return;
        }
        printf("server calling accept\n");
        addrlen = sizeof(sock_addr);
        new_fd = accept(fd, (struct sockaddr *)&sock_addr, &addrlen);
        if (new_fd == -1) {
                perror("accept");
                return;
        }
        printf("server accepted connection\n");
        write(new_fd, "x", 1);
        printf("server returning\n");
}

int main() {
        if (fork()) {
                server();
        }
        else {
                client();
        }
        exit(0);
}
[23 Sep 2008 7:58] Sveta Smirnova
Thank you for the feedback.

Verified using code analysis and example provided.
[15 Sep 2009 7:53] Ingo Strüwing
I stumbled over this when the test rpl_ssl1 failed on my Ubuntu machine with IPV6 and IPV6 enabled. The slave tried to connect over IPV6 first, which failed and should have tried over IPV4 thereafter. However, since wait_for_data() returned success, the second attempt was not done as mysql_real_connect() assumed, the first connection was successful. This was 100% repeatable on my machine.

I have a patch for it, which I will publish soon.
[15 Sep 2009 13:06] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/83276

3605 Ingo Struewing	2009-09-15
      Bug#37267 - connect() EINPROGRESS failures mishandled in client library
      
      We cann connect() in a non-blocking mode to be able to specify a
      non-standard timeout.
      
      The problem was that we did not fetch the status from the
      non-blocking connect(). We assumed that poll() would not return
      a POLLIN flag if the connect failed. But on some platforms this
      is not true.
      
      After a successful poll() we do now retrieve the status value
      from connect() with getsockopt(...SO_ERROR...). Now we do know
      if (and how) the connect failed.
      
      The test case for my investigation was rpl.rlp_ssl1 on an
      Ubuntu 9.04 x86_64 machine. Both, IPV4 and IPV6 were active.
      'localhost' resolved first for IPV6 and then for IPV4. The
      connection over IPV6 was blocked. rpl.rlp_ssl1 timed out
      as it did not notice the failed connect(). The first read()
      failed, which was interpreted as a master crash and the
      connection was tried to reestablish with the same result
      until the retry limit was reached.
      
      With the fix, the connect() problem is immediately recognized,
      and the connect() is retried on the second resolution for
      'localhost', which is successful.
     @ libmysqld/libmysqld.c
        Bug#37267 - connect() EINPROGRESS failures mishandled in client library
        Changed a DBUG print string to distinguish the two mysql_real_connect()
        implementations in DBUG traces.
     @ mysql-test/include/wait_for_slave_param.inc
        Bug#37267 - connect() EINPROGRESS failures mishandled in client library
        Made timeout value available in error message.
     @ mysql-test/suite/rpl/r/rpl_get_master_version_and_clock.result
        Bug#37267 - connect() EINPROGRESS failures mishandled in client library
        Fixed test result. Connect error is now detected as CR_CONN_HOST_ERROR
        (2003) instead of CR_SERVER_LOST (2013).
     @ sql-common/client.c
        Bug#37267 - connect() EINPROGRESS failures mishandled in client library
        Added retrieval of the error code from the non-blocking connect().
        Added DBUG.
        Added comment.
[15 Sep 2009 19:08] Jim Winstead
one small problem: in my_connect() you added a DBUG_RETURN to the windows/netware path that doesn't have a corresponding DBUG_ENTER.
[15 Sep 2009 20:03] Ingo Strüwing
Ouch. Yes. Thanks. I will fix it.
[28 Sep 2009 16:54] Vladislav Vaintroub
Looks fine -  good to push after fixing  windows/netware DBUG_RETURN
[29 Sep 2009 15:37] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/85068

3605 Ingo Struewing	2009-09-29
      Bug#37267 - connect() EINPROGRESS failures mishandled in client library
      
      We cann connect() in a non-blocking mode to be able to specify a
      non-standard timeout.
      
      The problem was that we did not fetch the status from the
      non-blocking connect(). We assumed that poll() would not return
      a POLLIN flag if the connect failed. But on some platforms this
      is not true.
      
      After a successful poll() we do now retrieve the status value
      from connect() with getsockopt(...SO_ERROR...). Now we do know
      if (and how) the connect failed.
      
      The test case for my investigation was rpl.rlp_ssl1 on an
      Ubuntu 9.04 x86_64 machine. Both, IPV4 and IPV6 were active.
      'localhost' resolved first for IPV6 and then for IPV4. The
      connection over IPV6 was blocked. rpl.rlp_ssl1 timed out
      as it did not notice the failed connect(). The first read()
      failed, which was interpreted as a master crash and the
      connection was tried to reestablish with the same result
      until the retry limit was reached.
      
      With the fix, the connect() problem is immediately recognized,
      and the connect() is retried on the second resolution for
      'localhost', which is successful.
     @ libmysqld/libmysqld.c
        Bug#37267 - connect() EINPROGRESS failures mishandled in client library
        Changed a DBUG print string to distinguish the two mysql_real_connect()
        implementations in DBUG traces.
     @ mysql-test/include/wait_for_slave_param.inc
        Bug#37267 - connect() EINPROGRESS failures mishandled in client library
        Made timeout value available in error message.
     @ mysql-test/suite/rpl/r/rpl_get_master_version_and_clock.result
        Bug#37267 - connect() EINPROGRESS failures mishandled in client library
        Fixed test result. Connect error is now detected as CR_CONN_HOST_ERROR
        (2003) instead of CR_SERVER_LOST (2013).
     @ sql-common/client.c
        Bug#37267 - connect() EINPROGRESS failures mishandled in client library
        Added retrieval of the error code from the non-blocking connect().
        Added DBUG.
        Added comment.
[29 Sep 2009 15:48] Ingo Strüwing
Both reviewers agreed - setting this to "patch approved".
Note that this patch will not go to any production version or release canditate. The risk of this patch to break existing applications has been estimated as high.
[30 Sep 2009 10:28] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/85172

2880 Ingo Struewing	2009-09-30
      Bug#37267 - connect() EINPROGRESS failures mishandled in client library
      
      We cann connect() in a non-blocking mode to be able to specify a
      non-standard timeout.
      
      The problem was that we did not fetch the status from the
      non-blocking connect(). We assumed that poll() would not return
      a POLLIN flag if the connect failed. But on some platforms this
      is not true.
      
      After a successful poll() we do now retrieve the status value
      from connect() with getsockopt(...SO_ERROR...). Now we do know
      if (and how) the connect failed.
      
      The test case for my investigation was rpl.rlp_ssl1 on an
      Ubuntu 9.04 x86_64 machine. Both, IPV4 and IPV6 were active.
      'localhost' resolved first for IPV6 and then for IPV4. The
      connection over IPV6 was blocked. rpl.rlp_ssl1 timed out
      as it did not notice the failed connect(). The first read()
      failed, which was interpreted as a master crash and the
      connection was tried to reestablish with the same result
      until the retry limit was reached.
      
      With the fix, the connect() problem is immediately recognized,
      and the connect() is retried on the second resolution for
      'localhost', which is successful.
     @ libmysqld/libmysqld.c
        Bug#37267 - connect() EINPROGRESS failures mishandled in client library
        Changed a DBUG print string to distinguish the two mysql_real_connect()
        implementations in DBUG traces.
     @ mysql-test/include/wait_for_slave_param.inc
        Bug#37267 - connect() EINPROGRESS failures mishandled in client library
        Made timeout value available in error message.
     @ mysql-test/suite/rpl/r/rpl_get_master_version_and_clock.result
        Bug#37267 - connect() EINPROGRESS failures mishandled in client library
        Fixed test result. Connect error is now detected as CR_CONN_HOST_ERROR
        (2003) instead of CR_SERVER_LOST (2013).
     @ sql-common/client.c
        Bug#37267 - connect() EINPROGRESS failures mishandled in client library
        Added retrieval of the error code from the non-blocking connect().
        Added DBUG.
        Added comment.
[30 Sep 2009 12:28] Ingo Strüwing
Queued to mysql-next-mr-bugfixing and mysql-6.0-backup-merge.
[2 Oct 2009 17:20] Ingo Strüwing
Also merged to mysql-6.0-codebase-bugfixing.
[9 Oct 2009 8:47] Bugs System
Pushed into 6.0.14-alpha (revid:alik@ibmvm-20091009083208-0o0f0i9w1sq3c1kn) (version source revid:ingo.struewing@sun.com-20091001194257-a6xrw9u0ccx24qgn) (merge vers: 6.0.14-alpha) (pib:12)
[13 Oct 2009 13:19] Paul DuBois
Noted in 6.0.14 changelog.

The client library mishandled EINPROGRESS errors for connections in
nonblocking mode.  

Setting report to NDI pending push into 5.5.x.
[16 Oct 2009 6:38] Tonci Grgin
Bug#47451 was marked as duplicate of this report.
[25 Oct 2009 13:38] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20091025133616-ca4inerav4vpdnaz) (version source revid:ingo.struewing@sun.com-20090930110144-qnqbtc3kfsr1200h) (merge vers: 6.0.14-alpha) (pib:13)
[25 Oct 2009 20:54] Paul DuBois
Setting report to NDI pending push into 5.5.x.
[12 Nov 2009 8:21] Bugs System
Pushed into 5.5.0-beta (revid:alik@sun.com-20091110093229-0bh5hix780cyeicl) (version source revid:alik@ibmvm-20091009130916-0ijstmmz3efzx20g) (merge vers: 5.5.0-beta) (pib:13)
[12 Nov 2009 18:46] Paul DuBois
Noted in 5.5.0 changelog.
[15 Apr 2010 11:38] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/105724
[29 Apr 2010 9:43] Bugs System
Pushed into 5.1.44-ndb-7.0.15 (revid:magnus.blaudd@sun.com-20100429073554-0wswepmdyx9mibt2) (version source revid:magnus.blaudd@sun.com-20100429073554-0wswepmdyx9mibt2) (merge vers: 5.1.44-ndb-7.0.15) (pib:16)
[4 May 2010 13:16] Jon Stephens
Also documented in the NDB-7.0.15 and NDB-7.1.4 changelogs. Amended changelog entry to read as follows:

        The MySQL client library mishandled EINPROGRESS errors for 
        connections in nonblocking mode. This could lead to replication 
        failures on hosts capable of resolving both IPv4 and IPv6 network 
        addresses, while trying to resolve localhost.

Closed.

SEEALSO BUG#44344.