Bug #37267 | connect() EINPROGRESS failures mishandled in client library | ||
---|---|---|---|
Submitted: | 8 Jun 2008 1:41 | Modified: | 4 May 2010 13:16 |
Reporter: | Kurtis Rader | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: C API (client library) | Severity: | S3 (Non-critical) |
Version: | 5.0.51b, 5.0, 5.1, 6.0 bzr | OS: | Linux |
Assigned to: | Ingo Strüwing | CPU Architecture: | Any |
Tags: | non-blocking connect einprogress poll |
[8 Jun 2008 1:41]
Kurtis Rader
[11 Jun 2008 12:03]
Susanne Ebrecht
Many thanks for writing a bug report. Please, provide your C code.
[11 Jun 2008 18:31]
Kurtis Rader
The original problem was observed using the Python bindings and involves confidential code. But based on strace output the following program simulates the the behavior of the MySQL client library when compiled with "#define BROKEN 1". When BROKEN is set to zero the suggested fix is used. Simply save as test.c, "make test", "./test". When recreating the problem you should see "read: Connection refused" printed which indicates the read() call returned -1 with errno == ECONNREFUSED. Note the poll() results in that case which show POLLIN is set but so is POLLHUP and POLLERR. #include <stdlib.h> #include <unistd.h> #include <stdio.h> #include <memory.h> #include <errno.h> #include <sys/fcntl.h> #include <sys/types.h> #include <sys/socket.h> #include <sys/poll.h> #include <netinet/in.h> #include <arpa/inet.h> #define TIMEOUT_SEC 5 #define BROKEN 1 int client() { int fd, flags, res; in_addr_t ip_addr; struct sockaddr_in sock_addr; struct pollfd ufds; memset(&sock_addr, sizeof(sock_addr), 0); sock_addr.sin_family = AF_INET; sock_addr.sin_port = htons(65123); ip_addr = inet_addr("127.0.0.1"); memcpy(&sock_addr.sin_addr, &ip_addr, sizeof(ip_addr)); usleep(100000); fd = socket(AF_INET, SOCK_STREAM, 0); printf("client socket fd=%d\n", fd); connect_again: flags = fcntl(fd, F_GETFL, 0); fcntl(fd, F_SETFL, flags | O_NONBLOCK); printf("client fd set nonblocking\n"); res = connect(fd, (struct sockaddr *)&sock_addr, sizeof(sock_addr)); if (res == 0) { printf("connect succeeded\n"); return; } if (errno == ECONNREFUSED) { usleep(100000); goto connect_again; } if (errno != EINPROGRESS) { perror("connect"); return; } printf("client fd set blocking\n"); fcntl(fd, F_SETFL, flags & ~O_NONBLOCK); ufds.fd = fd; #if BROKEN ufds.events = POLLIN | POLLPRI; #else ufds.events = POLLOUT; #endif printf("client connect in progress -- polling\n"); res = poll(&ufds, 1, TIMEOUT_SEC * 1000); printf("poll() res=%d revents=0x%x " "POLLHUP=0x%x POLLERR=0x%x POLLIN=0x%x POLLOUT=0x%x\n", res, ufds.revents, ufds.revents & POLLHUP, ufds.revents & POLLERR, ufds.revents & POLLIN, ufds.revents & POLLOUT); if (res == 0) { printf("client poll() timed out\n"); return; } #if BROKEN if (!(ufds.revents & POLLIN)) return; #else if (ufds.revents & POLLHUP) goto connect_again; if (!(ufds.revents & POLLOUT)) { printf("client poll() did not return POLLOUT as expected\n"); return; } #endif printf("client calling read\n"); res = read(fd, &flags, 1); if (res == -1) { perror("read"); return; } printf("client read=%d\n", res); printf("client returning\n"); } int server() { int fd, new_fd, opt; socklen_t addrlen; in_addr_t ip_addr; struct sockaddr_in sock_addr; memset(&sock_addr, sizeof(sock_addr), 0); sock_addr.sin_family = AF_INET; sock_addr.sin_port = htons(65123); ip_addr = inet_addr("127.0.0.1"); memcpy(&sock_addr.sin_addr, &ip_addr, sizeof(ip_addr)); fd = socket(AF_INET, SOCK_STREAM, 0); printf("server socket fd=%d\n", fd); opt = 1; setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt)); if (bind(fd, (struct sockaddr *)&sock_addr, sizeof(sock_addr)) != 0) { perror("bind"); return; } printf("server bound -- sleeping for 1 second\n"); sleep(1); printf("server calling listen\n"); if (listen(fd, 1)) { perror("listen"); return; } printf("server calling accept\n"); addrlen = sizeof(sock_addr); new_fd = accept(fd, (struct sockaddr *)&sock_addr, &addrlen); if (new_fd == -1) { perror("accept"); return; } printf("server accepted connection\n"); write(new_fd, "x", 1); printf("server returning\n"); } int main() { if (fork()) { server(); } else { client(); } exit(0); }
[23 Sep 2008 7:58]
Sveta Smirnova
Thank you for the feedback. Verified using code analysis and example provided.
[15 Sep 2009 7:53]
Ingo Strüwing
I stumbled over this when the test rpl_ssl1 failed on my Ubuntu machine with IPV6 and IPV6 enabled. The slave tried to connect over IPV6 first, which failed and should have tried over IPV4 thereafter. However, since wait_for_data() returned success, the second attempt was not done as mysql_real_connect() assumed, the first connection was successful. This was 100% repeatable on my machine. I have a patch for it, which I will publish soon.
[15 Sep 2009 13:06]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/83276 3605 Ingo Struewing 2009-09-15 Bug#37267 - connect() EINPROGRESS failures mishandled in client library We cann connect() in a non-blocking mode to be able to specify a non-standard timeout. The problem was that we did not fetch the status from the non-blocking connect(). We assumed that poll() would not return a POLLIN flag if the connect failed. But on some platforms this is not true. After a successful poll() we do now retrieve the status value from connect() with getsockopt(...SO_ERROR...). Now we do know if (and how) the connect failed. The test case for my investigation was rpl.rlp_ssl1 on an Ubuntu 9.04 x86_64 machine. Both, IPV4 and IPV6 were active. 'localhost' resolved first for IPV6 and then for IPV4. The connection over IPV6 was blocked. rpl.rlp_ssl1 timed out as it did not notice the failed connect(). The first read() failed, which was interpreted as a master crash and the connection was tried to reestablish with the same result until the retry limit was reached. With the fix, the connect() problem is immediately recognized, and the connect() is retried on the second resolution for 'localhost', which is successful. @ libmysqld/libmysqld.c Bug#37267 - connect() EINPROGRESS failures mishandled in client library Changed a DBUG print string to distinguish the two mysql_real_connect() implementations in DBUG traces. @ mysql-test/include/wait_for_slave_param.inc Bug#37267 - connect() EINPROGRESS failures mishandled in client library Made timeout value available in error message. @ mysql-test/suite/rpl/r/rpl_get_master_version_and_clock.result Bug#37267 - connect() EINPROGRESS failures mishandled in client library Fixed test result. Connect error is now detected as CR_CONN_HOST_ERROR (2003) instead of CR_SERVER_LOST (2013). @ sql-common/client.c Bug#37267 - connect() EINPROGRESS failures mishandled in client library Added retrieval of the error code from the non-blocking connect(). Added DBUG. Added comment.
[15 Sep 2009 19:08]
Jim Winstead
one small problem: in my_connect() you added a DBUG_RETURN to the windows/netware path that doesn't have a corresponding DBUG_ENTER.
[15 Sep 2009 20:03]
Ingo Strüwing
Ouch. Yes. Thanks. I will fix it.
[28 Sep 2009 16:54]
Vladislav Vaintroub
Looks fine - good to push after fixing windows/netware DBUG_RETURN
[29 Sep 2009 15:37]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/85068 3605 Ingo Struewing 2009-09-29 Bug#37267 - connect() EINPROGRESS failures mishandled in client library We cann connect() in a non-blocking mode to be able to specify a non-standard timeout. The problem was that we did not fetch the status from the non-blocking connect(). We assumed that poll() would not return a POLLIN flag if the connect failed. But on some platforms this is not true. After a successful poll() we do now retrieve the status value from connect() with getsockopt(...SO_ERROR...). Now we do know if (and how) the connect failed. The test case for my investigation was rpl.rlp_ssl1 on an Ubuntu 9.04 x86_64 machine. Both, IPV4 and IPV6 were active. 'localhost' resolved first for IPV6 and then for IPV4. The connection over IPV6 was blocked. rpl.rlp_ssl1 timed out as it did not notice the failed connect(). The first read() failed, which was interpreted as a master crash and the connection was tried to reestablish with the same result until the retry limit was reached. With the fix, the connect() problem is immediately recognized, and the connect() is retried on the second resolution for 'localhost', which is successful. @ libmysqld/libmysqld.c Bug#37267 - connect() EINPROGRESS failures mishandled in client library Changed a DBUG print string to distinguish the two mysql_real_connect() implementations in DBUG traces. @ mysql-test/include/wait_for_slave_param.inc Bug#37267 - connect() EINPROGRESS failures mishandled in client library Made timeout value available in error message. @ mysql-test/suite/rpl/r/rpl_get_master_version_and_clock.result Bug#37267 - connect() EINPROGRESS failures mishandled in client library Fixed test result. Connect error is now detected as CR_CONN_HOST_ERROR (2003) instead of CR_SERVER_LOST (2013). @ sql-common/client.c Bug#37267 - connect() EINPROGRESS failures mishandled in client library Added retrieval of the error code from the non-blocking connect(). Added DBUG. Added comment.
[29 Sep 2009 15:48]
Ingo Strüwing
Both reviewers agreed - setting this to "patch approved". Note that this patch will not go to any production version or release canditate. The risk of this patch to break existing applications has been estimated as high.
[30 Sep 2009 10:28]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/85172 2880 Ingo Struewing 2009-09-30 Bug#37267 - connect() EINPROGRESS failures mishandled in client library We cann connect() in a non-blocking mode to be able to specify a non-standard timeout. The problem was that we did not fetch the status from the non-blocking connect(). We assumed that poll() would not return a POLLIN flag if the connect failed. But on some platforms this is not true. After a successful poll() we do now retrieve the status value from connect() with getsockopt(...SO_ERROR...). Now we do know if (and how) the connect failed. The test case for my investigation was rpl.rlp_ssl1 on an Ubuntu 9.04 x86_64 machine. Both, IPV4 and IPV6 were active. 'localhost' resolved first for IPV6 and then for IPV4. The connection over IPV6 was blocked. rpl.rlp_ssl1 timed out as it did not notice the failed connect(). The first read() failed, which was interpreted as a master crash and the connection was tried to reestablish with the same result until the retry limit was reached. With the fix, the connect() problem is immediately recognized, and the connect() is retried on the second resolution for 'localhost', which is successful. @ libmysqld/libmysqld.c Bug#37267 - connect() EINPROGRESS failures mishandled in client library Changed a DBUG print string to distinguish the two mysql_real_connect() implementations in DBUG traces. @ mysql-test/include/wait_for_slave_param.inc Bug#37267 - connect() EINPROGRESS failures mishandled in client library Made timeout value available in error message. @ mysql-test/suite/rpl/r/rpl_get_master_version_and_clock.result Bug#37267 - connect() EINPROGRESS failures mishandled in client library Fixed test result. Connect error is now detected as CR_CONN_HOST_ERROR (2003) instead of CR_SERVER_LOST (2013). @ sql-common/client.c Bug#37267 - connect() EINPROGRESS failures mishandled in client library Added retrieval of the error code from the non-blocking connect(). Added DBUG. Added comment.
[30 Sep 2009 12:28]
Ingo Strüwing
Queued to mysql-next-mr-bugfixing and mysql-6.0-backup-merge.
[2 Oct 2009 17:20]
Ingo Strüwing
Also merged to mysql-6.0-codebase-bugfixing.
[9 Oct 2009 8:47]
Bugs System
Pushed into 6.0.14-alpha (revid:alik@ibmvm-20091009083208-0o0f0i9w1sq3c1kn) (version source revid:ingo.struewing@sun.com-20091001194257-a6xrw9u0ccx24qgn) (merge vers: 6.0.14-alpha) (pib:12)
[13 Oct 2009 13:19]
Paul DuBois
Noted in 6.0.14 changelog. The client library mishandled EINPROGRESS errors for connections in nonblocking mode. Setting report to NDI pending push into 5.5.x.
[16 Oct 2009 6:38]
Tonci Grgin
Bug#47451 was marked as duplicate of this report.
[25 Oct 2009 13:38]
Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20091025133616-ca4inerav4vpdnaz) (version source revid:ingo.struewing@sun.com-20090930110144-qnqbtc3kfsr1200h) (merge vers: 6.0.14-alpha) (pib:13)
[25 Oct 2009 20:54]
Paul DuBois
Setting report to NDI pending push into 5.5.x.
[12 Nov 2009 8:21]
Bugs System
Pushed into 5.5.0-beta (revid:alik@sun.com-20091110093229-0bh5hix780cyeicl) (version source revid:alik@ibmvm-20091009130916-0ijstmmz3efzx20g) (merge vers: 5.5.0-beta) (pib:13)
[12 Nov 2009 18:46]
Paul DuBois
Noted in 5.5.0 changelog.
[15 Apr 2010 11:38]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/105724
[29 Apr 2010 9:43]
Bugs System
Pushed into 5.1.44-ndb-7.0.15 (revid:magnus.blaudd@sun.com-20100429073554-0wswepmdyx9mibt2) (version source revid:magnus.blaudd@sun.com-20100429073554-0wswepmdyx9mibt2) (merge vers: 5.1.44-ndb-7.0.15) (pib:16)
[4 May 2010 13:16]
Jon Stephens
Also documented in the NDB-7.0.15 and NDB-7.1.4 changelogs. Amended changelog entry to read as follows: The MySQL client library mishandled EINPROGRESS errors for connections in nonblocking mode. This could lead to replication failures on hosts capable of resolving both IPv4 and IPv6 network addresses, while trying to resolve localhost. Closed. SEEALSO BUG#44344.