Bug #72509 mysql_ping spins when mysqld is stopped via SIGSTOP
Submitted: 1 May 2014 22:18 Modified: 5 May 2014 14:18
Reporter: John Moore
Status: Verified
Category: MySQL Server: Command-line Clients Severity: S2 (Serious)
Version: 5.5.37, 5.1.73 OS: Linux
Assigned to: CPU Architecture: Any

[1 May 2014 22:18] John Moore
Any application that calls mysql_ping() will spin indefinitely when mysqld is sent a SIGSTOP. It resumes normally once SIGCONT is sent, but in the meantime it spins. Programs that monitor the MySQL server will not report the stall; they are stuck spinning, retrying the read from the socket.

Looking at the code, the problem looks like an obvious omission of the exit path: the EXTRA_DEBUG message states "read looped with error %d, aborting thread", but the code then does not abort.

How to repeat:
1) Have mysqld running.

2) Have a program call mysql_ping() in a loop, with a one-second sleep, echoing "Hello".

3) Send SIGSTOP to the mysqld pid (killall -STOP mysqld).

4) Now that your program is stalled, attach gdb and set a breakpoint in vio_read().

Results: the breakpoint is hit over and over. Go up a frame and print retry_count; the number gets very large.


Suggested fix:
Here's a patch that corrects the problem:

--- a/sql/net_serv.cc
+++ b/sql/net_serv.cc
@@ -865,6 +865,12 @@
            fprintf(stderr, "%s: read looped with error %d, aborting thread\n",
#endif /* EXTRA_DEBUG */
+               len= packet_error;
+               net->error= 2;                 /* Close socket */
+               net->last_errno= ER_NET_FCNTL_ERROR;
+               goto end;
#if defined(THREAD_SAFE_CLIENT) && !defined(MYSQL_SERVER)
          if (vio_errno(net->vio) == SOCKET_EINTR)

[1 May 2014 23:17] John Moore
Looking at this again, it seems that a check for EINTR was inserted that takes precedence over the retry_count check, so the read keeps retrying forever.

Perhaps a better fix would be to add a retry_count_for_EINTR and allow for "n" retries of this kind?

Alternatively, perhaps a mysql_ping_without_eintr_retry() function is needed so it can be called without having this problem.
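
To make the retry_count_for_EINTR idea concrete, here is a minimal sketch of a read loop that gives up after "n" EINTR retries instead of retrying forever. It is not the net_serv.cc code: read_bounded_eintr, reader_fn, flaky_reader and bounded_eintr_demo are hypothetical names invented for illustration, and the reader callback stands in for vio_read() so the behaviour can be exercised without a server.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical callback with the same contract as read(2):
   returns bytes read, or -1 with errno set. Stands in for vio_read(). */
typedef ssize_t (*reader_fn)(void *ctx, void *buf, size_t len);

/* Hypothetical helper (not part of the MySQL source): retry on EINTR,
   but at most max_eintr_retries times. Any other error fails at once. */
static ssize_t read_bounded_eintr(reader_fn reader, void *ctx,
                                  void *buf, size_t len,
                                  unsigned max_eintr_retries)
{
    unsigned eintr_count = 0;

    for (;;) {
        ssize_t n = reader(ctx, buf, len);
        if (n >= 0)
            return n;               /* data or EOF */
        if (errno != EINTR)
            return -1;              /* real error: do not retry */
        if (eintr_count++ >= max_eintr_retries)
            return -1;              /* retry budget spent; errno stays EINTR */
    }
}

/* Stand-in reader for the demo: fails with EINTR while *ctx > 0,
   then delivers a single byte. */
static ssize_t flaky_reader(void *ctx, void *buf, size_t len)
{
    int *pending_interrupts = ctx;

    if (*pending_interrupts > 0) {
        (*pending_interrupts)--;
        errno = EINTR;
        return -1;
    }
    if (len == 0)
        return 0;
    ((char *)buf)[0] = 'x';
    return 1;
}

/* Usage: two interruptions are absorbed; an endless stream of them is not. */
int bounded_eintr_demo(void)
{
    char buf[4];
    int pending = 2;
    ssize_t n = read_bounded_eintr(flaky_reader, &pending, buf, sizeof buf, 3);
    assert(n == 1 && buf[0] == 'x');

    pending = 100;
    n = read_bounded_eintr(flaky_reader, &pending, buf, sizeof buf, 3);
    assert(n == -1 && errno == EINTR);
    return 0;
}
```

With a budget of 3, the caller sees an error after four failed attempts, which a monitoring program could then report instead of hanging. What value of "n" is appropriate, and which error code the client library should surface, are left open here.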