Bug #70026 Auto reconnect does not work with 5.6 libmysqlclient
Submitted: 14 Aug 2013 7:26 Modified: 9 Jun 2014 17:28
Reporter: Yoshinori Matsunobu (OCA)
Status: Closed
Category: MySQL Server: C API (client library) Severity: S2 (Serious)
Version: 5.6.13 OS: Any
Assigned to: CPU Architecture: Any

[14 Aug 2013 7:26] Yoshinori Matsunobu
Description:
Automatic reconnection does not work for MySQL client programs linked with 5.6 libmysqlclient, even if MYSQL_OPT_RECONNECT is enabled.

How to repeat:
Build the following simple C program with 5.1/5.5/5.6 libmysqlclient, and run the program. The program reconnects as expected if linked with 5.1/5.5 libmysqlclient, but it does not reconnect if linked with 5.6 libmysqlclient.

----
#include <stdio.h>
#include <mysql.h>
#include <unistd.h>

int main(int argc, char** argv) {
  MYSQL mysql;
  mysql_init(&mysql);
  my_bool reconnect = 1;
  mysql_options(&mysql, MYSQL_OPT_RECONNECT, &reconnect);
  if ( !mysql_real_connect(&mysql, "127.0.0.1", "root", "", "test",
                             3306, NULL, 0) ){
    fprintf(stderr, "can't connect\n");
    return 1;
  }
  sleep(10); // terminate client session by KILL command
  if(mysql_query(&mysql, "SELECT 1")){
    const char *err= mysql_error(&mysql);
    fprintf(stderr, "got error %s\n", err);
  }
}
----

In 5.1 and 5.5, net_serv.cc::net_clear() set "net->error= 2" if the connection was broken. This made net_write_command() return non-zero, so the client tried to reconnect within the cli_advanced_command() function.

In 5.6, net_serv.cc::net_clear() did not set any such error flag, so net_write_command() returned zero even though the connection was broken.
[14 Aug 2013 7:29] MySQL Verification Team
Easy to test, even with the mysql client.

5.5.33 client:
mysql> select now();
+---------------------+
| now()               |
+---------------------+
| 2013-08-14 09:29:09 |
+---------------------+
1 row in set (0.00 sec)

mysql> select now();
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id:    1
Current database: test

+---------------------+
| now()               |
+---------------------+
| 2013-08-14 09:29:15 |
+---------------------+
1 row in set (0.01 sec)

-------------------------
5.6.13 client:

mysql> select now();
+---------------------+
| now()               |
+---------------------+
| 2013-08-14 09:28:49 |
+---------------------+
1 row in set (0.00 sec)

mysql> select now();
ERROR 2013 (HY000): Lost connection to MySQL server during query
mysql> select now();
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id:    1
Current database: test

+---------------------+
| now()               |
+---------------------+
| 2013-08-14 09:29:00 |
+---------------------+
1 row in set (0.01 sec)
[16 Aug 2013 0:26] Rahul Gulati
The bug might have been introduced by
revno: 3134
committer: Davi Arnaut <davi.arnaut@oracle.com>
branch nick: 11762221-trunk
timestamp: Tue 2011-05-31 10:52:09 -0300.
[16 Aug 2013 0:49] Davi Arnaut
Yeah, the fact that the socket was being drained in net_clear had the side effect of detecting whether the connection was alive. Given that a write to a TCP socket can succeed even if the connection has been closed by the peer, reconnect won't work because the client only attempts a reconnection when sending (writing) a command fails.

A simple way to restore the previous behavior is to add a connection check (vio_is_connected) to cli_advanced_command. Just keep in mind that this is inherently racy: if the connection gets closed after the check (or after net_clear in earlier releases), the mysql_query() call will fail even if reconnect is enabled.
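
To illustrate what such a connection check does at the socket level, here is a minimal sketch. It is not the library's vio_is_connected() code. The names socket_looks_connected() and liveness_demo() are hypothetical, and a local socketpair() stands in for the client/server TCP connection. The idea is to poll the socket without blocking and, if it is readable, peek at it: a zero-byte read means the peer has closed its end.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of libmysqlclient): returns 1 if the
   socket still looks connected, 0 if the peer closed it or it is in an
   error state. The answer can be stale as soon as it is returned. */
int socket_looks_connected(int fd) {
  struct pollfd pfd;
  char byte;
  ssize_t n;
  int ready;

  pfd.fd = fd;
  pfd.events = POLLIN;
  pfd.revents = 0;

  ready = poll(&pfd, 1, 0);              /* timeout 0: never block */
  if (ready < 0) return errno == EINTR;  /* interrupted: no evidence of a close */
  if (ready == 0) return 1;              /* nothing pending, no close seen */
  if (pfd.revents & (POLLERR | POLLNVAL)) return 0;

  /* Readable or hung up: peek one byte without consuming it. */
  n = recv(fd, &byte, 1, MSG_PEEK | MSG_DONTWAIT);
  if (n > 0) return 1;                   /* unread data is pending */
  if (n == 0) return 0;                  /* orderly shutdown by the peer */
  return errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR;
}

/* Demonstration on a local socket pair; sv[1] plays the server side. */
int liveness_demo(void) {
  int sv[2];
  int before, after;

  if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return -1;
  before = socket_looks_connected(sv[0]);
  close(sv[1]);                          /* "server" closes, as after KILL */
  after = socket_looks_connected(sv[0]);
  close(sv[0]);

  printf("before close: %d, after close: %d\n", before, after);
  assert(before == 1);
  assert(after == 0);
  return 0;
}
```

Calling liveness_demo() from main() exercises both cases. The check is only a snapshot, which is the race mentioned above: the peer can still close the connection between the check and the following write.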

Also, as a historical perspective, net_clear used to drain the socket because the server had bugs where it would send more data than the client expected (e.g. drop database sending two "ok" packets).
[16 Aug 2013 2:03] Davi Arnaut
Patch that restores the previous behavior: https://gist.github.com/darnaut/6246624
[16 Aug 2013 2:12] Davi Arnaut
BTW, the change implies that poll() gets called every time a command is to be sent to the server. I'm guessing this could have some (perhaps minimal?) performance impact.
[9 Jun 2014 17:28] Paul DuBois
Noted in 5.6.20, 5.7.5 changelogs.

Client auto-reconnect did not work for clients linked against
libmysqlclient, even with MYSQL_OPT_RECONNECT enabled.
[6 Aug 2014 17:35] Laurynas Biveinis
$ bzr log -n0 -r 5948
------------------------------------------------------------
revno: 5948
committer: Venkata Sidagam <venkata.sidagam@oracle.com>
branch nick: 5.6
timestamp: Mon 2014-05-19 22:01:55 +0530
message:
  Bug #17309863 AUTO RECONNECT DOES NOT WORK WITH 5.6 LIBMYSQLCLIENT
  
  Problem Statement: Automatic reconnection does not work for MySQL client
  programs linked with 5.6 libmysqlclient, even if MYSQL_OPT_RECONNECT is enabled.
  
  Analysis:
  When we have two connections (say con_1 and con_2) in which 'con_1' has
  auto-reconnect enabled. In such case if 'con_2' sends 'KILL <con_1 id>'
  (killing 'con_1'), then the server closes the socket for 'con_1'.
  After that when we send any query to 'con_1' it is failing with "Lost
  connection to MySQL server during query" error, even though auto-reconnect
  is enabled for 'con_1'.
  This is because send() which sends query might still succeed on client
  even though connection has been already closed on server. Since send()
  returns success at client side, client tries to recv() the data. Now
  client receives '0' means that the peer has closed the connection.
  Hence the query fails with the error mentioned above.
  
  Problem didn't exist in 5.5 and earlier versions because in them we tried
  to read-up all data remaining from previous command before sending new one
  to the server. As result we detected that connection was closed before
  query was sent and re-established connection.
  
  Fix:
  Check if socket is alive using vio_is_connected() call in case if
  auto-reconnect is enabled before sending query to server. If socket
  was disconnected by server set net->error to 2 so the socket on the
  client will be closed as well and reconnect will be initiated before
  sending the query. Reconnect doesn't make sense in case of COM_QUIT
  so skip the connection check for this command.
  
  Note: This fix doesn't solve the problem fully as bug still can
  occur if connection is killed exactly after this check and before
  query is sent. But this is acceptable since similar problem
  exists in 5.5.
  
  Also note that this patch might cause slight performance degradation,
  but it affects only auto-reconnect mode and therefore acceptable.
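
The control flow described in the "Fix" paragraph above can be sketched with a mock connection. This is only a model of the logic, not the patch itself: struct mock_conn, the MOCK_COM_* constants, and every function name here are hypothetical stand-ins, and the "socket" is just a flag.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for the real command codes. */
enum mock_command { MOCK_COM_QUIT, MOCK_COM_QUERY };

/* Hypothetical model of a client connection. */
struct mock_conn {
  int reconnect_enabled;  /* models MYSQL_OPT_RECONNECT */
  int socket_alive;       /* 0 once the server has closed the socket */
  int net_error;          /* 2 means "broken, close and reconnect" */
  int reconnect_count;    /* how many reconnects were performed */
};

/* Stand-in for the liveness check the commit message names. */
static int mock_is_connected(const struct mock_conn *c) {
  return c->socket_alive;
}

/* Stand-in for re-establishing the connection; always succeeds here. */
static int mock_reconnect(struct mock_conn *c) {
  c->socket_alive = 1;
  c->net_error = 0;
  c->reconnect_count++;
  return 0;
}

/* Returns 0 if the command would succeed, 1 if it would fail. */
int mock_send_command(struct mock_conn *c, enum mock_command cmd) {
  /* Check liveness only in auto-reconnect mode, and never for QUIT. */
  if (c->reconnect_enabled && cmd != MOCK_COM_QUIT && !mock_is_connected(c))
    c->net_error = 2;

  /* A flagged connection is re-established before the command is sent. */
  if (c->net_error == 2 && mock_reconnect(c) != 0)
    return 1;

  /* Without the check, a dead socket surfaces as a failed command. */
  return c->socket_alive ? 0 : 1;
}

/* Small self-check of the two main cases. */
int mock_demo(void) {
  struct mock_conn with_reconnect = {1, 0, 0, 0};
  struct mock_conn without_reconnect = {0, 0, 0, 0};

  assert(mock_send_command(&with_reconnect, MOCK_COM_QUERY) == 0);
  assert(mock_send_command(&without_reconnect, MOCK_COM_QUERY) == 1);
  printf("reconnects performed: %d\n", with_reconnect.reconnect_count);
  return 0;
}
```

The model has no window between the check and the send, so it cannot reproduce the remaining race that the note above describes.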
[18 Nov 2014 21:23] Paul DuBois
Noted in Connector/C 6.1.6 changelog.