MySQL Bugs: #107852: MySQL client does not detect closed connection properly

Bug #107852	MySQL client does not detect closed connection properly
Submitted:	12 Jul 2022 19:18	Modified:	17 Feb 2024 19:44
Reporter:	Isobel Smith	Email Updates:
Status:	Open	Impact on me:	None
Category:	MySQL Server: Command-line Clients	Severity:	S3 (Non-critical)
Version:	8.0.26	OS:	Any (Linux and OSX)
Assigned to:	Geir Høydalsvik	CPU Architecture:	x86

Description:
When using the command line mysql client to establish a server connection, and for whatever reason the server terminates the connection, the client does not handle it until it tries to execute a SQL Query.

It then fails with "ERROR 2013 (HY000): Lost connection to MySQL server during query" even though this is technically incorrect.
The connection was already lost before the query was transmitted to the server.
The client machine definetely received the information, that the connection was terminated, as "netstat -an" reveals, that the socket migrates from "ESTABLISHED" into "CLOSE_WAIT", indicating that the kernel waits for mysql client to close the socket.

We were also able to observe this behaviour with the MySQL2 ruby gem/library, which is binding to libmysql. So it might very well be, that the bug is actually in libmysql.

How to repeat:
1) use the mysql cli client on OSX or Ubuntu(Linux) to connect to a server.
2) Execute a simple select to ensure the connection is established (SELECT @@VERSION;)
("netstat" should now show you that the connection is "ESTABLISHED")
3) through a separate MySQL connection identify your connection ID (using "SHOW PROCESSLIST") and terminate it ("KILL <ID>")
("netstat" should now show you that the connection is in "CLOSED_WAIT" state)
4) In the initial MySQL connection execute "SELECT @@VERSION;" again.
("netstat should now show you that the connection is closed)
The second select will produce "ERROR 2013 (HY000): Lost connection to MySQL server during query".

Suggested fix:
I would expect that

1) The error message says "MySQL server has gone away" instead of falsely indicating, that during the transmission of the query to the server, the connection was still intact.
Especially as the MySQL documentation states, that in case a connection is killed, you should expect the "Gone away" error:
https://dev.mysql.com/doc/refman/8.0/en/gone-away.html#:~:text=has%20killed%20the%20runnin...

2) The error is raised immediately once the connection is closed. (To allow applications to cleanly re-open connections / handle connection loss cleanly)
As soon as a FIN-TCP-Packet is received, I would expect mysql to handle it and marks the connection as closed.
It looks like, this is actually only handled, once a new query is about to be executed.

Hi Mr. Smith,

Thank you, very much, for your bug report.

However, this is not a bug. This is expected and default  behaviour. If you wish that broken connection are automatically re-established, you should start MySQL CLI with --reconnect argument. You can also configure MySQL CLI in my.cnf or .my.cnf to automatically reconnect, by using the same option, without leading dashes.

There are similar options for most of our API's and Connectors.

This is all explained in our Reference Manual.

Not a bug.

Hi, 

Thanks a lot for your reply. I think there has been a misunderstanding - we do not want to automatically re-establish connections, we want proper termination of connections. 

Cheers,
Isobel

Hi Isobel,

My understanding is that you want the client to separate between:
a) The query was sent to the server but then the connection was closed (server might have crashed)
b) The query was never sent to the server (because the connection had already been closed)

In the first case (a) the current error messages is ok:
  "ERROR 2013 (HY000): Lost connection to MySQL server during query".

In the second case (b) you suggest to have a separate error code, right?

Note that the application (client) only learns about the actual socket state when it READS from the socket.

So for case (b) we could poll the socket state before sending the query.
We would then detect that the connection is closed and respond with a different error code "connection closed , query has not been sent". 

Would it be a solution to expose an API to you that polls the socket state?
In this case you could check before you send the query.

-geir

Hey Geir,

Yes, option B is exactly what we wanted, and the reason for opening this ticket.

As I have never used "low level" TCP-Sockets, I am not too familiar with possibilities on them. So please forgive me in case I write nonsense.

My hope was, that the OS would actually detect the socket being broken, as I hoped a TCP RST-Packet would be received. After further investigation, I realised, that unless the client sends data, the server will (of course) never send a RST, as it is not aware of the clients state.

I guess exposing an API to check the sockets state would do the trick, depending on how polling the sockets state is implemented.
If polling the state does not rely on transmitting / receiving TCP packets (which would introduce additional latency), for the reason of user convenience, I would prefer the polling to happen automatically before sending a query.
In case this would introduce extra round-trips between client and server, I understand, if you decide against it.

I read about the SIGPIPE signal, that the client would receive when writing to a non-existing socket. I was not able to verify, if in situation B, the SIGPIPE signal is actually sent, or if the OS does not yet know, that the socket is dead.
Regarding the SIGPIPE signal, I was wondering, if it would be possible to increase the "likeliness" of it being triggered when using TCP-Keepalive.

Maybe I can visualise my idea by some pseudo-C-code:

bool sigpipeTriggered = false

void sigpipe_handler(int signo) {
    if (signo == SIGPIPE) {
        sigpipeTriggered = true
    }
}

// Create a socket and set up signal handler to catch SIGPIPE
int connect() {
  int sockfd;

  struct sigaction sa;

  // Set up signal handler for SIGPIPE
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = 0;
  sa.sa_handler = sigpipe_handler;
  if (sigaction(SIGPIPE, &sa, NULL) == -1) {
      perror("Error setting up signal handler");
      exit(1);
  }

  // Create a TCP socket ()
  sockfd = socket(AF_INET, SOCK_STREAM, 0);
  if (sockfd < 0) {
      perror("Error opening socket");
      exit(1);
  }
  return sockfd;
}

int writeDataToSocket(int sockfd, const char *data) {
  if(sigpipeTriggered) {
    // if we land here, then we deal with situation B
    // Would using TCP-Keepalive help to detect a closed socket faster / before trying to write to it?

    return ERROR_CONNECTION_CLOSED_QUERY_NOT_SENT;
  }

  // Send data through the socket
  if (send(sockfd, data, strlen(data), 0) < 0) {
    // In case we land here, we deal with situation A
    return ERROR_MYSQL_HAS_GONE_AWAY;
  }

  return 0;
}

What would be your approach to the function to poll the socket status? Maybe that solution is easier / works better. 

Thanks a lot, 
Isobel & Marno

My 2 cents on the topic (as requested by Geir)(disclaimer, I can't claim to know every intricacy of low level TCP either):

libmysql does do the following with the TCP connection:

1. it's being opened and its state is being checked during mysql_real_connect()
2. It's sitting idle (and nothing is doing anything on the socket) unless a command (https://dev.mysql.com/doc/dev/mysql-server/latest/page_protocol_command_phase.html) packet is to be sent to the server.
3. the result of trying to send the packet is being checked and the generic connection lost error is being thrown if sending the thing failed. Note that the sending is usually buffered so most often than not a whole command is sent using a single write(). So we also would get a single error code.
4. some commands might have a follow up dialog, e.g. COM_QUERY (see the state diagram in https://dev.mysql.com/doc/dev/mysql-server/latest/page_protocol_com_query_response.html). These will be sent via separate writes() and their results checked and the connection lost error thrown as needed
5. the server's response from is read (if any is expected). And an (possibly disconnect) error is produced if read() fails.
6. When the session is to close mysql_close() will disconnect and report errors if the disconnection failed for some reason.

So, as you can see, a single connection lost error can be produced in a number of stages during the lifetime of the connection.

And it's not really clear by the error code itself when the error was produced.

This is considered OK since (by default) libmysql runs in auto-commit mode. This means that, if some sort of error occurs while the query is being processed by the server, the whole server transaction will be rolled back. This is actually correct for all of the DML statements (INSERT/UPDATE etc). But statements that have side effects (e.g. setting user variables) or auto-committing statements (e.g. most of the DDL) stand a chance of actually persisting some of their results even for disconnected connections. Depending on when the connection gets disconnected of course.

As for passively detecting it without trying to send a command I'm afraid that this is not very possible, unless you do the SO_KEEPALIVE TCP option. This will cause the TCP stack to "ping" the connection every so often with a packet just to see if it's still alive and well. But we don't do this for libmysql connections because it takes resources out unnecessarily on reliable connections (the most common mysqld use case, where it sits very close to the web server acting as its client).
There's one other problem with this too: there's no background thread in libmysql that monitors things. Unless the application calls libmysql APIs nothing is running. And all of the relevant libmysql APIs it can call do result in sending a command to the server anyway.

Thus, the most reliable way to verify a TCP connection's health is to actually try to send and receive something on it. This is exactly what mysql_ping/COM_PING does. It does send a no-op command to the server just to verify the connection is alive. Thus, if your application wants to do some background monitoring, it can call mysql_ping() every now and again.

So, do I understand your request correctly that you want a different error when a loss of TCP connection is detected depending on the state the mysql protocol command processing is?
I.e. one error for a failed command write, another for a failed result read, and possibly a third one for failure in the middle of a long statefull interaction like e.g. the COM_QUERY RPC? Or the prepared statements execution, https://dev.mysql.com/doc/dev/mysql-server/latest/page_protocol_command_phase_ps.html

I'm afraid that a distinct error code might break a lot of existing applications that do rely on the connection lost error to detect a connection lost ...

And, unlike server errors, I can't have more than a single error code returned per libmysql API call.

So, we need to think of other options it seems.

To help you in that I would need to know what exactly do you need this information for?
Are there different ways to track how exactly did the execution of your query failed due to a connection lost event?
Please share your thoughts on the topic.