MySQL Bugs: #75161: MySQL does not have a reliable way to close session for remote clients

Bug #75161	MySQL does not have a reliable way to close session for remote clients
Submitted:	10 Dec 2014 11:21	Modified:	10 Dec 2014 15:37
Reporter:	Valeriy Kravchuk	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Connection Handling	Severity:	S4 (Feature request)
Version:	5.6, any	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	disconnect, TCP/IP, wait_timeout

Description:
It seems MySQL server does not have any reliable way to find out in time that remote client session is disconnected/had gone for whatever reason, unless it tries to send something to client or expects something from client (in these cases wait_timeout/interactive_timeout apply). You've probably seen sessions "doing something" for many days, like this (on 5.5):

---TRANSACTION 14A47661CF, ACTIVE 4767164 sec
mysql tables in use 3, locked 0
MySQL thread id 2668632, OS thread handle 0x7f3597be9700, query id 3324782292 XXX.YYY.0.ZZZ user preparing
SELECT something FROM t1 p, t2 i WHERE some_conditions AND col1 in ( SELECT col1 FROM huge_table WHERE some_condition_not_covered_by_index) ...

even though timeouts are set to default values and server where client session was started is down/unavailable for many days. Same applies to some stored routine call that "loops" forever and even to SELECT sleep(N) statements - the fact that client is disconnected will be noted wait_timeout or interactive_timeout seconds after current statement completes.

I'd like to see this fixed at server side, as the best thing one can do today for cases like this is to use pt-kill tool or check and kill "running away" sessions manually from server side.

How to repeat:
Connect to some MySQL server remotely, set wait_timeout and interactive_timeout to 30 seconds at session level (or have them set to that globally before connections), then run something like this:

select sleep(120);

Connect locally as root, check with SHOW PROCESSLIST that session is there. Now, break network between first session and MySQL server (put eth0 down, disconnect cable, get power out of client etc). Check from the client host that you can not connect to MySQL server any more. Check from local connection that SELECT is still visible in the processlist. Wait for 120 seconds to pass, note that session is still there, doing nothing. Line in SHOW PROCESSLIST output for the session will disappear only after 30 more seconds passed.

Suggested fix:
Implement some MySQL server thread that checks client sessions for being "alive". Probably some "keepalive" packages should be introduced for this at protocol level.

Alternatively, implement some server side timeout-based check of sessions that kills/closes them unconditionally after N seconds, no matter what their thread status is.

I find this a very welcome feature request.

Changing a client - server protocol to a two-way street could make this feature request much easier to implement. Hence, these improvements should obtain attention that they deserve.

Actually, it does if this hasn't been broken since Bug#10374 was fixed.

> Alternatively, implement some server side timeout-based check of sessions that kills/closes them unconditionally after N seconds, no matter what their thread status is.

Like, hum, the server-side statement timeout in MySQL 5.7?

The server side statement timeout does not seem to help, at least in my tests. 

I hit a similar problem as this one with queries reading from a fifo, when there's an error with the fifo. They seem to get stuck forever in 'Query' or, after a KILL <pid>, 'Killed' state, even after the statement timeout, wait_timeout and net read/write timeouts have passed. 

To repeat, you can just: 

mkfifo /tmp/fifo

Then create a simple table on mysql. I used this one: 

CREATE TABLE `test` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB

Then set the timeouts to a low enough value to test. I set the statement timeout to 10000 and the interactive/wait/net timeouts to 30. 

Then run this from mysql: 

load data infile '/tmp/fifo' into table test.test;

Then rm -f /tmp/fifo

And then you have a stuck query. 

Here's how my variables look like: 

mysql> show processlist;
+----+------+-----------+------+---------+------+-------------+---------------------------------------------------+
| Id | User | Host      | db   | Command | Time | State       | Info                                              |
+----+------+-----------+------+---------+------+-------------+---------------------------------------------------+
|  2 | root | localhost | NULL | Killed  |  264 | System lock | load data infile '/tmp/fifo' into table test.test |
|  7 | root | localhost | NULL | Query   |    0 | starting    | show processlist                                  |
+----+------+-----------+------+---------+------+-------------+---------------------------------------------------+
2 rows in set (0.00 sec)

mysql> show variables like '%timeout%';
+-----------------------------+----------+
| Variable_name               | Value    |
+-----------------------------+----------+
| connect_timeout             | 10       |
| delayed_insert_timeout      | 300      |
| have_statement_timeout      | YES      |
| innodb_flush_log_at_timeout | 1        |
| innodb_lock_wait_timeout    | 50       |
| innodb_rollback_on_timeout  | OFF      |
| interactive_timeout         | 30       |
| lock_wait_timeout           | 31536000 |
| net_read_timeout            | 30       |
| net_write_timeout           | 30       |
| rpl_stop_slave_timeout      | 31536000 |
| slave_net_timeout           | 3600     |
| wait_timeout                | 30       |
+-----------------------------+----------+
13 rows in set (0.00 sec)

mysql> show variables like 'max_statement_time';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| max_statement_time | 10000 |
+--------------------+-------+
1 row in set (0.00 sec)

And it seems  if the thread is not stuck, then the statement timeout does its job ok: 

mysql> select sleep(12);
+-----------+
| sleep(12) |
+-----------+
|         1 |
+-----------+
1 row in set (10.00 sec)

This is on 5.7.4-m14

I first hit this on 5.5.33 and as a result of a failed fifo, I've had stuck queries for over 12 hours (in 'Killed' state) and it seems the only way out of it will be to restart mysqld. While the queries are stuck in this state, they held the MDL for the table and thus prevent any DDL from running.