Bug #78809 MySQL keeps running queries for disconnected users
Submitted: 12 Oct 2015 19:26 Modified: 14 Sep 2023 20:32
Reporter: Domas Mituzas Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Connection Handling Severity:S3 (Non-critical)
Version:all of them, 5.6.26, 5.6.27 OS:Any
Assigned to: CPU Architecture:Any

[12 Oct 2015 19:26] Domas Mituzas
Description:
if a user disconnects from MySQL with a query still running, MySQL will keep running that query forever as long as no data is being returned interim (e.g. data is being written to temptable or sort file or simply no data is being returned). 

that often wastes lots of resources and MySQL could know better.

How to repeat:
connect to MySQL with any API, run long query that doesn't return data, run e.g. mysql_use_result(), die. 

Suggested fix:
check for closed connections, kill queries for those connections
[13 Oct 2015 8:07] MySQL Verification Team
Hello Domas,

Thank you for the report.
Observed this with 5.6.26/5.6.27 builds.

Thanks,
Umesh
[13 Oct 2015 8:35] MySQL Verification Team
test results

Attachment: 78809.results (application/octet-stream, text), 9.14 KiB.

[15 Oct 2015 9:35] Andrii Nikitin
Posted by developer:
 
In my understanding (I may be wrong):
- such behavior is by design;
- such behavior is common for most socket programs;
- the easiest way to work around such problem is to use TCP keepalive feature.
[15 Oct 2015 16:02] Domas Mituzas
TCP keepalive only handles TCP socket destruction when another endpoint is dead (and hence the silence). Either socket gets destroyed by keepalive, or by other client shutting down TCP socket, MySQL will ignore both. This has nothing to do with TCP keep-alive. 

It is not much of a "by design" as by "lack of implementation". Many socket programs will gladly stop whatever they were doing, if client disconnects (whole logic around unix SIGHUP is around that, try disconnecting your SSH session with anything running).

Even "do nothing" is a design, it is not consistent with other behaviors. Query will not stop running if it is able to send data, and query will stop running if it is not able to send the data, which means that queries with side effects are not defined well. 

Various APIs like poll() allow polling just for errors, so even if you don't expect bytes to arrive over a socket, you can still pull information whether there was a disconnect - especially for long running queries. 

Which socket programs did you have in mind that are ignoring whether client is waiting or not?
[4 Oct 2021 10:41] MySQL Verification Team
Bug #105104 is marked as duplicate of this bug
[14 Sep 2023 18:53] Aaron Burgemeister
A problem we have seen as a result of this (or similar) defect/bug is that the UNDO tablespace gets used. The client disconnects for whatever reason, MySQL keeps on running the SELECT way longer than expected (normally returns in a second, but in this case runs for the better part of two (2) days), and the system dutifully keeps things working filling the UNDO (ibdata1) space until it reaches it max, and then nothing works because the system reports it is out of space for a "sequence" table with one row that gets updated regularly.

The table 'some_table_here_sequence' is full.

As a result, it would be REALLY nice if MySQL would accept what the operating system (OS) is presumably passing to it when the socket is closed and cancel running queries, unless there is some very compelling reason to do otherwise. If nothing else this wastes cycles, and causes outages/downtime of applications when the UNFO space is filled and either the stuck connections need to be killed or else the database service restarted.

Stuck queries for days follow, and I confirmed there was no TCP connection on the client side (I lack DB server access to check there) and presumably had not been for over a day based on the service logs.

| Id | User | Host | db | Command | Time | State | Info |
+---------+------------+------------------------------------+-----------------+---------+--------+--------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1402377 | serviceacct | server.company.com:50192 | useriq | Execute | 151022 | Sending data | select count(user0_.id) as col_0_0_ from pfx_user user0_ where (user0_.id not in (select link1_.user_id from pfx_link link1_ left outer join pfx_application applicatio2_ on link1_.application=applicatio2_.id where applicatio2_.name='eDirectory' or link1_.native_user='abc')) and user0_.workgroup<>1 |
| 1402462 | serviceacct | server.company.com:51928 | useriq | Execute | 150872 | Sending data | select count(user0_.id) as col_0_0_ from pfx_user user0_ where (user0_.id not in (select link1_.user_id from pfx_link link1_ left outer join pfx_application applicatio2_ on link1_.application=applicatio2_.id where applicatio2_.name='eDirectory' or link1_.native_user='abc')) and user0_.workgroup<>1 |
[14 Sep 2023 20:32] Domas Mituzas
yeh, one way is to have a reaper that checks TCP tables and wipes pending transactions/conns for disconnected users - I have written this once or twice, not sure we use in prod right now.