Bug #95893 A client process may be blocked permanently on network congestion
Submitted: 20 Jun 2019 9:29 Modified: 5 Sep 2019 12:31
Reporter: Wei Zhao (OCA)
Status: Not a Bug
Category: MySQL Server: Connection Handling
Severity: S2 (Serious)
Version: 5.6, 5.7, 8.0
OS: Any
CPU Architecture: Any
Tags: Contribution

[20 Jun 2019 9:29] Wei Zhao
Description:
A MySQL client process can be blocked permanently if the session net_write_timeout is short (e.g. 1 second), the query result takes many packets to transmit, and network congestion occurs while the result is being sent from mysqld to the client process.

How to repeat:
On the server side, prepare a table my_big_table with a lot of data (e.g. 32 MB), and set both the global and the session net_write_timeout to 1. To reliably imitate network congestion, gdb is used to stop mysqld and the mysql client at the right places. Attach gdb to the mysqld process and set breakpoints on the functions vio_write() and vio_io_wait(). Then, in the client, issue 'select * from my_big_table', and almost immediately attach gdb to the client mysql process. The client will most probably be stopped in a call stack on its network read path; keep it stopped there, so that it no longer reads from the socket and the server's socket send buffer fills up.

Then, in the gdb session attached to mysqld, you will see many breakpoint hits in vio_write() (the server sending result packets to the client). Eventually vio_io_wait() is called, because send() would block once the OS socket write buffer is full, and inside vio_io_wait() the following statement is executed:

errno= SOCKET_ETIMEDOUT;

The statement then finishes successfully on the server side, but only part of the result has been sent to the client. On the client side, however, the query blocks forever, even after quitting the gdb session attached to it.

Suggested fix:
The cause of this issue is that when the poll() call in vio_io_wait() times out, net_write_raw_loop() returns an error, yet execution of the SQL statement still completes: the timeout error is simply ignored, which is wrong. The server should close the socket connection so that the client can return from its blocked recv() system call.
[20 Jun 2019 9:31] Wei Zhao
This patch fixes the bug.

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: client-permanent-block-fix.diff (application/octet-stream, text), 1.06 KiB.

[20 Jun 2019 9:34] MySQL Verification Team
Hello Wei Zhao,

Thank you for the report and contribution.

regards,
Umesh
[26 Jul 2019 4:10] Thayumanavar Sachithanantham
Following the steps described in the report, the issue does not reproduce in 8.0.
Also, if a timeout occurs because the write buffer is full and only a partial
write has happened, the timeout is treated as an error (since not all of the
specified bytes have been written to the socket buffer). Thus the connection is
aborted.