Bug #24793 Cluster nodes keep dead connections forever, should use SO_KEEPALIVE
Submitted: 4 Dec 2006 11:25
Modified: 19 Jun 2007 9:54
Reporter: Hartmut Holzgraefe
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S2 (Serious)
Version: 5.0 (probably any)
OS: Any (*)
Assigned to: li zhou
CPU Architecture: Any

[4 Dec 2006 11:25] Hartmut Holzgraefe
Description:
If a cluster node dies hard (power failure, network cable unplugged or cut, ...), TCP sockets to the vanished node are kept open. Because the vanished node was the one that initiated further communication, no traffic happens on these sockets, and the surviving end of each socket stays around forever.

For management node connections, PURGE STALE SESSIONS seems to solve this, but we've observed a case where a data<->sql node connection was affected by a sql node crash/disconnect: the sql node could only join the cluster again after we identified and restarted the data node holding the stale connection in state ESTABLISHED on its end.

How to repeat:
I have not been able to repeat the data<->api node case yet, but the mgmt<->sql node case can easily be reproduced by unplugging the network cable on a sql node in the cluster. The sql node will rapidly be declared dead by the remaining nodes, but its connection to port 1186 of the management server will stay in ESTABLISHED state forever, or at least until the next PURGE STALE SESSIONS command.

Suggested fix:
1) enable setsockopt(..., SO_KEEPALIVE, ...) on all socket-based
   transports. This will at least check whether the other end
   is still alive after 2 hours of no communication on a
   socket. It can be done quickly and at least ensures that
   the situation does not persist forever.

2) have all nodes actively close any remaining socket
   connections to a node declared dead. We do expect a 
   node declared dead to reconnect anyway, don't we?
   So there should be no further valid communication on
   the remaining sockets and so no harm can be expected
   from closing them on the still active nodes.
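
A minimal C sketch of both suggestions, for illustration only. It is not the committed patch: `struct conn`, `enable_keepalive`, and `close_conns_to_node` are hypothetical names, and the connection table is an assumed bookkeeping structure, not something taken from the cluster's transport code.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical bookkeeping: one entry per open transport socket. */
struct conn {
    int fd;       /* socket descriptor, or -1 if the slot is unused */
    int node_id;  /* id of the peer node on the other end */
};

/* Suggestion 1: ask the kernel to probe the peer once the
 * connection has been idle for the system keepalive time.
 * Returns 0 on success, -1 on failure (errno set by setsockopt). */
static int enable_keepalive(int fd)
{
    int on = 1;
    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));
}

/* Suggestion 2: close every socket still attached to a node that
 * has been declared dead. Returns the number of sockets closed. */
static int close_conns_to_node(struct conn *conns, int n, int dead_node_id)
{
    int closed = 0;
    for (int i = 0; i < n; i++) {
        if (conns[i].fd >= 0 && conns[i].node_id == dead_node_id) {
            close(conns[i].fd);
            conns[i].fd = -1;  /* mark the slot as free */
            closed++;
        }
    }
    return closed;
}
```

Under these assumptions, `enable_keepalive` would be called once on each socket right after it is connected or accepted, and `close_conns_to_node` from whatever code path declares a node dead.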
[1 Feb 2007 14:36] Hartmut Holzgraefe
see also bug #26008
[12 Mar 2007 9:03] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/21701

ChangeSet@1.2389, 2007-03-12 16:56:19+00:00, lzhou@dev3-63.(none) +2 -0
  BUG#24793 Add SO_KEEPALIVE socket option to avoid cluster nodes keeping dead connections forever.
[29 Mar 2007 8:15] Stewart Smith
since we're always setting the option, just use local variable. Also, should be errno and not InetErrno, right?
[29 Mar 2007 8:19] Stewart Smith
and the option argument to setsockopt should be void* not char*, right?
[30 Mar 2007 5:36] Stewart Smith
Patch okay after the agreed changes discussed on IRC:

<lzhou> stewart: ok, so i just need to use local variable instead of global to save memory. if right, i will push new patch. 
<stewart> lzhou: and remove the #ifdef around the error printing.
<stewart> lzhou: then okay to push
<lzhou> stewart: all of #ifdef or just SO_KEEPALIVE
<stewart> lzhou: just the SO_KEEPALIVE error
<lzhou> stewart: ok. will do it. thanks.
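
The review points above (local variable, errno, a void* option argument, and an unconditional error message) might look roughly like this in C. This is a sketch under assumptions, not the committed patch; `set_keepalive_or_report` is a hypothetical name and the message text is made up.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper illustrating the review feedback. */
static int set_keepalive_or_report(int fd)
{
    /* The option is always switched on, so a local suffices;
     * no global variable is needed to hold the value. */
    const int on = 1;

    /* POSIX declares the option value as `const void *`, so the
     * address of an int can be passed without a char* cast. */
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) != 0) {
        int saved = errno;  /* fprintf may overwrite errno */
        /* Reported unconditionally, not hidden behind an #ifdef. */
        fprintf(stderr, "setsockopt(SO_KEEPALIVE) failed: %s (errno %d)\n",
                strerror(saved), saved);
        errno = saved;
        return -1;
    }
    return 0;
}
```

Calling the helper on an invalid descriptor such as -1 takes the error branch and leaves errno set to EBADF for the caller.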
[30 Mar 2007 7:08] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/23389

ChangeSet@1.2389, 2007-03-30 15:01:03+00:00, lzhou@dev3-63.(none) +1 -0
  BUG#24793 Add SO_KEEPALIVE socket option to avoid cluster nodes keeping dead connections forever.
[7 Apr 2007 7:00] Bugs System
Pushed into 5.0.40
[7 Apr 2007 7:01] Bugs System
Pushed into 5.1.18-beta
[10 Apr 2007 4:16] Jon Stephens
Does the fix implement Hartmut's suggestion #1 or #2?

In either case, how much time elapses after a node dies before something is done about it?
[10 Apr 2007 12:00] li zhou
Implemented suggestion #1: added SO_KEEPALIVE.
A dead connection will persist for about 2 hours.
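
The 2-hour figure is the usual system-wide default idle time before the first keepalive probe. As a Linux-specific sketch that is not part of this fix, the timing can be shortened per socket with the TCP_KEEPIDLE, TCP_KEEPINTVL, and TCP_KEEPCNT options. The function names and the example values are assumptions for illustration.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Linux-specific: enable keepalive and override its timing for
 * one socket. Returns 0 on success, -1 on the first failure. */
static int set_keepalive_timing(int fd, int idle_s, int interval_s, int probes)
{
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) != 0)
        return -1;
    /* Seconds of idleness before the first probe is sent. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) != 0)
        return -1;
    /* Seconds between successive unanswered probes. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof(interval_s)) != 0)
        return -1;
    /* Unanswered probes before the connection is dropped. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof(probes)) != 0)
        return -1;
    return 0;
}

/* Approximate worst-case time until a silent peer is detected. */
static int keepalive_detection_seconds(int idle_s, int interval_s, int probes)
{
    return idle_s + interval_s * probes;
}
```

With the commonly cited Linux defaults (7200 s idle, 75 s between probes, 9 probes) the estimate is 7200 + 75 * 9 = 7875 seconds, a little over 2 hours. With 60 s idle, 10 s between probes, and 5 probes it drops to 110 seconds.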
[19 Jun 2007 9:54] Jon Stephens
Thank you for your bug report. This issue has already been fixed in the latest released version of that product, which you can download at

  http://www.mysql.com/downloads/

Documented in 5.0.40 and 5.1.18 changelogs.