Bug #78771 Router running out of resources; not closing socket properly
Submitted: 9 Oct 2015 7:22 Modified: 20 Oct 2015 2:21
Reporter: Geert Vanderkelen
Status: Closed
Category: MySQL Router Severity: S2 (Serious)
Version: 2.0.1 OS: Any
Assigned to: CPU Architecture: Any

[9 Oct 2015 7:22] Geert Vanderkelen
Description:
When running lots of client connections through Router, Router will become unavailable because of resource problems. The following is seen in the (debug) logs:

...
2015-10-09 09:18:26 DEBUG   [10c0f4000] Trying server 127.0.0.1:3307 (index 1)
2015-10-09 09:18:26 DEBUG   [10c0f4000] MySQL Server 127.0.0.1:3307: Invalid argument (22)
2015-10-09 09:18:26 DEBUG   [10c0f4000] Quarantine destination server 127.0.0.1:3307 (index 1)
2015-10-09 09:18:26 DEBUG   [10c0f4000] Trying server 127.0.0.1:3306 (index 0)
2015-10-09 09:18:26 DEBUG   [10c0f4000] MySQL Server 127.0.0.1:3306: Invalid argument (22)
2015-10-09 09:18:26 DEBUG   [10c0f4000] Quarantine destination server 127.0.0.1:3306 (index 0)
2015-10-09 09:18:26 DEBUG   [10c0f4000] No more destinations: all quarantined

How to repeat:
Start router with configuration similar to:

[logger]
level = debug
[routing:B]
bind_address = 0.0.0.0:7002
destinations = 127.0.0.1:3306,127.0.0.1:3307
mode = read-only

Run something in a while loop using the MySQL CLI:

while true; do mysql -uroot -BN --protocol tcp --host 127.0.0.1 -e "SHOW VARIABLES LIKE 'port'" --port 7002; done

Suggested fix:
Use close() after shutdown().
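
A minimal sketch of the suggested fix, assuming POSIX sockets. The helper name close_connection is hypothetical and is not taken from the Router source. shutdown() only ends the conversation on the connection; the file descriptor stays allocated until close() is called, so a process that skips close() eventually runs out of descriptors.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: end the connection, then release the descriptor. */
int close_connection(int fd) {
    /* Stop both directions. ENOTCONN only means the peer is already gone,
       so it is not reported as an error here. */
    if (shutdown(fd, SHUT_RDWR) == -1 && errno != ENOTCONN) {
        perror("shutdown");
    }
    /* shutdown() does not free the descriptor; close() does. */
    return close(fd);
}
```

The return value is that of close(), so a caller can still detect a failed release.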
[9 Oct 2015 15:34] Vitor Oliveira
Even with close() after shutdown(), the sockets linger for a while, which is still a problem with a high number of connections and reconnections: the sockets stay in the TIME_WAIT state and cannot be reused for a while.

The suggestion is to use SO_LINGER before the close with a timeout of 0 so that sockets are immediately available.
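
The suggestion could be sketched as follows, assuming POSIX sockets. The helper name close_with_reset is hypothetical, and nothing in this report confirms that Router adopted this approach. With l_onoff set and l_linger set to 0, close() on a TCP socket aborts the connection with a reset instead of the normal FIN handshake, so the socket does not enter TIME_WAIT; the cost is that any unsent data is discarded.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: close a socket with a zero linger timeout. */
int close_with_reset(int fd) {
    struct linger lg;
    lg.l_onoff = 1;   /* enable SO_LINGER handling on close() */
    lg.l_linger = 0;  /* zero timeout: abort instead of lingering */

    if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg) == -1) {
        /* Still release the descriptor, but report the failure. */
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Because an abortive close drops pending data, this sketch only suits connections where everything has already been sent and acknowledged at the application level.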
[20 Oct 2015 2:21] Philip Olson
Posted by developer:
 
Fixed as of the upcoming MySQL Router 2.0.2 release, and here's the changelog entry:

Socket connections are now properly closed after calling shutdown(), so as to
reclaim resources.

Thank you for the bug report.