Description:
mysqlrouter routes connections correctly via a running fabric.
Once the master in the HA group fails, fabric will initiate a failover to a new master. The client connection breaks. The application is trying to reconnect via mysqlrouter but mysqlrouter fails to route the connection to the new master. Instead mysqlrouter spams its errorlogfile with:
2015-11-11 19:56:31 ERROR [7f11142b0700] Too many open files
This message gets repeated until the logfile fills the filesystem (10GB/h, at least 70000 repeats per second).
Killing mysqlrouter and restarting works fine.
How to repeat:
Here is the mysqlrouter.cnf:
[DEFAULT]
logging_folder = /var/log/mysqlrouter/
plugin_folder = /usr/lib64/mysqlrouter
runtime_folder = /var/run/mysqlrouter
config_folder = /etc/mysqlrouter
[logger]
level = info
[keepalive]
interval = 60
[fabric_cache:mydemo]
address = 127.0.0.1
user = admin
[routing:myfabric]
destinations = fabric+cache://mydemo/group/mygroup
bind_port=3400
mode = read-write
Output of mysqlfabric -group lookup_servers mygroup after the failover:
server_uuid address status mode weight
------------------------------------ -------------- ------- ---------- ------
eb5f95c2-885a-11e5-a1a8-080027003c09 localhost:3402 PRIMARY READ_WRITE 1.0
f79d3aff-885a-11e5-b185-080027003c09 localhost:3403 FAULTY READ_WRITE 1.0
lsof of routerprocess when it hangs:
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
mysqlrout 31626 root cwd DIR 252,0 90 68484819 /home/testy/fabric-5.7/scripts
mysqlrout 31626 root rtd DIR 252,0 4096 128 /
mysqlrout 31626 root txt REG 252,0 11504 71543950 /usr/sbin/mysqlrouter
mysqlrout 31626 root mem REG 252,0 58288 67201308 /usr/lib64/libnss_files-2.17.so
mysqlrout 31626 root mem REG 252,0 144208 71543949 /usr/lib64/mysqlrouter/routing.so
mysqlrout 31626 root mem REG 252,0 11528 71543947 /usr/lib64/mysqlrouter/keepalive.so
mysqlrout 31626 root mem REG 252,0 28360 71543948 /usr/lib64/mysqlrouter/logger.so
mysqlrout 31626 root mem REG 252,0 4505032 71543946 /usr/lib64/mysqlrouter/fabric_cache.so
mysqlrout 31626 root mem REG 252,0 2099360 67201290 /usr/lib64/libc-2.17.so
mysqlrout 31626 root mem REG 252,0 88720 67165000 /usr/lib64/libgcc_s-4.8.3-20140911.so.1
mysqlrout 31626 root mem REG 252,0 1141552 67201298 /usr/lib64/libm-2.17.so
mysqlrout 31626 root mem REG 252,0 991744 67214100 /usr/lib64/libstdc++.so.6.0.19
mysqlrout 31626 root mem REG 252,0 141616 67213764 /usr/lib64/libpthread-2.17.so
mysqlrout 31626 root mem REG 252,0 128072 71543942 /usr/lib64/libmysqlharness.so.0
mysqlrout 31626 root mem REG 252,0 165128 71543944 /usr/lib64/libmysqlrouter.so.1
mysqlrout 31626 root mem REG 252,0 19512 67201296 /usr/lib64/libdl-2.17.so
mysqlrout 31626 root mem REG 252,0 164336 67201283 /usr/lib64/ld-2.17.so
mysqlrout 31626 root 0u CHR 136,1 0t0 4 /dev/pts/1
mysqlrout 31626 root 1u CHR 136,1 0t0 4 /dev/pts/1
mysqlrout 31626 root 2u CHR 136,1 0t0 4 /dev/pts/1
mysqlrout 31626 root 3w REG 252,0 53164354 2325328 /var/log/mysqlrouter/mysqlrouter.log
mysqlrout 31626 root 4u IPv4 254925267 0t0 TCP localhost:csms2 (LISTEN)
mysqlrout 31626 root 5u IPv4 254983679 0t0 TCP localhost:60718->localhost:32275 (ESTABLISHED)
mysqlrout 31626 root 6u sock 0,7 0t0 254986341 protocol: TCPv6
mysqlrout 31626 root 7u sock 0,7 0t0 254986030 protocol: TCPv6
...
mysqlrout 31626 root 1023u sock 0,7 0t0 254986862 protocol: TCPv6
There are more than 1000 IPv6 sockets opened. But I don't use IPv6.
My application is Java, when the connections breaks it will automatically reconnect via JDBC.
In 10% of the cases failover works fine with router. In most cases mysqlrouter will die.
Suggested fix:
Seems like during reconnect phase with master not yet available in fabric there are ressources not freed again. But pure speculation.
I expect failover to work.
I also suggest to reduce error log entries in the usual way:
2015-11-11 19:56:31 ERROR [7f11142b0700] Too many open files
Last message repeated 10000 times