Bug #80260 MySQL Router is down with more than 1000 concurrent connections
Submitted: 4 Feb 2016 6:08 Modified: 5 Jan 2018 3:12
Reporter: Shinya Sugiyama Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Router Severity:S2 (Serious)
Version:2.0.2,2.1.4 OS:Linux
Assigned to: CPU Architecture:Any
Tags: MySQL Router

[4 Feb 2016 6:08] Shinya Sugiyama
Description:
MySQL Router is down with segfault if connection is increased.

Using: mysql-router-2.0.2-linux-glibc2.17-x86-64bit.tar.gz

【Router configuration】

[DEFAULT]
logging_folder = /home/mysql/mysql-router
plugin_folder  = /home/mysql/mysql-router/lib/mysqlrouter 

[logger]
level = DEBUG

[routing:master]
bind_address = 0.0.0.0:7001
destinations = 127.0.0.1:3301
mode = read-write
max_connections = 8192
connect_timeout = 1
# wait_timeout = 1

[routing:slave]
bind_address=0.0.0.0:7002
destinations = 127.0.0.1:3302,127.0.0.1:3303,127.0.0.1:3304
mode = read-only
max_connections = 8192
connect_timeout = 1
# wait_timeout = 1

[keepalive]
interval = 120

【Sample Script】Using DBI and Execute following command

	for i in `seq 1 2000`
	do
	    mysql -h [mysqlrouter host] -uroot -P [mysqlrouter port] -e "SELECT SLEEP(120)" & 
	done

【Issue】 

[999] 10234
[1000] 10235
[1001] 10236
ERRORERROR 2013 (HY000) 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 104
: Lost connection to MySQL server at 'reading initial communication packet', system error: 104
ERRORERROR 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 104
 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 104

How to repeat:
【Without Router】Direct connection to MySQL

we can handle more than 1000 connection normally.

 [root@GA01 admin]# /usr/local/mysql/bin/mysqlslap -h 192.168.56.111 -P 3301 --no-drop --create-schema=SLAP --engine=InnoDB --concurrency=1000 -q 'SELECT SLEEP(120)' -u admin -p
Benchmarksword: 
        Running for engine InnoDB
        Average number of seconds to run all queries: 121.525 seconds
        Minimum number of seconds to run all queries: 121.525 seconds
        Maximum number of seconds to run all queries: 121.525 seconds
        Number of clients running queries: 1000
        Average number of queries per client: 1
[root@GA01 admin]# 

【With Router】

--concurrency=500, --concurrency=600, --concurrency=700, --concurrency=800, --concurrency=900
were working fine without error. (connection closed normally without error)

[root@GA01 admin]# /usr/local/mysql/bin/mysqlslap -h 192.168.56.111 -P 7001 --no-drop --create-schema=SLAP --engine=InnoDB --concurrency=900 -q 'SELECT SLEEP(120)' -u admin -p
Enter password: 
Benchmark
        Running for engine InnoDB
        Average number of seconds to run all queries: 123.219 seconds
        Minimum number of seconds to run all queries: 123.219 seconds
        Maximum number of seconds to run all queries: 123.219 seconds
        Number of clients running queries: 900
        Average number of queries per client: 1
[root@GA01 admin]#

When we test 1000 connection through MySQL Router, it was down with segfault error.

[root@GA01 admin]# /usr/local/mysql/bin/mysqlslap -h 192.168.56.111 -P 7001 --no-drop --create-schema=SLAP --engine=InnoDB --concurrency=1000 -q 'SELECT SLEEP(120)' -u admin -p
Enter password: 
/usr/local/mysql/bin/mysqlslap: Cannot run query SELECT SLEEP(120) ERROR : Lost connection to MySQL server during query
[root@GA01 admin]# 

/var/log/messages
Feb  3 13:50:13 fabric02 kernel: mysqlrouter[14580]: segfault at 8 ip 0000000000000008 sp 00007f5b8c487ba0 error 14 in mysqlrouter[400000+2000]

Suggested fix:
-----------------------------------------------
There is following function inside of routing.cc
-----------------------------------------------

 int get_mysql_socket(TCPAddress addr, int connect_timeout, bool log) noexcept {

 At here creating sockets for mysql server....

    if ((sock = socket(info->ai_family, info->ai_socktype, info->ai_protocol)) == -1) {
      continue;
    }
    FD_ZERO(&readfds);
    FD_SET(sock, &readfds);

After this function, MySQL Router will use "select" for confirm if sockets is exists or not.
At that time "FD_SET" is used.
 
FD_SET is group of flags that confirm file descriptor is ready or not one by one.
This flag can monitor "0-1024" file descriptor as a default.
In case, MySQL Router use more than 1024, FD_SET Macro will use unintended memory as flags.

-----------------------------------------------

Please increase FD_SETSIZE for handling more concurrent connection.
[4 Feb 2016 8:16] Umesh Shastry
Hello Sugiyama,

Thank you for the report.
Verified as described.

Thanks,
Umesh
[4 Feb 2016 8:17] Umesh Shastry
-- backtrace and log details

[umshastr@hod03]/export/umesh/utils/mysql-router-2.0.2-linux-glibc2.17-x86-64bit: bin/mysqlrouter -c ./etc/sample-router.ini
.
2016-02-04 09:09:48 DEBUG   [7f60aa8d0700] routing:read_only [::1]:38559 - [::1]:38559
2016-02-04 09:09:48 DEBUG   [7f608148e700] routing:read_only [::1]:39045 - [::1]:39045
2016-02-04 09:09:48 DEBUG   [7f6071a75700] Trying server localhost:3306 (index 0)
2016-02-04 09:09:48 DEBUG   [7f6071a75700] routing:read_only [::1]:39271 - [::1]:39271
2016-02-04 09:09:48 DEBUG   [7f6079281700] Trying server localhost:3306 (index 0)
2016-02-04 09:09:48 DEBUG   [7f6079281700] routing:read_only [::1]:40237 - [::1]:40237
2016-02-04 09:09:48 DEBUG   [7f607ba85700] Trying server localhost:3306 (index 0)
2016-02-04 09:09:48 DEBUG   [7f607ba85700] routing:read_only [::1]:39047 - [::1]:39047
2016-02-04 09:09:48 DEBUG   [7f6074c7a700] Trying server localhost:3306 (index 0)
2016-02-04 09:09:48 DEBUG   [7f607607c700] Trying server localhost:3306 (index 0)
2016-02-04 09:09:48 DEBUG   [7f6074c7a700] routing:read_only [::1]:39051 - [127.0.0.1]:3306
2016-02-04 09:09:48 DEBUG   [7f607e289700] Trying server localhost:3306 (index 0)
2016-02-04 09:09:48 DEBUG   [7f607e289700] routing:read_only [::1]:40232 - [127.0.0.1]:3306
2016-02-04 09:09:48 DEBUG   [7f607f68b700] Trying server localhost:3306 (index 0)
2016-02-04 09:09:48 DEBUG   [7f607b084700] Trying server localhost:3306 (index 0)
2016-02-04 09:09:48 DEBUG   [7f607b084700] routing:read_only [::1]:39046 - [127.0.0.1]:3306
2016-02-04 09:09:48 DEBUG   [7f607d888700] Trying server localhost:3306 (index 0)
2016-02-04 09:09:48 DEBUG   [7f607d888700] routing:read_only [::1]:40233 - [127.0.0.1]:3306
2016-02-04 09:09:48 DEBUG   [7f6074279700] Trying server localhost:3306 (index 0)
2016-02-04 09:09:48 DEBUG   [7f6074279700] routing:read_only [::1]:40243 - [127.0.0.1]:3306
2016-02-04 09:09:48 DEBUG   [7f607a683700] Trying server localhost:3306 (index 0)
2016-02-04 09:09:48 DEBUG   [7f607a683700] routing:read_only [::1]:39048 - [127.0.0.1]:3306
2016-02-04 09:09:48 DEBUG   [7f607747e700] Trying server localhost:3306 (index 0)
2016-02-04 09:09:48 DEBUG   [7f607747e700] routing:read_only [::1]:39049 - [127.0.0.1]:3306
2016-02-04 09:09:48 DEBUG   [7f607c486700] Trying server localhost:3306 (index 0)
2016-02-04 09:09:48 DEBUG   [7f607c486700] routing:read_only [::1]:40235 - [127.0.0.1]:3306
2016-02-04 09:09:48 DEBUG   [7f608008c700] Trying server localhost:3306 (index 0)
2016-02-04 09:09:48 DEBUG   [7f6079c82700] Trying server localhost:3306 (index 0)
Segmentation fault (core dumped)

-- config
[umshastr@hod03]/export/umesh/utils/mysql-router-2.0.2-linux-glibc2.17-x86-64bit: cat  ./etc/sample-router.ini
[logger]
level = DEBUG

[routing:read_only]
bind_address = localhost
bind_port = 7001
destinations = localhost:3306
mode = read-only
max_connections = 8192

[routing:read_write]
bind_address = localhost
bind_port = 7002
destinations = localhost:15001
mode = read-write
max_connections = 8192

--

(gdb) bt
#0  0x00007f66240001b8 in ?? ()
#1  0x00007f6634f252af in RouteDestination::get_server_socket (this=0x7f6624000f00, connect_timeout=1)
    at /export/home2/pb2/build/sb_0-16849190-1445535218.07/mysql-router-2.0.2/src/routing/src/destination.cc:120
#2  0x00007f6634f21fda in MySQLRouting::thd_routing_select (this=0x7f6634d00b90, client=1456)
    at /export/home2/pb2/build/sb_0-16849190-1445535218.07/mysql-router-2.0.2/src/routing/src/mysql_routing.cc:103
#3  0x00007f6634f23810 in operator()<int, void> (__object=<optimized out>, this=<optimized out>) at /usr/include/c++/4.8.2/functional:601
#4  _M_invoke<0ul, 1ul> (this=<optimized out>) at /usr/include/c++/4.8.2/functional:1732
#5  operator() (this=<optimized out>) at /usr/include/c++/4.8.2/functional:1720
#6  std::thread::_Impl<std::_Bind_simple<std::_Mem_fn<void (MySQLRouting::*)(int)> (MySQLRouting*, int)> >::_M_run() (this=<optimized out>) at /usr/include/c++/4.8.2/thread:115
#7  0x00007f6635cc41e0 in ?? () from /lib64/libstdc++.so.6
#8  0x00007f6635f1ddf5 in start_thread () from /lib64/libpthread.so.0
#9  0x00007f663542e60d in clone () from /lib64/libc.so.6
(gdb)
[4 Feb 2016 8:39] Shinya Sugiyama
Umesh-san,

Thank you for your quick confirmation.
MySQL Router is nice function, so I hope it will be fixed soon.

Best Regard
Shinya
[10 Feb 2016 8:28] Shinya Sugiyama
【In Short Term Workaround】

If we set "max_connections" less than 1,000, it seems we can avoid the MySQL Router Process down.

I tested value with 800 max connections.

【Configuration】
--------------------------------------------
-bash-4.2$ cat /etc/mysql/mysqlrouter_maxcon.ini 
# MySQL Router configuration
#

[DEFAULT]
logging_folder = /home/mysql/mysql-router
plugin_folder  = /home/mysql/mysql-router/lib/mysqlrouter 

[logger]
level = DEBUG

[routing:master]
bind_address = 0.0.0.0:7001
destinations = 127.0.0.1:3301
mode = read-write
max_connections = 800
connect_timeout = 1
# wait_timeout = 1

[routing:slave]
bind_address=0.0.0.0:7002
destinations = 127.0.0.1:3302,127.0.0.1:3303,127.0.0.1:3304
mode = read-only
max_connections = 800
connect_timeout = 1
# wait_timeout = 1

[keepalive]
interval = 120
-bash-4.2$ 

【Confirmation】
--------------------------------------------

-bash-4.2$ /usr/local/mysql57/bin/mysql -h 127.0.0.1 -P 7001 -u root --password=root -e "set @@global.max_connections = 2000;" 
mysql: [Warning] Using a password on the command line interface can be insecure.

-bash-4.2$ /usr/local/mysql57/bin/mysqlslap -h 127.0.0.1 -P 7001 --no-drop --create-schema=SLAP --engine=InnoDB --concurrency=500 -q 'select now()' -u root -p
Benchmarksword: 
        Running for engine InnoDB
        Average number of seconds to run all queries: 31.100 seconds
        Minimum number of seconds to run all queries: 31.100 seconds
        Maximum number of seconds to run all queries: 31.100 seconds
        Number of clients running queries: 500
        Average number of queries per client: 1

-bash-4.2$ /usr/local/mysql57/bin/mysqlslap -h 127.0.0.1 -P 7001 --no-drop --create-schema=SLAP --engine=InnoDB --concurrency=1000 -q 'select now()' -u root -p
Enter password: 
Benchmark
        Running for engine InnoDB
        Average number of seconds to run all queries: 130.463 seconds
        Minimum number of seconds to run all queries: 130.463 seconds
        Maximum number of seconds to run all queries: 130.463 seconds
        Number of clients running queries: 1000
        Average number of queries per client: 1

-bash-4.2$ 

-bash-4.2$ /usr/local/mysql57/bin/mysql -h 127.0.0.1 -P 7001 -u root --password=root -e "set @@global.max_connections = 2500;" 
mysql: [Warning] Using a password on the command line interface can be insecure.
-bash-4.2$ 

-bash-4.2$ /usr/local/mysql57/bin/mysqlslap -h 127.0.0.1 -P 7001 --no-drop --create-schema=SLAP --engine=InnoDB --concurrency=1500 -q 'select now()' -u root -p
Enter password: 
Benchmark
        Running for engine InnoDB
        Average number of seconds to run all queries: 128.377 seconds
        Minimum number of seconds to run all queries: 128.377 seconds
        Maximum number of seconds to run all queries: 128.377 seconds
        Number of clients running queries: 1500
        Average number of queries per client: 1

-bash-4.2$ 

-bash-4.2$ /usr/local/mysql57/bin/mysqlslap -h 127.0.0.1 -P 7001 --no-drop --create-schema=SLAP --engine=InnoDB --concurrency=2400 -q 'select now()' -u root -p
Enter password: 
Benchmark
        Running for engine InnoDB
        Average number of seconds to run all queries: 317.968 seconds
        Minimum number of seconds to run all queries: 317.968 seconds
        Maximum number of seconds to run all queries: 317.968 seconds
        Number of clients running queries: 2400
        Average number of queries per client: 1

-bash-4.2$ 

I recommend this value as temporary solution.
Regard
Shinya
[11 Feb 2016 17:01] Zurab Tutberidze
I changed the MySQLRouting::thd_routing_select and get_mysql_socket functions and use poll instead of select and it works fine
[11 Feb 2016 19:38] Zurab Tutberidze
select changed with poll

Attachment: routing.cc (text/x-c++src), 4.08 KiB.

[11 Feb 2016 19:38] Zurab Tutberidze
select changed with poll

Attachment: mysql_routing.cc (text/x-c++src), 10.41 KiB.

[12 Feb 2016 12:27] Shinya Sugiyama
Hi Zurab,

Thank you for nice patches.
[12 Feb 2016 12:42] Shinya Sugiyama
My Last Comments about "【In Short Term Workaround】"is not right.

Router is not down with max_connection option; however, connections that exceed max_connections became error.

I executed wrong query for the last confirmation. (need to confirm concurrency)
× select now()
〇  select sleep(120)

【Confirmation】

-bash-4.2$ /usr/local/mysql57/bin/mysqlslap -h 127.0.0.1 -P 7001 --no-drop --create-schema=SLAP --engine=InnoDB --concurrency=2400 -q 'SELECT SLEEP(120)
' -u root -p
Enter password: 
/usr/local/mysql57/bin/mysqlslap: Error when connecting to server: 2013 Lost connection to MySQL server at 'reading initial communication packet', syste
m error: 0

Snip....

/usr/local/mysql57/bin/mysqlslap: Error when connecting to server: 2013 Lost connection to MySQL server at 'reading 
initial communication packet', system error: 110
Benchmark
        Running for engine InnoDB
        Average number of seconds to run all queries: 391.207 seconds
        Minimum number of seconds to run all queries: 391.207 seconds
        Maximum number of seconds to run all queries: 391.207 seconds
        Number of clients running queries: 2400
        Average number of queries per client: 1

-bash-4.2$ 

【Router Log】

2016-02-12 21:21:57 DEBUG   [7f085ed85700] routing:master [127.0.0.1]:16656 - [127.0.0.1]:3301
2016-02-12 21:21:57 DEBUG   [7f0865d93700] routing:master [127.0.0.1]:16642 - [127.0.0.1]:3301
2016-02-12 21:21:57 DEBUG   [7f0867d97700] routing:master [127.0.0.1]:16638 - [127.0.0.1]:3301
2016-02-12 21:21:57 DEBUG   [7f0862d8d700] routing:master [127.0.0.1]:16648 - [127.0.0.1]:3301
2016-02-12 21:21:57 WARNING [7f0a18bc1700] routing:master reached max active connections (800)
2016-02-12 21:21:57 WARNING [7f0a18bc1700] routing:master reached max active connections (800)
2016-02-12 21:21:57 WARNING [7f0a18bc1700] routing:master reached max active connections (800)
2016-02-12 21:21:57 WARNING [7f0a18bc1700] routing:master reached max active connections (800)
2

memo
--------------------
1) Increase max connection for router.
2) need to change manual for explain about limitation. (low and max number)

Best Regard
Shinya
[10 Nov 2017 23:15] Shinya Sugiyama
Add MySQL Router 2.1.4

res = select(sock + 1, &readfds, &writefds, &errfds, &timeout_val);
[5 Jan 2018 3:12] Philip Olson
Posted by developer:
 
This change is now documented in several places, including: the release notes, the FAQ, and Router's max_connections documentation that previously referenced this known ~500 connection limitation.

This is documented as a v8.0.4 change that is tracked via WL #9857.