Bug #87574 Can not create new connection(errno 11) if mysql has 32300+ connections
Submitted: 29 Aug 2017 9:22 Modified: 30 Aug 2017 1:58
Reporter: Wang Wen'an Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server Severity:S1 (Critical)
Version:5.7.17, 5.7.19 OS:Debian (Linux 3.16.0-4-amd64 #1 SMP Debian 3.16.39-1+deb8u2 (2017-03-07) x86)
Assigned to: MySQL Verification Team CPU Architecture:Any
Tags: Can't create thread to handle new connection(errno= 11)

[29 Aug 2017 9:22] Wang Wen'an
Description:
Our production DB(MySQL-5.7.17) Crashed(10+times) when it holds more than 32200+connections, most of them are Sleep, and I have tried some ways like : 
increase the "file open limits" to 655350;
increase the "performance_schema_max_file_handles" to 655350;
set "performance_schema" to OFF;
set "innodb_numa_interleave" to ON;
set numa to OFF in BIOS;

Nothing helps, and I get lastest crash info as follow:

2017-08-25T11:13:36.642571+08:00 380080 [ERROR] /usr/sbin/mysqld: Out of memory (Needed 432032 bytes)
2017-08-25T11:13:36.642578+08:00 380080 [ERROR] /usr/sbin/mysqld: Out of memory (Needed 432032 bytes)
**a lot of **
2017-08-25T11:13:36.642585+08:00 380080 [ERROR] /usr/sbin/mysqld: Out of memory (Needed 432032 bytes)
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
03:13:36 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=32287
max_threads=50000
thread_count=32292
connection_count=32286
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 19875379 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f53528cae80
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f664ccf3e80 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x2c)[0xe7fdcc]
/usr/sbin/mysqld(handle_fatal_signal+0x459)[0x7a9d39]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf8d0)[0x7f6dfd5408d0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f6dfbed2067]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f6dfbed3448]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x15d)[0x7f6dfc7f007d]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8f046)[0x7f6dfc7ee046]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8f091)[0x7f6dfc7ee091]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8f2a9)[0x7f6dfc7ee2a9]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8f7cc)[0x7f6dfc7ee7cc]
/usr/sbin/mysqld(_ZN23table_session_variables6createEv+0x11)[0xeed711]
/usr/sbin/mysqld(_ZN13ha_perfschema8rnd_initEb+0x6a)[0xe965ba]
/usr/sbin/mysqld(_ZN7handler11ha_rnd_initEb+0x19)[0x7f6d89]
/usr/sbin/mysqld(_Z16init_read_recordP11READ_RECORDP3THDP5TABLEP7QEP_TABibb+0x576)[0xbc3336]
/usr/sbin/mysqld(_Z21join_init_read_recordP7QEP_TAB+0xad)[0xc25bad]
/usr/sbin/mysqld(_Z10sub_selectP4JOINP7QEP_TABb+0x11b)[0xc2977b]
/usr/sbin/mysqld(_ZN4JOIN4execEv+0x3b8)[0xc22398]
/usr/sbin/mysqld(_ZN10TABLE_LIST19materialize_derivedEP3THD+0x54)[0xc1e714]
/usr/sbin/mysqld(_Z24join_materialize_derivedP7QEP_TAB+0x2f)[0xc22d0f]
/usr/sbin/mysqld(_ZN7QEP_TAB12prepare_scanEv+0x3d)[0xc227dd]
/usr/sbin/mysqld(_Z10sub_selectP4JOINP7QEP_TABb+0x40)[0xc296a0]
/usr/sbin/mysqld(_ZN4JOIN4execEv+0x3b8)[0xc22398]
/usr/sbin/mysqld(_Z12handle_queryP3THDP3LEXP12Query_resultyy+0x238)[0xc922e8]
/usr/sbin/mysqld[0x772f97]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THDb+0x33c1)[0xc54d11]
/usr/sbin/mysqld(_Z11mysql_parseP3THDP12Parser_state+0x3bd)[0xc56d9d]
/usr/sbin/mysqld(_Z16dispatch_commandP3THDPK8COM_DATA19enum_server_command+0x854)[0xc57664]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x18f)[0xc58d2f]
/usr/sbin/mysqld(handle_connection+0x270)[0xd15fa0]
/usr/sbin/mysqld(pfs_spawn_thread+0x1b4)[0xe97824]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x80a4)[0x7f6dfd5390a4]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f6dfbf8587d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7f5210e5b9e0): /* mysql-connector-java-5.1.14 ( Revision: ${bzr.revision-id} ) */SHOW VARIABLES WHERE Variable_name ='language' OR Variable_name = 'net_write_timeout' OR Variable_name = 'interactive_timeout' OR Variable_name = 'wait_timeout' OR Variable_name = 'character_set_client' OR Variable_name = 'character_set_connection' OR Variable_name = 'character_set' OR Variable_name = 'character_set_server' OR Variable_name = 'tx_isolation' OR Variable_name = 'transaction_isolation' OR Variable_name = 'character_set_results' OR Variable_name = 'timezone' OR Variable_name = 'time_zone' OR Variable_name = 'system_time_zone' OR Variable_name = 'lower_case_table_names' OR Variable_name = 'max_allowed_packet' OR Variable_name = 'net_buffer_length' OR Variable_name = 'sql_mode' OR Variable_name = 'query_cache_type' OR Variable_name = 'query_cache_size' OR Variable_name = 'init_connect'
Connection ID (thread ID): 380081
Status: NOT_KILLED

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
2017-08-25T03:13:39.002735Z mysqld_safe Number of processes running now: 0
2017-08-25T03:13:39.005091Z mysqld_safe mysqld restarted

Our production env:
LXC, 6cores and 72GB memory, and the size of user data is around 4GB;

How to repeat:
I tried three scenes:
1. Install MySQL-5.7.17 with apt-get;
2. Install MySQL-5.7.19 with apt-get;
3. Compile MySQL-5.7.19 with debug=ON;

And use python script as follows:
-----------------------------------------------------------
import MySQLdb
import sys
import time

loop = 10000
conn_list = []

def my_conn(ip) :
    return MySQLdb.connect(host=ip
            ,port=3306
            ,user='temp'
            ,passwd='test')

def conn_test(ip) :
    for i in range(1,loop) :
        conn = my_conn(ip)
        conn_list.append(conn)
    num = 0
    while(True) :
        print num
        if num == loop - 1 :
            num = 0
            time.sleep(10)
        num = num + 1
        time.sleep(1)
    print rst
    return True

if __name__ == '__main__' :
    conn_test("192.168.100.1")
-----------------------------------------------------------

Run the scripts 4 times by using "nohup python <script name> &"

All three scenes can not create new thread but didn't has memory problem;

error info in mysql error log is "[ERROR] Can't create thread to handle new connection(errno= 11)";
code exception is (1135, "Can't create a new thread (errno 11); if you are not out of available memory, you can consult the manual for a possible OS-dependent bug");

And I get the debug logs when mysql can not create a new thread:
----------------------------------------------------------------------------
T@0: >Per_thread_connection_handler::add_connection
T@0: | info: Call pfs_spawn_thread_v1 @wwan ####This is added before complie the source code
T@0: | >my_raw_malloc
T@0: | | my: size: 232  my_flags: 16
T@0: | | exit: ptr: 0x44d43e90
T@0: | <my_raw_malloc 219
T@0: | >my_free
T@0: | | my: ptr: 0x44d43e90
T@0: | <my_free 292
T@0: | >mysql_socket_vio_new
T@0: | | enter: sd: 32421
T@0: | | >my_raw_malloc
T@0: | | | my: size: 528  my_flags: 16
T@0: | | | exit: ptr: 0x44cf1a80
T@0: | | <my_raw_malloc 219
T@0: | | >vio_init
T@0: | | | enter: type: 1  sd: 32421  flags: 0
T@0: | | <vio_init 175
T@0: | <mysql_socket_vio_new 280
T@0: | >my_net_init
T@0: | | >my_net_set_read_timeout
T@0: | | | enter: timeout: 30
T@0: | | | >vio_socket_timeout
T@0: | | | <vio_socket_timeout 322
T@0: | | <my_net_set_read_timeout 1046
T@0: | | >my_net_set_write_timeout
T@0: | | | enter: timeout: 60
T@0: | | | >vio_socket_timeout
T@0: | | | <vio_socket_timeout 322
T@0: | | <my_net_set_write_timeout 1057
T@0: | | >my_raw_malloc
T@0: | | | my: size: 16423  my_flags: 16
T@0: | | | exit: ptr: 0x44e26f00
T@0: | | <my_raw_malloc 219
T@0: | | >vio_fastsend
T@0: | | | exit: 0
T@0: | | <vio_fastsend 357
T@0: | <my_net_init 123
T@0: | >net_send_error
T@0: | | enter: sql_errno: 1135  err:
T@0: | | >net_send_error_packet
T@0: | | <net_send_error_packet 547
T@0: | | >net_write_command
T@0: | | | enter: length: 2
T@0: | | | >net_flush
T@0: | | | | >net_write_packet
T@0: | | | | | >vio_write
T@0: | | | | | <vio_write 214
T@0: | | | | <net_write_packet 648
T@0: | | | <net_flush 228
T@0: | | <net_write_command 403
T@0: | <net_send_error 196
T@0: | >net_end
T@0: | | >my_free
T@0: | | | my: ptr: 0x44e26f00
T@0: | | <my_free 292
T@0: | <net_end 132
T@0: | >my_free
T@0: | | my: ptr: 0x44cf1a80
T@0: | <my_free 292
T@0: <Per_thread_connection_handler::add_connection 420
----------------------------------------------------------------------------

PS: Error occurs when I create the 32372th connections each time in the 3rd scenes so I can get the debug log easily.

Suggested fix:
None
[29 Aug 2017 9:24] Wang Wen'an
my.cnf

Attachment: my_test.cnf (application/octet-stream, text), 3.01 KiB.

[29 Aug 2017 9:27] Wang Wen'an
A mistake in my.cnf : production DB's innodb_buffer_pool_size is 21474836480(20GB)
[29 Aug 2017 14:04] MySQL Verification Team
Hi,

Your config file allows for 600+G of ram to be allocated. You might want to contact our MySQL Support team to help you configure that server properly.

Or for starters, reconsider the size of your binlog_cache_size
https://dev.mysql.com/doc/refman/5.7/en/replication-options-binary-log.html#sysvar_binlog_...

...A binary log cache is allocated for each client...

all best
Bogdan
[30 Aug 2017 1:58] Wang Wen'an
@Bogdan Kecman

Thanks for the replay, the memory problem should be the mistake of binlog_cache_size;

And the another problem is the connections, why mysql throw errors when it holds more than 32300+ connections?