MySQL Bugs: #68064: mysql crashes when Solaris resource controls prevent memory from allocation

Bug #68064	mysql crashes when Solaris resource controls prevent memory from allocation
Submitted:	10 Jan 2013 4:17	Modified:	22 Jan 2013 19:40
Reporter:	Eugene Zheganin
Status:	Verified
Category:	MySQL Server: InnoDB storage engine	Severity:	S3 (Non-critical)
Version:	5.5.16, 5.5.31	OS:	Solaris
Assigned to:		CPU Architecture:	Any

Description:
MySQL cannot start with a large innodb_buffer_pool_size. It starts fine when innodb_buffer_pool_size is set to 16G, and crashes when it is set to 24G. I'm launching it on a machine with 36G of memory, and I'm pretty sure there is still enough memory to allocate.

Valid config:

[mysqld]
max_connections = 1024
port = 3306
socket = /tmp/mysql.sock
skip-external-locking
skip-name-resolve

key_buffer_size = 16M
max_allowed_packet = 128M

table_open_cache = 8192
net_buffer_length = 512K

sort_buffer_size = 232K
read_buffer_size = 16M
read_rnd_buffer_size = 1M
myisam_sort_buffer_size = 16M

thread_cache_size = 256

query_cache_size=1024M
max_heap_table_size=1024M
tmp_table_size=1024M

innodb_buffer_pool_size=16384M
innodb_flush_log_at_trx_commit=1
innodb_additional_mem_pool_size=512M
innodb_log_buffer_size=256M
innodb_thread_concurrency=32

relay-log=janus-relay-bin
replicate-ignore-db=mysql
replicate-ignore-db=information_schema
replicate-ignore-db=test
replicate-ignore-db=performance_schema
slave-skip-errors = 1051, 1060, 1062, 1064

server-id = 3
binlog_format = mixed

When changing innodb_buffer_pool_size to 24576M I got the following log:

130110 07:58:52 mysqld_safe Starting mysqld daemon with databases from /usr/local/mysql/data
130110  7:58:52 InnoDB: The InnoDB memory heap is disabled
130110  7:58:52 InnoDB: Mutexes and rw_locks use GCC atomic builtins
130110  7:58:52 InnoDB: Compressed tables use zlib 1.2.3
130110  7:58:52 InnoDB: Initializing buffer pool, size = 24.0G
130110  7:59:04  InnoDB: Assertion failure in thread 1 in file ut0mem.c line 107
InnoDB: Failing assertion: ret || !assert_on_error
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
130110  7:59:04 - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=16777216
read_buffer_size=16777216
max_used_connections=0
max_threads=1024
thread_count=0
connection_count=0
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 17042576 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
/usr/local/mysql/bin/mysqld'my_print_stacktrace+0x2a [0xa649bb]
/usr/local/mysql/bin/mysqld'handle_segfault+0x27c [0x76dfc9]
/lib/amd64/libc.so.1'__sighndlr+0x6 [0xfffffd7fff222616]
/lib/amd64/libc.so.1'call_user_handler+0x2aa [0xfffffd7fff215de6]
/lib/amd64/libc.so.1'_lwp_kill+0xa [0xfffffd7fff22b82a] [Signal 6 (ABRT)]
/lib/amd64/libc.so.1'raise+0x19 [0xfffffd7fff1d1169]
/lib/amd64/libc.so.1'abort+0x5d [0xfffffd7fff1a7b65]
/usr/local/mysql/bin/mysqld'ut_malloc_low+0x6d [0xb41b81]
/usr/local/mysql/bin/mysqld'ut_malloc+0x22 [0xb41d9f]
/usr/local/mysql/bin/mysqld'hash0_create+0x49 [0xb9cb12]
/usr/local/mysql/bin/mysqld'ha_create_func+0x1d [0xb9bc66]
/usr/local/mysql/bin/mysqld'btr_search_sys_create+0xa2 [0xb5cc2e]
/usr/local/mysql/bin/mysqld'buf_pool_init+0xee [0xb62902]
/usr/local/mysql/bin/mysqld'innobase_start_or_create_for_mysql+0x638 [0xb221d3]
/usr/local/mysql/bin/mysqld'_ZL13innobase_initPv+0x70e [0xae883c]
/usr/local/mysql/bin/mysqld'_Z24ha_initialize_handlertonP13st_plugin_int+0x81 [0x9230bb]
/usr/local/mysql/bin/mysqld'_ZL17plugin_initializeP13st_plugin_int+0x5d [0x7fb2af]
/usr/local/mysql/bin/mysqld'_Z11plugin_initPiPPci+0x597 [0x7fbb1f]
/usr/local/mysql/bin/mysqld'_ZL22init_server_componentsv+0x426 [0x76fd20]
/usr/local/mysql/bin/mysqld'_Z11mysqld_mainiPPc+0x423 [0x77074d]
/usr/local/mysql/bin/mysqld'main+0x20 [0x76af77]
/usr/local/mysql/bin/mysqld'_start+0x6c [0x76ae5c]
Please read http://dev.mysql.com/doc/refman/5.1/en/resolve-stack-dump.html
and follow instructions on how to resolve the stack trace.
Resolved stack trace is much more helpful in diagnosing the
problem, so please do resolve it
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
130110 07:59:04 mysqld_safe mysqld from pid file /usr/local/mysql/data/janus.pid ended

How to repeat:
Set innodb_buffer_pool_size to around 24G on a system where the memory available to mysqld is limited below that value.

Thank you for the report.

What is the output of ulimit -a and vm.overcommit_ratio on your system?

I think vm.overcommit_ratio is a Linux kernel tunable.

[root@janus ~]# su - mysql
 $ ulimit -a
address space limit (kbytes)   (-M)  unlimited
core file size (blocks)        (-c)  unlimited
cpu time (seconds)             (-t)  unlimited
data size (kbytes)             (-d)  unlimited
file size (blocks)             (-f)  unlimited
locks                          (-x)  not supported
locked address space (kbytes)  (-l)  not supported
message queue size (kbytes)    (-q)  not supported
nice                           (-e)  not supported
nofile                         (-n)  256
nproc                          (-u)  29995
pipe buffer size (bytes)       (-p)  5120
max memory size (kbytes)       (-m)  not supported
rtprio                         (-r)  not supported
socket buffer size (bytes)     (-b)  5120
sigpend                        (-i)  32
stack size (kbytes)            (-s)  10240
swap size (kbytes)             (-w)  not supported
threads                        (-T)  not supported
process size (kbytes)          (-v)  unlimited

However, I believe you're asking about Solaris resource controls:

[root@janus ~]# ps -ef | grep mysql
    root 29304 29289   0 22:21:11 pts/5       0:00 grep mysql
   mysql 22263 21644   3   Jan 10 ?        3672:48 /usr/local/mysql/bin/mysqld --basedir=/usr/local/mysql --datadir=/usr/local/mys
    root 21644     1   0   Jan 10 ?           0:00 /bin/sh /usr/local/mysql/bin/mysqld_safe --basedir=/usr/local/mysql --datadir=/

[root@janus ~]# prctl -n project.max-shm-memory -i process 22263
process: 22263: /usr/local/mysql/bin/mysqld --basedir=/usr/local/mysql --datadir=/usr/
NAME    PRIVILEGE       VALUE    FLAG   ACTION                       RECIPIENT
project.max-shm-memory
        privileged      9.00GB      -   deny                                 -
        system          16.0EB    max   deny                                 -
[root@janus ~]# prctl -n project.max-shm-memory -i process 21644
process: 21644: /bin/sh /usr/local/mysql/bin/mysqld_safe --basedir=/usr/local/mysql --
NAME    PRIVILEGE       VALUE    FLAG   ACTION                       RECIPIENT
project.max-shm-memory
        privileged      9.00GB      -   deny                                 -
        system          16.0EB    max   deny

I agree that, due to an error in /etc/project, the actual shared memory limit was far below the requested value.

However, I suppose that mysql should handle this situation correctly. This is still my mistake, but I believe the bug should stand.

Thanks.
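
For reference, the per-process limits that ulimit -a reports can also be read from inside a program with the POSIX getrlimit call. The following is only a minimal sketch: the helper names format_limit and print_memory_limits are hypothetical, and the choice of RLIMIT_AS and RLIMIT_DATA as the relevant limits is an assumption. It does not see Solaris resource controls such as project.max-shm-memory, which are a separate mechanism inspected with prctl as shown above.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: format a limit the way ulimit prints it,
 * as "unlimited" or as a plain number. */
static void format_limit(rlim_t value, char *out, size_t outlen)
{
    if (value == RLIM_INFINITY)
        snprintf(out, outlen, "unlimited");
    else
        snprintf(out, outlen, "%llu", (unsigned long long)value);
}

/* Hypothetical helper: print the soft limits for address space and
 * data size. Values are in bytes (ulimit shows kbytes).
 * Returns 0 on success, -1 if getrlimit fails. */
static int print_memory_limits(FILE *fp)
{
    static const struct { int resource; const char *label; } items[] = {
        { RLIMIT_AS,   "address space limit (bytes)" },
        { RLIMIT_DATA, "data size (bytes)" },
    };
    char text[32];

    for (size_t i = 0; i < sizeof items / sizeof items[0]; i++) {
        struct rlimit rl;
        if (getrlimit(items[i].resource, &rl) != 0)
            return -1;
        format_limit(rl.rlim_cur, text, sizeof text);
        fprintf(fp, "%-30s %s\n", items[i].label, text);
    }
    return 0;
}
```

Calling print_memory_limits(stdout) from main prints one line per limit. On the system in this report both would be expected to show "unlimited", which is why the ulimit output alone did not explain the failure.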

How do you want MySQL server to handle this situation? It cannot start with a buffer pool of the size you've requested (as it cannot allocate the memory). So, would you like to see a clear error message instead of an assertion failure and a stack trace, or something else?

Yup.

At least that's what Oracle DB and PostgreSQL do, so I could call it a standard approach. Every developer has a right to his own point of view (no questions here), but as an engineer I usually report everything that produces a backtrace or an assertion failure.

Thank you for the feedback.

Your request is reasonable. Verified as described.
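
The behavior requested in this thread, a clear error message instead of an assertion failure, can be illustrated with a small C sketch. This is not InnoDB code and not the eventual fix: the function name pool_alloc_or_report and the message wording are hypothetical. It only shows the general pattern of checking the allocation result and reporting the failure to the caller rather than calling abort().

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of MySQL: try to allocate `bytes`
 * (expected to be nonzero). On failure, write a readable message
 * into `err` and return NULL instead of aborting the process. */
static void *pool_alloc_or_report(size_t bytes, char *err, size_t errlen)
{
    errno = 0;
    void *p = malloc(bytes);
    if (p == NULL) {
        /* malloc normally sets errno; fall back to ENOMEM if it did not. */
        int saved = (errno != 0) ? errno : ENOMEM;
        snprintf(err, errlen,
                 "cannot allocate %zu bytes for the buffer pool: %s; "
                 "check OS memory limits or lower innodb_buffer_pool_size",
                 bytes, strerror(saved));
        return NULL;
    }
    if (errlen > 0)
        err[0] = '\0';   /* success: leave an empty message */
    return p;
}
```

A caller would log the message and exit with a nonzero status when NULL is returned, so the administrator sees the requested size and the OS error rather than a backtrace.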