MySQL Bugs: #59602: MySQL Server crashes with signal 11 (segfault)

Bug #59602	MySQL Server crashes with signal 11 (segfault)
Submitted:	19 Jan 2011 6:43	Modified:	30 Apr 2015 11:24
Reporter:	Andy Knuts
Status:	Can't repeat
Category:	MySQL Server: Query Cache	Severity:	S1 (Critical)
Version:	5.5.8	OS:	Linux
Assigned to:		CPU Architecture:	Any
Tags:	segfault

Description:
We installed a new MySQL server a couple of days ago, running the latest GA release, MySQL 5.5.8. We take backups with filesystem-based snapshots: we issue "FLUSH TABLES WITH READ LOCK" right before creating the snapshots and "UNLOCK TABLES" as soon as they have been created.

Last night MySQL appears to have crashed right after it received the "FLUSH TABLES WITH READ LOCK" command, and it wrote this to the MySQL error log:

110119  3:13:40 - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=1073741824
read_buffer_size=1048576
max_used_connections=1001
max_threads=1000
thread_count=43
connection_count=32
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 3107943 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0x7fc96c022550
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7fc98fd3be60 thread_stack 0x40000
/usr/local/mysql/bin/mysqld(my_print_stacktrace+0x39)[0x916839]
/usr/local/mysql/bin/mysqld(handle_segfault+0x359)[0x4fc0d9]
/lib/libpthread.so.0(+0xfb40)[0x7fcad5e5cb40]
/usr/local/mysql/bin/mysqld(_ZN11Query_cache17free_memory_blockEP17Query_cache_block+0x2b)[0x5423eb]
/usr/local/mysql/bin/mysqld(_ZN11Query_cache19free_query_internalEP17Query_cache_block+0xb7)[0x5436a7]
/usr/local/mysql/bin/mysqld(_ZN11Query_cache11flush_cacheEv+0x7f)[0x545c9f]
/usr/local/mysql/bin/mysqld(_ZN11Query_cache5flushEv+0x48)[0x545d58]
/usr/local/mysql/bin/mysqld(_Z20reload_acl_and_cacheP3THDmP10TABLE_LISTPb+0x460)[0x62aca0]
/usr/local/mysql/bin/mysqld(_Z21mysql_execute_commandP3THD+0x7e2)[0x574192]
/usr/local/mysql/bin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x15a)[0x57801a]
/usr/local/mysql/bin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0xe8d)[0x5796fd]
/usr/local/mysql/bin/mysqld(_Z24do_handle_one_connectionP3THD+0x137)[0x60c647]
/usr/local/mysql/bin/mysqld(handle_one_connection+0x54)[0x60c724]
/lib/libpthread.so.0(+0x7971)[0x7fcad5e54971]
/lib/libc.so.6(clone+0x6d)[0x7fcad52e892d]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0x7fc8dc436280 is an invalid pointer
thd->thread_id=24849161
thd->killed=NOT_KILLED
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
110119 03:13:41 mysqld_safe Number of processes running now: 0
110119 03:13:41 mysqld_safe mysqld restarted

Can anyone tell me what went wrong here? Is there a fix, or have we hit an unknown bug?

How to repeat:
Don't know. Just do "flush tables with read lock" on a big database?
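
The backup sequence from the description can be sketched as follows. This is only an illustration of the lock / snapshot / unlock ordering, not the reporter's actual script: `execute` and `take_snapshot` are hypothetical callables (for example a cursor's `execute` method from whatever client library is in use, and a wrapper around the snapshot tool). Both statements have to go through the same session, because the global read lock is released when the session that took it disconnects.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def global_read_lock(execute: Callable[[str], object]) -> Iterator[None]:
    """Hold the global read lock for the duration of the with-block.

    `execute` is a hypothetical callable that runs one SQL statement on an
    already-open session; it is an assumption, not part of the report.
    """
    execute("FLUSH TABLES WITH READ LOCK")
    try:
        yield
    finally:
        # Always release the lock, even if the snapshot step raised.
        execute("UNLOCK TABLES")


def snapshot_backup(execute: Callable[[str], object],
                    take_snapshot: Callable[[], object]) -> None:
    """Run the snapshot while the read lock is held on the same session."""
    with global_read_lock(execute):
        take_snapshot()


if __name__ == "__main__":
    # Dry run with stand-ins: record the order of operations in a list.
    log = []
    snapshot_backup(log.append, lambda: log.append("<filesystem snapshot>"))
    print(log)
```

According to the backtrace, the crash happened inside the query cache flush that this statement triggers, so an attempt to reproduce it would presumably need the query cache enabled and populated before the lock is taken.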

Thank you for the problem report. Please send the contents of your my.cnf file and the output of the following Linux commands:

uname -a
free

# free
             total       used       free     shared    buffers     cached
Mem:       8192760    8091108     101652          0       9836    3365572
-/+ buffers/cache:    4715700    3477060
Swap:      3903756     247584    3656172

# uname -a
Linux SDB1 2.6.35-24-server #42-Ubuntu SMP Thu Dec 2 03:58:11 UTC 2010 x86_64 GNU/Linux

# awk '!/^#/ && !/^$/' /etc/my.cnf
[client]
port            = 3306
socket          = /tmp/mysql.sock
[mysqld]
interactive_timeout=30
port            = 3306
socket          = /tmp/mysql.sock
basedir         = /usr/local/mysql
datadir         = /usr/local/mysql/data
tmpdir          = /tmp
log_error       = /var/log/mysql/error.log
skip-external-locking
skip-name-resolve
bind-address            = 0.0.0.0
key_buffer_size = 1G
max_allowed_packet = 32M
table_open_cache = 256
sort_buffer_size = 1M
read_buffer_size = 1M
read_rnd_buffer_size = 4M
myisam_sort_buffer_size = 64M
max_connections = 1000
table_cache = 5000
thread_cache_size = 80
query_cache_size= 16M
thread_concurrency = 8
max_heap_table_size = 64M
tmp_table_size = 64M
innodb_file_per_table =  1
innodb_buffer_pool_size = 3G
innodb_flush_log_at_trx_commit = 0
innodb_open_files=2048
innodb_log_buffer_size=8M
innodb_log_file_size=128M
join_buffer_size=512K
low_priority_updates=1
concurrent_insert=2
query_cache_limit       = 20M
query_cache_size        = 800M
slow_query_log          = /var/log/mysql/slow.log
long_query_time         = 3
log_bin                 = /usr/local/mysql/BINLOGS/mysql-bin.log
expire_logs_days       = 100
max_binlog_size         = 1G
binlog_format=mixed
[mysqldump]
quick
max_allowed_packet = 32M
[mysql]
no-auto-rehash
[myisamchk]
key_buffer_size = 128M
sort_buffer_size = 128M
read_buffer = 2M
write_buffer = 2M
[mysqlhotcopy]
interactive-timeout

I see the following settings related to the query cache in your my.cnf:

query_cache_size= 16M
...
query_cache_limit       = 20M
query_cache_size        = 800M

Could you please remove or comment out the last two settings, leaving only:

query_cache_size= 16M

and then check whether the crash occurs again?

Okay. I commented out 'query_cache_size= 16M'. I didn't know I had two of these in my config file, but is there really a chance that this was the problem? "show variables like '%query_cache%'" showed the right value. Isn't the first query_cache_size simply overridden by the second one in the config file?
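
A duplicate like this is easy to miss by eye. The sketch below scans an option file for names that appear more than once in a section and reports the value that ends up in effect, on the assumption (consistent with what SHOW VARIABLES reported above) that the last occurrence of an option takes precedence. The function name `effective_options` is made up for this example, and the parser is deliberately simplified: it ignores `!include` directives and quoting.

```python
from typing import Dict, Iterable, List, Tuple


def effective_options(lines: Iterable[str], section: str = "mysqld"
                      ) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Return (effective values, duplicated options) for one section.

    Assumes the last occurrence of an option wins. Simplified on purpose:
    no !include handling, no quoted values.
    """
    current = None
    seen: Dict[str, str] = {}
    duplicates: Dict[str, List[str]] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip()
            continue
        if current != section:
            continue
        name, _, value = line.partition("=")
        # Dashes and underscores are interchangeable in option names.
        name = name.strip().replace("-", "_")
        value = value.strip()
        if name in seen:
            # Record every value in file order, starting with the first.
            duplicates.setdefault(name, [seen[name]]).append(value)
        seen[name] = value  # a later occurrence replaces an earlier one
    return seen, duplicates


if __name__ == "__main__":
    sample = [
        "[mysqld]",
        "query_cache_size= 16M",
        "query_cache_limit       = 20M",
        "query_cache_size        = 800M",
    ]
    effective, dupes = effective_options(sample)
    for name, values in dupes.items():
        print(f"{name}: set {len(values)} times {values}, "
              f"effective value {effective[name]}")
```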

I think this setting

query_cache_size        = 800M

may be part of the problem. Nobody really needs such a big query cache. I'd like to check whether a smaller query cache prevents the crash.

With your other my.cnf settings, it is possible that you ran into an out-of-memory situation.
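
The out-of-memory concern can be checked with some rough arithmetic on the posted my.cnf values. The sketch below adds up the large global buffers plus the per-connection buffers multiplied by max_connections, and compares the result with the RAM total from `free`. The choice of which settings to count, and the helper names `parse_size` and `worst_case_bytes`, are assumptions made for this example; this is a common back-of-the-envelope bound, not how the server accounts for memory, and per-connection buffers are only allocated when a query needs them.

```python
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text: str) -> int:
    """Convert a my.cnf style size such as '800M' or '512K' to bytes."""
    text = text.strip().upper()
    if text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def worst_case_bytes(global_opts, per_connection_opts, max_connections):
    """Rough upper bound: global buffers + per-connection buffers * N."""
    global_bytes = sum(parse_size(v) for v in global_opts.values())
    per_conn_bytes = sum(parse_size(v) for v in per_connection_opts.values())
    return global_bytes + per_conn_bytes * max_connections


# Values copied from the my.cnf posted in this report (the effective
# query_cache_size is taken to be the later 800M entry).
GLOBAL = {
    "key_buffer_size": "1G",
    "innodb_buffer_pool_size": "3G",
    "query_cache_size": "800M",
    "innodb_log_buffer_size": "8M",
}
PER_CONNECTION = {
    "sort_buffer_size": "1M",
    "read_buffer_size": "1M",
    "read_rnd_buffer_size": "4M",
    "join_buffer_size": "512K",
}
TOTAL_RAM_BYTES = 8192760 * 1024  # "Mem: total" from free, in KiB

if __name__ == "__main__":
    mib = 1024 ** 2
    for connections in (1000, 43):  # max_connections, thread_count at crash
        total = worst_case_bytes(GLOBAL, PER_CONNECTION, connections)
        print(f"{connections:>5} connections: {total / mib:.0f} MiB "
              f"of {TOTAL_RAM_BYTES / mib:.0f} MiB RAM")
```

With all 1000 connections using every buffer at once, this bound exceeds the 8 GB of RAM in the machine; with the 43 threads reported at crash time, it stays well below it. So the configuration can overcommit memory in principle, but these numbers alone do not show that it did.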

Okay. Which size would you like me to test?

PS:

I would also like to add that this exact configuration (even with the duplicate query_cache_size setting) was in use for 1.5 years on exactly the same hardware, under MySQL 5.0.51a, without any trouble (not even a single crash).

We didn't touch the hardware configuration, but a couple of days ago we installed a new OS and the latest MySQL GA version. (We used the MySQL binaries, not a build from source.)

So, do you still think it can be related to an OOM? The kernel didn't report anything about an OOM, at least.

Please, try with

query_cache_size= 16M

Just to make sure query cache size does not really matter.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".