Bug #64690 non-reproducible occasional crashes
Submitted: 19 Mar 2012 11:41 Modified: 23 Mar 2012 9:31
Reporter: A Sieferlinger Email Updates:
Status: Not a Bug Impact on me:
None 
Category: MySQL Server: General Severity: S3 (Non-critical)
Version: 5.1.61-0+squeeze1-log OS: Linux (Debian 6.0.4)
Assigned to: CPU Architecture:Any

[19 Mar 2012 11:41] A Sieferlinger
Description:
I have been experiencing an issue for several weeks now:

The mysqld crashes on a query with "mysqld got signal 11" (details in the attachment). It restarts automatically, performs InnoDB recovery, and is then running again.
Sometimes tables are also corrupted, but in most cases they could be fixed with a REPAIR statement.

When the crashing query is executed again, the server does not crash and works just fine.
As I did not find anything in common between the queries that caused a crash, I suspect a different cause.

Details about the system:
Debian MySQL package: 5.1.61-0+squeeze1-log
Debian 6.0.4
The server causing the problems is the master in a simple master slave setup. The slave has exactly the same specs.

Hardware specs:
32 GB Memory
2 quad-core CPUs with Hyper-Threading (Intel(R) Xeon(R) CPU L5520 @ 2.27GHz)
Hardware RAID 10 with BBU

How to repeat:
As mentioned above, I have not found a way to reproduce the issue; the same queries that caused a crash before work fine when I execute them manually.

Suggested fix:
None yet, mysqld should not crash.
[19 Mar 2012 11:42] A Sieferlinger
Log showing the crash and recover

Attachment: mysql.log.1 (application/octet-stream, text), 2.70 KiB.

[19 Mar 2012 14:34] Valeriy Kravchuk
Please, send your my.cnf file content.
[19 Mar 2012 14:38] A Sieferlinger
my.cnf from the affected server

Attachment: my.cnf (application/octet-stream, text), 4.75 KiB.

[19 Mar 2012 14:38] A Sieferlinger
I have added the config file to the attachments.
[19 Mar 2012 15:56] Valeriy Kravchuk
Looks like your settings are too high, both for per-thread and for some global buffers. Can you please check whether crashes ever happen after the following changes in my.cnf:

tmp_table_size 	        = 16M # was 192M
sort_buffer_size        = 1M # was 8M
read_buffer_size        = 1M # was 2M
read_rnd_buffer_size    = 1M # was 16M 
join_buffer_size        = 1M # was 8M
max_heap_table_size     = 16M # was 64M

myisam_use_mmap = 0 # was 1

query_cache_size        = 128M # was 1024M, hardly anything > 128M ever makes sense...

All the above was on top of a 25G InnoDB buffer pool, with up to 1000 concurrent connections allowed and only 32G of RAM. Your configuration was hardly reasonable or robust.

IMHO you were hit by some kind of out of memory condition. You may even want to check OS level logs for any evidence.
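The worst-case arithmetic behind this advice can be sketched roughly as follows. This is an illustrative back-of-the-envelope estimate, not an exact model of MySQL's memory use (MySQL allocates per-thread buffers lazily, and other per-connection allocations such as tmp_table_size are omitted); the figures are the "was" values quoted in the previous comment.

```python
# Rough worst-case memory estimate for the original my.cnf values quoted above.
# The four classic per-thread buffers are allocated per connection, so with
# many concurrent connections they can exhaust RAM on top of the global
# buffers. All figures in MB.

per_thread_mb = {
    "sort_buffer_size": 8,
    "read_buffer_size": 2,
    "read_rnd_buffer_size": 16,
    "join_buffer_size": 8,
}
global_mb = {
    "innodb_buffer_pool_size": 25 * 1024,  # 25G, per the comment above
    "query_cache_size": 1024,              # was 1024M
}
max_connections = 1000  # up to 1000 concurrent connections were allowed

worst_case_mb = (sum(global_mb.values())
                 + max_connections * sum(per_thread_mb.values()))
print(f"worst case: {worst_case_mb / 1024:.1f} GB vs 32 GB of RAM")
# Even this partial estimate lands far above the 32 GB of physical memory.
```

Even without every connection hitting every buffer at once, the headroom between the global buffers and physical RAM was thin enough that a burst of busy connections could push the process into an out-of-memory condition.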
[20 Mar 2012 7:09] A Sieferlinger
Thanks for the hints.
As this is a running production system, it may take some days until I can test the new settings.
I took over maintenance of this system only recently, so not many checks have been done on it; it will only be online for about two more months.
[20 Mar 2012 7:12] A Sieferlinger
Just a short note regarding the memory issue:
the system logs contained no indication that the OOM killer was triggered; the only visible thing was a segfault in libc.
[22 Mar 2012 12:51] A Sieferlinger
It seems like this was the issue. We have had no more crashes in the last few days. If crashes occur again, I will reopen this ticket.
[23 Mar 2012 9:31] Valeriy Kravchuk
For now I assume this problem was not caused by a bug, but by a wrong configuration.