MySQL Bugs: #29254: MySQL crashes frequently

Bug #29254	MySQL crashes frequently
Submitted:	20 Jun 2007 22:10	Modified:	23 Jul 2007 9:14
Reporter:	Ryan Ordway	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server	Severity:	S1 (Critical)
Version:	5.0.22	OS:	Linux (CentOS 5)
Assigned to:		CPU Architecture:	Any

Description:
I have been fighting with this issue for quite some time now. 

Every so often, at least once per day MySQL will crash. It seems to come back up
ok, but the database tables have to be repaired each time as some minor damage happens each time the DB crashes.

This is a stock CentOS 5 mysql package, which is based on MySQL 5.0.22. I have not made any changes to the default RPM shipped with CentOS. FYI, I have also tried running with a similar configuration using 5.1.x with the same crashes.

Both database servers are Sun Fire V40z servers with a pair of AMD Opteron 885 dual core processors and 8GB of RAM. My databases and logs are all on SAN-attached storage, in the case of these databases that is a Hitachi 9570V box with about 4TB of storage. Both systems are using ext3 filesystems on LVM volumes on the SAN LUNs.

I had previously been running in a master-master cross-replication setup with each master being a master of the other and a slave of the other. This doesn't seem to be the problem. I have disabled the replication and turned one system off and the problem persists.

Last night I completely rebuilt all of the databases, doing mysqldumps to file, wiping out all of the databases and rebuilding from the dump file. This doesn't seem to have helped.

Nothing seems to be causing it. I have turned on query logging to see if I can track down which query might be killing mysqld. 

Here is what shows up in the error log:

mysqld got signal 11;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=536870912
read_buffer_size=8384512
max_used_connections=38
max_connections=200
threads_connected=38
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 3800286 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Number of processes running now: 0
070620 13:18:10  mysqld restarted
070620 13:18:10  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
070620 13:18:10  InnoDB: Starting log scan based on checkpoint at
InnoDB: log sequence number 0 174234649.
InnoDB: Doing recovery: scanned up to log sequence number 0 174234649
InnoDB: Last MySQL binlog file position 0 21600151, file name /san/lun02/log/mysql-bin.000015
070620 13:18:11  InnoDB: Started; log sequence number 0 174234649
070620 13:18:11 [Note] Recovering after a crash using /san/lun02/log/mysql-bin
070620 13:18:11 [Note] Starting crash recovery...
070620 13:18:11 [Note] Crash recovery finished.
070620 13:18:13 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.0.22-log'  socket: '/san/lun01/mysql/mysql.sock'  port: 3306  Source distribution

Here is my my.cnf:

[mysqld]
port                    = 3306
datadir                 = /san/lun01/mysql
socket                  = /san/lun01/mysql/mysql.sock
pid-file                = /san/lun02/log/mysql.pid
key_buffer              = 512M
max_allowed_packet      = 16M
table_cache             = 2048
sort_buffer_size        = 8M
join_buffer_size        = 8M
read_buffer_size        = 8M
read_rnd_buffer_size    = 32M
myisam_sort_buffer_size = 64M
thread_cache_size       = 8
query_cache_size        = 64M
query_cache_limit       = 2M
binlog_cache_size       = 1M
max_heap_table_size     = 64M
ft_min_word_len         = 2
default_table_type      = MyISAM
thread_stack            = 192K
tmp_table_size          = 64M

# Try number of CPU's*2 for thread_concurrency
thread_concurrency      = 8
back_log                = 50
max_connections         = 200
memlock

#InnoDB
innodb
innodb_data_home_dir     = /san/lun01/ibdata
innodb_log_group_home_dir = /san/lun01/ibdata
innodb_additional_mem_pool_size = 16M
innodb_buffer_pool_size  = 2G
innodb_data_file_path    = ibdata1:10M:autoextend
innodb_file_io_threads   = 4
innodb_thread_concurrency = 16
innodb_flush_log_at_trx_commit = 1
innodb_log_buffer_size   = 8M
innodb_log_file_size     = 256M
innodb_log_files_in_group = 3
innodb_max_dirty_pages_pct = 90
innodb_lock_wait_timeout = 120
innodb_fast_shutdown=2

# Clustering
#ndbcluster
server-id                = 2
log-bin                  = /san/lun02/log/mysql-bin
log-bin-index            = /san/lun02/log/mysql-bin.index
#log-output              = TABLE,FILE
#log                     = /san/lun02/log/hermes-query.log
log-error                = /san/lun02/log/hermes-error.log
log-slow-queries         = /san/lun02/log/hermes-slow.log
long_query_time          = 2
log_long_format
log_warnings
relay-log                = /san/lun02/log/relay-log
relay-log-info-file      = /san/lun02/log/relay-log.info
relay-log-index          = /san/lun02/log/relay-log.index
master-info-file         = /san/lun02/log/master.info
auto_increment_increment = 5
auto_increment_offset    = 2
report-host              = hermes
master-host              = 192.168.10.20
master-user              = replicate
master-password          = <replication password>
replicate-same-server-id = 0
log-slave-updates

############################################################################
[mysqld_safe]
pid-file                 = /san/lun02/log/mysql.pid
err-log                  = /san/lun02/log/hermes-error.log

############################################################################

[mysql.server]
user=mysql
basedir=/san/lun02

############################################################################

[mysqldump]
quick
max_allowed_packet = 16M

############################################################################

[mysql]
no-auto-rehash

############################################################################

# The following options will be passed to all MySQL clients
[client]
port            = 3306
socket          = /san/lun01/mysql/mysql.sock

############################################################################

[isamchk]
key_buffer = 256M
sort_buffer_size = 256M
read_buffer = 2M
write_buffer = 2M

############################################################################

[myisamchk]
key_buffer = 256M
sort_buffer_size = 256M
read_buffer = 2M
write_buffer = 2M

############################################################################

[mysqlhotcopy]
interactive-timeout

############################################################################

How to repeat:
Wait a few hours and watch it crash again.

Thank you for the report.

There is error in the error log: "It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 3800286 K" Please be sure you have physical RAM more than 3800286K (~3G). Otherwise you should optimize settings before reporting a bug.

Also version 5.0.22 is quite old. Please upgrade to current 5.0.41 version, try with it and say us result.

These systems both have 8GB of RAM plus another 8GB of swap. Right now mysqld is using about 3.2GB of RAM and there is over 4 GB available. 

I would like to avoid moving to a non-CentOS MySQL package if possible. Do you think that upgrading to the latest 5.0 package will fix this issue?

Thanks,

Ryan

Thank you for the feedback.

> Do you think that upgrading to the latest 5.0 package will fix this issue?

You haven't provided repeatable test case for this report. So we can't see what is the real cause of such behaviour; version 5.0.22 has been released more than year ago and many similar bugs have been fixed since. So I'd recommend you to upgrade first.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

I have upgraded to the latest in the 5.0 series, 5.0.41 at the time I downloaded it. I've been running it live for nearly a month now, and it seems to be much more stable. Do you have any ideas what may have been causing it? It would be helpful for downstream users, especially Enterprise Linux users (Redhat, SuSE, etc.) to get a patch that can be included with the vendor package.

In my case, my issue is fixed. Go ahead and close the ticket.

Thanks!

Ryan

Ryan,

thank you for the feedback.

It is hard to say which bug fix fixed your issue, because description is not clear and many bugs have been fixed since version 5.0.22. So I simple close this report.