Bug #47733 CentOS 5.3 - MySQL suddenly crashes server (kernel panic)
Submitted: 29 Sep 2009 22:41 Modified: 3 Feb 2010 8:59
Reporter: William Sweat Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Errors Severity:S2 (Serious)
Version:5.0.45-7.el5 OS:Linux (CentOS 5.3)
Assigned to: CPU Architecture:Any

[29 Sep 2009 22:41] William Sweat
Description:
Hello,

I've been running MySQL 5.0 on a CentOS 5.3 system for a few months now.  Suddenly, the MySQL binary is crashing the system.  At the console, I am seeing this on the screen:
Pid: 16718, comm: mysqld Tainted: G     2.6.18-128.el5 #1
RIP: 0010:[<ffffffff882225c9>]  [<ffffffff882225c9>]
RSP: 0018:ffff8100ce75de18  EFLAGS:  00010297
RAX: 00002aad9a06e000 RBX:  00002aada46fb179  RCX:  0000000000000018
RDX: 0000000000000063 RSI:  0000000000000296  RDI:  ffff81082f90b16c
RBP: ffff8100ce75df78 R08:  0000000000000000  R09:  0000000000000000
R10: ffff8107ad9899a8 R11:  0000000000000000  R12:  00002aada46fb170
R13: 0000000048b71a90 R14:  0000000048b71a88  R15:  ffffffffffffffa0
FS:  0000000048b74940(0063)  GS: ffff81082fca53c0(0000)  knlGs: 0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aada9a06e000 CR3: 0000000678d7c000 CR4: 00000000000006e0
Process mysqld (pid: 16718, threadinfo ffff8100ce75c000, task ffff810052ec8080
Stack:  ffff8107ef8e1488 0000000000000241 0000000000000001 0000000080019919
00000000000000001 0000010008001a93e 00002aada46fb1a8 00000241014b8063
612f6636f72702f2f 00000000061616161 00002aad9a06c000 00002aad9a06e000
Call Trace:
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0
 
 
 Code: 88 10 48 83 45 e0 01 48 83 45 d8 01 83 45 d0 01 8b 45 d0 3b
 RIP [<fffffff882225c9>]
  RSP <ffff8100ce75de18>
 CR2: 00002aad9a06e000
  <0>Kernel panic - not syncing: Fatal exception

== system info ==
The hardware is Intel Quad-core Xeon (2 CPUs) at 2.66, 32GB RAM, RAID 1 for the OS and RAID6 for the MySQL data.

Addional info:
getconf GNU_LIBPTHREAD_VERSION => NPTL 2.5
mysql  Ver 14.12 Distrib 5.0.45, for redhat-linux-gnu (x86_64) using readline 5.0

There's nothing out of the ordinary in the /var/log/messages or /var/log/mysqld.log

=== my.cnf ===
[client]
port            = 3306
socket          = /var/lib/mysql/mysql.sock
[mysqld]
port            = 3306
socket          = /var/lib/mysql/mysql.sock
skip-locking
key_buffer = 4000M
max_allowed_packet = 16M
table_cache = 512
sort_buffer_size = 2M
read_buffer_size = 2M
read_rnd_buffer_size = 8M
myisam_sort_buffer_size = 64M
thread_cache = 8
query_cache_size = 32M
thread_concurrency = 16
bind-address=10.1.1.10
max_connections=1200
key_buffer_size=4000M
set-variable=tmp_table_size=268435456
thread_cache_size=400
old_passwords=1
expire_logs_days=3
skip_name_resolve
log-bin
server-id       = 1
[mysqldump]
quick
max_allowed_packet = 16M
[mysql]
no-auto-rehash
[isamchk]
key_buffer = 256M
sort_buffer_size = 256M
read_buffer = 2M
write_buffer = 2M
[myisamchk]
key_buffer = 256M
sort_buffer_size = 256M
read_buffer = 2M
write_buffer = 2M
[mysqlhotcopy]
interactive-timeout

What makes this strange is because this server is unable to run mysql, I moved the data to a spare server that was using MySQL 5 on Fedora Core 6 and there are no issues.

How to repeat:
Starting MySQL and let it run for 10 minutes

Suggested fix:
unsure
[30 Sep 2009 3:51] Valeriy Kravchuk
Thank you for the problem report. Please, send the error log of MySQL server. Check also in the logs if there are any traces of hardware failures.
[30 Sep 2009 4:56] William Sweat
I've uploaded the file.  Unfortunately, the system logs show no further errors at the moment.  I ordered RAM replacement for the server, as this may be faulty system memory.
[30 Sep 2009 6:06] Valeriy Kravchuk
I see only normal messages in the log (normal shutdowns among them)... When exactly that kernel panic happened?
[30 Sep 2009 15:49] William Sweat
The panic happened Sept 28 22:22:11 (PST).  These were the last 3 messages from /var/log/messages

Sep 28 22:21:33 lf-db0 snmpd[8907]: Connection from UDP: [10.1.1.45]:51223
Sep 28 22:21:33 lf-db0 last message repeated 43 times
Sep 28 22:22:11 lf-db0 kernel: Program Xnest tried to read /dev/mem between 0->8000000.

The last message in mysqld was 090922 8:14:23 in the attached log file.
[30 Sep 2009 16:05] Valeriy Kravchuk
Maybe this is a result of some Xnest bug then (if it runs as root)?
[1 Oct 2009 3:38] William Sweat
Screenshot of latest crash

Attachment: photo.jpg (image/jpeg, text), 246.12 KiB.

[1 Oct 2009 3:43] William Sweat
Hi Valeriy,

Unfortunately, the Xnest doesn't cause the issue.  I've run stress test with the old RAM, which revealed errors, and new RAM, that was error free.  The mysql server crashed immediately upon startup, I attached a screenshot.  This crash is different with the new RAM installed.

The server is stable when mysql is not running.

I also grabbed this from the mysqlbug util (may help):
C compiler:    gcc (GCC) 4.1.2 20070626 (Red Hat 4.1.2-14)
C++ compiler:  g++ (GCC) 4.1.2 20070626 (Red Hat 4.1.2-14)
Environment:
        <machine, os, target, libraries (multiple lines)>
System: Linux lf-db0.adfirmative.com 2.6.18-128.el5 #1 SMP Wed Jan 21 10:41:14 EST 2009 x86_64 x86_64 x86_64 GNU/Linux
Architecture: x86_64

Some paths:  /usr/bin/perl /usr/bin/make /usr/bin/gmake /usr/bin/gcc /usr/bin/cc
GCC: Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-libgcj-multifile --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --enable-plugin --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre --with-cpu=generic --host=x86_64-redhat-linux
Thread model: posix
gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)
Compilation info: CC='gcc'  CFLAGS='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -fno-strict-aliasing -fwrapv'  CXX='g++'  CXXFLAGS='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -fno-strict-aliasing -fwrapv -fno-rtti -fno-exceptions'  LDFLAGS=''  ASFLAGS=''
LIBC:
lrwxrwxrwx 1 root root 11 May 15 20:18 /lib/libc.so.6 -> libc-2.5.so
-rwxr-xr-x 1 root root 1606808 Feb 18  2009 /lib/libc-2.5.so
-rw-r--r-- 1 root root 2811674 Jan 21  2009 /usr/lib/libc.a
-rw-r--r-- 1 root root 238 Jan 21  2009 /usr/lib/libc.so
Configure command: ./configure '--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' '--target=x86_64-redhat-linux-gnu' '--program-prefix=' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib64' '--libexecdir=/usr/libexec' '--localstatedir=/var' '--sharedstatedir=/usr/com' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--with-readline' '--with-openssl' '--without-debug' '--enable-shared' '--with-bench' '--localstatedir=/var/lib/mysql' '--with-unix-socket-path=/var/lib/mysql/mysql.sock' '--with-mysqld-user=mysql' '--with-extra-charsets=all' '--with-innodb' '--with-berkeley-db' '--enable-local-infile' '--enable-largefile' '--enable-thread-safe-client' '--disable-dependency-tracking' '--with-named-thread-libs=-lpthread' 'CFLAGS=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -fno-strict-aliasing -fwrapv' 'CXXFLAGS=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -fno-strict-aliasing -fwrapv -fno-rtti -fno-exceptions' 'FFLAGS=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic' 'build_alias=x86_64-redhat-linux-gnu' 'host_alias=x86_64-redhat-linux-gnu' 'target_alias=x86_64-redhat-linux-gnu'

Unfortunately, CentOS (Redhat) MySQL isn't compiled with the '--without-debug', is there a way to do a debug using the precompiled binaries?

Thanks.
[7 Oct 2009 0:21] William Sweat
Update,

This is looking to be related to the irqbalance, acpid, and cpuspeed services that run on CentOS 5.3.  I have disabled the services and the errors have completely disappeared.  Still working on the root cause within mysql
[16 Oct 2009 0:36] William Sweat
It looks like replacing the RAM and reinstalling the OS corrected the issue.  We are doing more tests with replication, as it is was originally thought to cause the problem, and will keep this post updated.
[27 Nov 2009 9:07] Valeriy Kravchuk
Do you have any results from your testing to share? Do you still think that your problem was a result of some bug in MySQL code?
[30 Nov 2009 18:52] William Sweat
Unfortunately, the root problem is still unknown.  There seems to be a persistent symptom, which is a dead process will occur if 'show table status' is run against the database.  Unfortunately, it's inconsistent on the type of hardware and MySQL version that causes the error to occur and the data in the db's has been checked for any weird escaping characters.

mysql> show full processlist\G
*************************** 1. row ***************************
     Id: 259408465
   User: user
   Host: localhost
     db: some_database
Command: Killed
   Time: 1763129
  State: *** DEAD ***
   Info: SHOW TABLES LIKE 'some table'
*************************** 2. row ***************************
     Id: 260248953
   User: user
   Host: localhost
     db: another_database
Command: Query
   Time: 1706378
  State: *** DEAD ***
   Info: show tables
[24 Dec 2009 18:16] William Sweat
Just as a follow-up, we hired a MySQL guru to help look at the problem.

He found that the thread is killed when running 'show tables from...', here is the strace he did:
if run strace on mysqld, that we can see that thread gets killed 
 
31307 getdents(28,  <unfinished ...>
31307 <... getdents resumed> /* 85 entries */, 4096) = 4080
31307 getdents(28,  <unfinished ...>
31307 <... getdents resumed> /* 85 entries */, 4096) = 4080
31307 getdents(28,  <unfinished ...>
31307 +++ killed by SIGKILL +++

The only solution seems to be upgrading to CentOS 5.4 (although 5.3 seems stable).
[3 Feb 2010 8:59] Sveta Smirnova
Thank you for the feedback.

> It looks like replacing the RAM and reinstalling the OS corrected the issue. 
...
> The only solution seems to be upgrading to CentOS 5.4 (although 5.3 seems stable).

This shows this is not MySQL bug, so closing it as such.