Bug #47733 | CentOS 5.3 - MySQL suddenly crashes server (kernel panic) | ||
---|---|---|---|
Submitted: | 29 Sep 2009 22:41 | Modified: | 3 Feb 2010 8:59 |
Reporter: | William Sweat | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server: Errors | Severity: | S2 (Serious) |
Version: | 5.0.45-7.el5 | OS: | Linux (CentOS 5.3) |
Assigned to: | | CPU Architecture: | Any |
[29 Sep 2009 22:41]
William Sweat
[30 Sep 2009 3:51]
Valeriy Kravchuk
Thank you for the problem report. Please send the error log of the MySQL server. Also check the logs for any traces of hardware failures.
[30 Sep 2009 4:56]
William Sweat
I've uploaded the file. Unfortunately, the system logs show no further errors at the moment. I have ordered replacement RAM for the server, as this may be faulty system memory.
[30 Sep 2009 6:06]
Valeriy Kravchuk
I see only normal messages in the log (normal shutdowns among them)... When exactly did that kernel panic happen?
[30 Sep 2009 15:49]
William Sweat
The panic happened Sept 28 22:22:11 (PST). These were the last 3 messages from /var/log/messages:

    Sep 28 22:21:33 lf-db0 snmpd[8907]: Connection from UDP: [10.1.1.45]:51223
    Sep 28 22:21:33 lf-db0 last message repeated 43 times
    Sep 28 22:22:11 lf-db0 kernel: Program Xnest tried to read /dev/mem between 0->8000000.

The last message in mysqld was 090922 8:14:23 in the attached log file.
[30 Sep 2009 16:05]
Valeriy Kravchuk
Maybe this is a result of some Xnest bug then (if it runs as root)?
[1 Oct 2009 3:38]
William Sweat
Screenshot of latest crash
Attachment: photo.jpg (image/jpeg, text), 246.12 KiB.
[1 Oct 2009 3:43]
William Sweat
Hi Valeriy,

Unfortunately, Xnest doesn't cause the issue. I've run a stress test with the old RAM, which revealed errors, and with the new RAM, which was error free. The mysql server crashed immediately upon startup; I attached a screenshot. This crash is different with the new RAM installed. The server is stable when mysql is not running. I also grabbed this from the mysqlbug util (may help):

    C compiler: gcc (GCC) 4.1.2 20070626 (Red Hat 4.1.2-14)
    C++ compiler: g++ (GCC) 4.1.2 20070626 (Red Hat 4.1.2-14)
    Environment:
    <machine, os, target, libraries (multiple lines)>
    System: Linux lf-db0.adfirmative.com 2.6.18-128.el5 #1 SMP Wed Jan 21 10:41:14 EST 2009 x86_64 x86_64 x86_64 GNU/Linux
    Architecture: x86_64
    Some paths: /usr/bin/perl /usr/bin/make /usr/bin/gmake /usr/bin/gcc /usr/bin/cc
    GCC: Using built-in specs.
    Target: x86_64-redhat-linux
    Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-libgcj-multifile --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --enable-plugin --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre --with-cpu=generic --host=x86_64-redhat-linux
    Thread model: posix
    gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)
    Compilation info: CC='gcc' CFLAGS='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -fno-strict-aliasing -fwrapv' CXX='g++' CXXFLAGS='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -fno-strict-aliasing -fwrapv -fno-rtti -fno-exceptions' LDFLAGS='' ASFLAGS=''
    LIBC:
    lrwxrwxrwx 1 root root 11 May 15 20:18 /lib/libc.so.6 -> libc-2.5.so
    -rwxr-xr-x 1 root root 1606808 Feb 18 2009 /lib/libc-2.5.so
    -rw-r--r-- 1 root root 2811674 Jan 21 2009 /usr/lib/libc.a
    -rw-r--r-- 1 root root 238 Jan 21 2009 /usr/lib/libc.so
    Configure command: ./configure '--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' '--target=x86_64-redhat-linux-gnu' '--program-prefix=' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib64' '--libexecdir=/usr/libexec' '--localstatedir=/var' '--sharedstatedir=/usr/com' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--with-readline' '--with-openssl' '--without-debug' '--enable-shared' '--with-bench' '--localstatedir=/var/lib/mysql' '--with-unix-socket-path=/var/lib/mysql/mysql.sock' '--with-mysqld-user=mysql' '--with-extra-charsets=all' '--with-innodb' '--with-berkeley-db' '--enable-local-infile' '--enable-largefile' '--enable-thread-safe-client' '--disable-dependency-tracking' '--with-named-thread-libs=-lpthread' 'CFLAGS=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -fno-strict-aliasing -fwrapv' 'CXXFLAGS=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -fno-strict-aliasing -fwrapv -fno-rtti -fno-exceptions' 'FFLAGS=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic' 'build_alias=x86_64-redhat-linux-gnu' 'host_alias=x86_64-redhat-linux-gnu' 'target_alias=x86_64-redhat-linux-gnu'

Unfortunately, the CentOS (Red Hat) MySQL package is compiled with '--without-debug'. Is there a way to debug using the precompiled binaries? Thanks.
[7 Oct 2009 0:21]
William Sweat
Update: this looks to be related to the irqbalance, acpid, and cpuspeed services that run on CentOS 5.3. I have disabled the services and the errors have completely disappeared. Still working on the root cause within mysql.
[16 Oct 2009 0:36]
William Sweat
It looks like replacing the RAM and reinstalling the OS corrected the issue. We are doing more tests with replication, as it was originally thought to cause the problem, and will keep this post updated.
[27 Nov 2009 9:07]
Valeriy Kravchuk
Do you have any results from your testing to share? Do you still think that your problem was a result of some bug in MySQL code?
[30 Nov 2009 18:52]
William Sweat
Unfortunately, the root problem is still unknown. There seems to be a persistent symptom: a dead process appears when 'show table status' is run against the database. Unfortunately, the error is not tied to any one type of hardware or MySQL version, and the data in the databases has been checked for any weird escape characters.

    mysql> show full processlist\G
    *************************** 1. row ***************************
    Id: 259408465
    User: user
    Host: localhost
    db: some_database
    Command: Killed
    Time: 1763129
    State: *** DEAD ***
    Info: SHOW TABLES LIKE 'some table'
    *************************** 2. row ***************************
    Id: 260248953
    User: user
    Host: localhost
    db: another_database
    Command: Query
    Time: 1706378
    State: *** DEAD ***
    Info: show tables
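For anyone triaging the same symptom, here is a minimal Python sketch of how threads in the `*** DEAD ***` state could be picked out of saved `show full processlist\G` output. It assumes the output has been captured as text with one field per line; `parse_vertical`, `dead_threads`, and `SAMPLE` are hypothetical names used only for illustration, and the sample values are copied from the report above.

```python
import re
from typing import Dict, List

# Matches the "*** 1. row ***" separators printed by the mysql client in \G mode.
ROW_HEADER = re.compile(r"^\*+ \d+\. row \*+$")


def parse_vertical(output: str) -> List[Dict[str, str]]:
    """Parse vertical (\\G) client output into one dict per row."""
    rows: List[Dict[str, str]] = []
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        if ROW_HEADER.match(line):
            current = {}
            rows.append(current)
        elif current is not None and ":" in line:
            # Split on the first colon only, so values may contain colons.
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return rows


def dead_threads(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return the rows whose State field reports a dead thread."""
    return [row for row in rows if "DEAD" in row.get("State", "")]


# Sample text copied from the processlist output in this report.
SAMPLE = """
*************************** 1. row ***************************
Id: 259408465
User: user
Host: localhost
db: some_database
Command: Killed
Time: 1763129
State: *** DEAD ***
Info: SHOW TABLES LIKE 'some table'
*************************** 2. row ***************************
Id: 260248953
User: user
Host: localhost
db: another_database
Command: Query
Time: 1706378
State: *** DEAD ***
Info: show tables
"""

if __name__ == "__main__":
    for row in dead_threads(parse_vertical(SAMPLE)):
        print(row["Id"], row["db"], row["Info"])
```

Matching on the substring `DEAD` rather than the exact `*** DEAD ***` string is a deliberate simplification, in case the client pads the field differently.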
[24 Dec 2009 18:16]
William Sweat
Just as a follow-up, we hired a MySQL guru to help look at the problem. He found that the thread is killed when running 'show tables from...'. Running strace on mysqld shows the thread getting killed:

    31307 getdents(28, <unfinished ...>
    31307 <... getdents resumed> /* 85 entries */, 4096) = 4080
    31307 getdents(28, <unfinished ...>
    31307 <... getdents resumed> /* 85 entries */, 4096) = 4080
    31307 getdents(28, <unfinished ...>
    31307 +++ killed by SIGKILL +++

The only solution seems to be upgrading to CentOS 5.4 (although 5.3 seems stable).
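As an illustration of how a longer trace could be searched for the same pattern, the following Python sketch scans saved strace output for threads that end with a `+++ killed by ... +++` line and reports the last system call each one had started. It assumes the trace was saved with the thread id as the first column, as in the excerpt above; `find_killed` and `TRACE` are hypothetical names, not part of any tool.

```python
import re
from typing import Dict, List, Optional, Tuple

# "<tid> +++ killed by SIGKILL +++"
KILLED = re.compile(r"^(\d+)\s+\+\+\+ killed by (SIG\w+)")
# "<tid> syscall(" marks the start of a system call
SYSCALL = re.compile(r"^(\d+)\s+(\w+)\(")


def find_killed(trace: str) -> List[Tuple[str, str, Optional[str]]]:
    """Return (tid, signal, last started syscall) for each killed thread."""
    last_call: Dict[str, str] = {}
    killed: List[Tuple[str, str, Optional[str]]] = []
    for raw in trace.splitlines():
        line = raw.strip()
        match = KILLED.match(line)
        if match:
            tid, signal_name = match.groups()
            killed.append((tid, signal_name, last_call.get(tid)))
            continue
        match = SYSCALL.match(line)
        if match:
            # Remember the most recent syscall this thread entered.
            last_call[match.group(1)] = match.group(2)
    return killed


# Excerpt copied from the strace output in this report.
TRACE = """
31307 getdents(28, <unfinished ...>
31307 <... getdents resumed> /* 85 entries */, 4096) = 4080
31307 getdents(28, <unfinished ...>
31307 +++ killed by SIGKILL +++
"""

if __name__ == "__main__":
    for tid, signal_name, syscall in find_killed(TRACE):
        print(f"thread {tid} got {signal_name} during {syscall}")
```

The `<... getdents resumed>` lines are intentionally ignored, since only the call a thread most recently entered matters for locating where it was killed.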
[3 Feb 2010 8:59]
Sveta Smirnova
Thank you for the feedback.

> It looks like replacing the RAM and reinstalling the OS corrected the issue.

...

> The only solution seems to be upgrading to CentOS 5.4 (although 5.3 seems stable).

This shows that this is not a MySQL bug, so I am closing it as such.