Bug #2332 mysqld crash related to SMP system
Submitted: 9 Jan 2004 4:08
Modified: 14 Jan 2004 7:00
Reporter: [ name withheld ]
Status: Not a Bug
Category: MySQL Server
Severity: S1 (Critical)
Version: 4.0.16
OS: Linux (Linux Gentoo)
Assigned to: Dean Ellis
CPU Architecture: Any

[9 Jan 2004 4:08] [ name withheld ]
Description:
We have discovered a very serious bug in mysqld running on an SMP-compiled 
kernel (2.4.22-gentoo-r2) which causes it to crash. The machine is a dual 
Pentium Xeon with Hyper-Threading. The bug is not present when running on a 
non-SMP kernel. The SQL server is used in a cluster with 8 servers. 
 
We discovered the crash after replacing the SQL server, and quickly had to 
revert to the old server. The crash occurs repeatedly: during a brief test 
period it happened three times within 10 seconds, so we can easily reproduce 
it. However, we have no clue what causes the crash in the first place. 
 
Furthermore, the bug does not occur when compiling mysql with debug enabled 
(USE flag +debug when emerging in Gentoo). 
 
Mysqld receives a signal 11, the backtrace is: 
0x810c39c handle_segfault + 472 
0x40020758 _end + 935053616 
0x4007c083 _end + 935428699 
0x4007bbf2 _end + 935427530 
0x4007b31d _end + 935425269 
0x4007b13b _end + 935424787 
0x810de24 handle_connections_sockets + 1002 
0x810d69e main + 3628 
0x402c38dc _end + 937818804 
0x80c0bf1 _start + 33 
 
The crash does not seem to be related to any specific query, but rather to 
load coming from multiple different servers. The server does _not_ crash when 
running without any load. 
 
If any more information is needed, we will be happy to help. 

How to repeat:
There are no specific steps to reproduce the crash. 

Suggested fix:
None. A possible workaround is to compile with debug enabled.
[9 Jan 2004 5:00] Alexander Keremidarski
Not enough information was provided for us to be able
to handle this bug. Please re-read the instructions at
http://bugs.mysql.com/how-to-report.php

If you can provide more information, feel free to add it
to this bug and change the status back to 'Open'.

Thank you for your interest in MySQL.

Additional info:

Please provide more information.

Do you use the MySQL binary distribution? If not, please try repeating the same with our binary.

If you have core file or backtrace?
[9 Jan 2004 5:04] Alexander Keremidarski
In my last comment 

s/If you have/Do you have
[9 Jan 2004 5:09] [ name withheld ]
Okay, it's a Gentoo distro so it's a source distribution of MySQL, we have no 
core dump (none is made) and the backtrace was included (if not, what 
backtrace are you talking about?).
[9 Jan 2004 5:12] [ name withheld ]
And I can't change the status back to open...
[9 Jan 2004 5:41] Alexander Keremidarski
You need to add --core-file to mysqld startup options.

About backtrace - sorry. I meant we need a resolved backtrace if it is not our binary. For our binaries we have symbol files so we can attempt to resolve it, but as your mysqld is self-compiled I can only ask you to follow the instructions at:

http://www.mysql.com/doc/en/Using_stack_trace.html

The fact that the bug happens with debug mysqld only is one more clue that something is wrong with the build environment (glibc, kernel or some ./configure options).

Please test with our binary. Download mysql-standard-4.0.17-pc-linux-i686.tar.gz and try repeating the crash with it.
[9 Jan 2004 5:53] [ name withheld ]
I did resolve it? I followed the instructions on your site: 
 
sql2 tmp # resolve_stack_dump -s mysql.sym -n mysql.err 
0x810c39c handle_segfault + 472 
0x40020758 _end + 935053616 
0x4007c083 _end + 935428699 
0x4007bbf2 _end + 935427530 
0x4007b31d _end + 935425269 
0x4007b13b _end + 935424787 
0x810de24 handle_connections_sockets + 1002 
0x810d69e main + 3628 
0x402c38dc _end + 937818804 
0x80c0bf1 _start + 33 
 
That was the output I got... Maybe it's the wrong symbols? (got them 
from /usr/sbin/mysqld) 
 
I'll see to it that we get a core file. 
 
It's not that easy to get a binary version installed in a Gentoo 
environment... Actually I'm not sure how we could do it.
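
For reading output like the above: a frame reported as `_end + <very large offset>` lies past the last symbol in the symbol file, which usually means the address belongs to a shared library rather than to mysqld itself. A minimal sketch for picking out such frames follows. `unresolved_frames` is a hypothetical helper name, and the input is assumed to be in the `address symbol + offset` format that resolve_stack_dump printed above.

```shell
#!/bin/sh
# Sketch only: list the addresses that were resolved against "_end",
# i.e. frames that fall outside the symbols of the mysqld binary.
# unresolved_frames is a hypothetical helper, not a MySQL tool.
unresolved_frames() {
    # Input fields: <address> <symbol> + <offset>
    awk '$2 == "_end" { print $1 }'
}
```

Possible usage, with the file names from the comment above: `resolve_stack_dump -s mysql.sym -n mysql.err | unresolved_frames`. The printed addresses are the ones worth inspecting with a debugger that can see the loaded shared libraries.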
[9 Jan 2004 5:56] [ name withheld ]
Symbol table from the mysqld, can you verify if it's correct?

Attachment: mysql.sym.gz (application/x-gzip, text), 99.60 KiB.

[9 Jan 2004 6:01] [ name withheld ]
Also I think you misunderstood my report, the crash happens _without_ debug. 
 
When we compile it _with_ debug, the crash does _not_ happen.
[9 Jan 2004 6:43] [ name withheld ]
I'm stuck in getting a core file?! I've passed the --core-file parameter to 
mysqld and set the ulimit -c to 1000000 (unlimited) for the shell, and in the 
error log I get: 
Writing a core file 
 
But where is it?! I've tried find -name core from / but found nothing!?
[9 Jan 2004 7:05] Lenz Grimmer
Some systems add the process ID to the core file name, e.g. core.3224 or 
similar. The core should usually be in the same directory where mysqld has 
been installed to. Also make sure that the ulimit has really been raised for 
mysqld as well - you can do this by passing "--core-file-size=<size>" to the 
mysqld_safe script (<size> is being passed to "ulimit -c" in the script).
[10 Jan 2004 2:30] [ name withheld ]
Hmm, I'm pretty sure I did not find any core file in the mysqld directory, and I did pass --core-file-size=10000000 to the mysqld_safe script. I'll have a look at it when I get back to work on Monday.
[10 Jan 2004 12:18] MySQL Verification Team
I have a Linux SMP system quite similar to yours. 

I have a MySQL binary built with all optimisations but also with enough 
debugging symbols to diagnose the bug.

We need the exact query (or combo thereof) that crashes the server in order
to fix a bug.

Try to find core file with this command (run as root);

find / -name 'core*'

it should be named just core or core.number.

If you can't find it, try to set ulimit -c unlimited and try to repeat a 
core dump.
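
The two checks above can be sketched as a small script. This is only an illustration: `find_cores` is a hypothetical helper name, and the pattern assumes core files are named `core` or `core.<number>` as described above.

```shell
#!/bin/sh
# Sketch only: show the core-size limit of this shell and provide a
# helper that lists core files under a directory.
# find_cores is a hypothetical helper, not part of MySQL.
find_cores() {
    # Match "core" and "core.<number>"; -xdev keeps the search on one
    # filesystem, and permission errors are discarded.
    find "${1:-.}" -xdev -type f \( -name core -o -name 'core.[0-9]*' \) 2>/dev/null
}

# A value of 0 means no core file can be written by processes
# started from this shell.
echo "core size limit: $(ulimit -c)"
```

Possible usage: `find_cores /` as root, or `find_cores <directory>` for a narrower search. mysqld normally runs with its data directory as the working directory, so that directory is probably worth checking first. Note that the limit has to be raised in the shell or script that actually starts mysqld.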
[12 Jan 2004 3:01] [ name withheld ]
Okay, I _still_ can't get it to make a core file, even when running mysqld as 
root from command line with ulimit -c unlimited! 
 
However I got it to crash in gdb, and I got the following backtrace: 
(gdb) backtrace full 
#0  0x4007c083 in get_field () from /usr/lib/libwrap.so.0 
No symbol table info available. 
#1  0x4007bbf2 in process_options () from /usr/lib/libwrap.so.0 
No symbol table info available. 
#2  0x4007b31d in table_match () from /usr/lib/libwrap.so.0 
No symbol table info available. 
#3  0x4007b15b in hosts_access () from /usr/lib/libwrap.so.0 
No symbol table info available. 
#4  0x0810de24 in handle_connections_sockets () 
No symbol table info available. 
#5  0x0810d69e in main () 
No symbol table info available. 
#6  0x402c38dc in __libc_start_main () from /lib/libc.so.6 
No symbol table info available. 
 
Is this of any help?!
[12 Jan 2004 3:21] Martin Mokrejs
Read my comments to the manual at http://www.mysql.com/doc/en/Using_gdb_on_mysqld.html and http://www.mysql.com/doc/en/Using_stack_trace.html
[12 Jan 2004 6:09] MySQL Verification Team
Are you using tcpd ???

Who has built mysqld with libwrap ???

And does it happen with access from other host or from the localhost ??
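
One way to answer the libwrap question for a given binary is to look at its dynamic dependencies. The sketch below assumes `ldd` is available and that mysqld links libwrap dynamically; `has_libwrap` is a hypothetical helper name that reads `ldd` output on standard input.

```shell
#!/bin/sh
# Sketch only: succeed if the ldd output on stdin mentions libwrap.
# has_libwrap is a hypothetical helper, not part of MySQL.
has_libwrap() {
    grep -q 'libwrap\.so'
}
```

Possible usage, with the binary path mentioned earlier in this report: `ldd /usr/sbin/mysqld | has_libwrap && echo "linked against libwrap"`.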
[12 Jan 2004 7:40] [ name withheld ]
> Are you using tcpd ??? 
 
Yes 
 
> Who has built mysqld with libwrap ??? 
 
Gentoo source 
 
> And does it happen with access from other host or from the localhost ?? 
 
It's from other hosts; as I stated initially, the server is part of a web 
cluster of 9 web servers (separate machines and IP addresses) that connect to 
the SQL server.
[12 Jan 2004 11:59] MySQL Verification Team
I have tested this on SMP system with tcpd enabled and with several remote 
connections established / refused ...

Build was with maximum optimisations and all worked fine.

The conclusion is that this is a consequence of a bad build.
[12 Jan 2004 13:31] Dean Ellis
Did you use the MySQL source distribution or an ebuild?  If an ebuild, which one specifically, and what CFLAGS/CXXFLAGS are you using?
[12 Jan 2004 23:27] [ name withheld ]
It's compiled from an ebuild 
 
dev-db/mysql-4.0.16 (mysql-4.0.16.ebuild) 
 
CFLAGS and CXXFLAGS are the same: 
CFLAGS="-O3 -march=pentium4 -mcpu=pentium4 -funroll-loops 
-fprefetch-loop-arrays -pipe"
[14 Jan 2004 7:00] Dean Ellis
Bad build, as above.  (Identified via email discussion.)