Bug #24611 mysqld crashes when connecting from remote host, and compiled from source
Submitted: 27 Nov 2006 9:02
Modified: 12 Jun 2007 18:35
Reporter: Morgan Tocker
Status: Closed
Category: MySQL Server: Compiling
Severity: S1 (Critical)
Version: 5.0.27
OS: Linux (Linux, Ubuntu)
Assigned to: Alexey Kopytov
CPU Architecture: Any

[27 Nov 2006 9:02] Morgan Tocker
Description:
When compiling from source, MySQL crashes with the following stack trace:

mysqld got signal 11;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=8388600
read_buffer_size=131072
max_used_connections=1
max_connections=100
threads_connected=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 225791 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd=0x8a247f0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
Cannot determine thread, fp=0xbedff358, backtrace may not be correct.
Stack range sanity check OK, backtrace follows:

(inserting resolved stack trace)

0x80dbe3a handle_segfault + 462
0x8382252 __pthread_sighandler + 158
0xb6b2d3d8 __stop___libc_freeres_ptrs + -1369693192
0xb6b1c0bf __stop___libc_freeres_ptrs + -1369763617
0xb6b1c17d __stop___libc_freeres_ptrs + -1369763427
0xb6bfa285 __stop___libc_freeres_ptrs + -1368853851
0xb6bfb424 __stop___libc_freeres_ptrs + -1368849340
0x83b96e5 __new_gethostbyaddr_r + 309
0x83b9515 gethostbyaddr + 141
0x80e25d8 _Z14ip_to_hostnameP7in_addrPj + 272
0x80eed90 _Z16check_connectionP3THD + 348
0x80fe6bb handle_one_connection + 381
0x837d15f pthread_start_thread + 211
0x83b72aa __clone + 106

Note: I am not using a my.cnf file, just running the defaults.
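
As a sanity check, the memory estimate printed in the crash log can be reproduced from the logged values. This is only a sketch: sort_buffer_size is not printed in the log, so the value 2097144 below is an assumption, chosen because it reproduces the reported 225791 K with the default settings.

```python
# Values printed in the crash log.
key_buffer_size = 8388600
read_buffer_size = 131072
max_connections = 100

# ASSUMPTION: sort_buffer_size does not appear in the log. 2097144 is
# assumed here because it reproduces the reported figure.
sort_buffer_size = 2097144


def max_memory_kib(key_buf, read_buf, sort_buf, connections):
    """key_buffer_size + (read_buffer_size + sort_buffer_size) * max_connections, in K."""
    total_bytes = key_buf + (read_buf + sort_buf) * connections
    return total_bytes // 1024


estimate = max_memory_kib(
    key_buffer_size, read_buffer_size, sort_buffer_size, max_connections
)
print(estimate)  # 225791
```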

Information about my environment is:

Ubuntu Dapper (6.06)

morgo@morguntu:~$ uname -a
Linux morguntu 2.6.15-27-386 #1 PREEMPT Sat Sep 16 01:51:59 UTC 2006 i686 GNU/Linux
morgo@morguntu:~$ gcc --version
gcc (GCC) 4.0.3 (Ubuntu 4.0.3-1ubuntu5)
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

morgo@morguntu:~$ getconf GNU_LIBC_VERSION
glibc 2.3.6

How to repeat:
1) Compile

shell> CFLAGS='-g' CXXFLAGS='-g' ./configure \
--prefix=/home/morgo/mysql/install \
--with-client-ldflags=-all-static \
--with-mysqld-ldflags=-all-static

2) Startup
shell> bin/mysqld_safe # you can add --thread-stack=256k, it makes no difference.

3) Connect from a remote host (Note: It must be a remote host, not connecting locally via TCP/IP).

4) Watch it fail immediately on connect:
shell> mysql -h 192.168.0.127
ERROR 2013 (HY000): Lost connection to MySQL server during query

The workaround is to start mysqld with --skip-name-resolve.
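
To illustrate why --skip-name-resolve avoids the crash: the stack trace shows the failure inside the reverse lookup (gethostbyaddr) performed for the connecting address, and skipping name resolution means that call is never made. The Python sketch below only mirrors that control flow; resolve_peer and its skip_name_resolve parameter are hypothetical names, not MySQL code.

```python
import socket


def resolve_peer(ip, skip_name_resolve=False):
    """Return the peer's host name, or None if lookups are disabled or fail.

    Hypothetical helper: it mirrors the shape of a reverse lookup on
    connect, not the server's actual implementation.
    """
    if skip_name_resolve:
        # The resolver (and therefore libnss) is never touched.
        return None
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except OSError:
        # socket.herror and socket.gaierror are subclasses of OSError.
        return None
```

With skip_name_resolve=True the function returns before any resolver code runs, which is the property the workaround relies on.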
[27 Nov 2006 9:37] MySQL Verification Team
Just noting that I've seen this crash a few times on my SUSE 9.3 system, but typically only under higher load.

sbester@linux:~> getconf GNU_LIBC_VERSION
glibc 2.3.4
sbester@linux:~> uname -a
Linux linux 2.6.11.4-21.7-smp #1 SMP Thu Jun 2 14:23:14 UTC 2005 i686 i686 i386 GNU/Linux
[16 Jan 2007 11:25] Alexey Kopytov
gdb fails to work with binaries compiled with -all-static. Here is the resolved stacktrace on my machine:

0x80db41e handle_segfault + 440
0x837b03d __pthread_sighandler + 105
(nil)
0xb5757d1e _end + -1390444258
0xb5757df1 _end + -1390444047
0x83d9a90 call_init + 204
0x83d9b2f _dl_init + 108
0x83b3793 dl_open_worker + 653
0x83d9911 _dl_catch_error + 100
0x83b3d1f _dl_open + 149
0x83b4f67 do_dlopen + 29
0x83d9911 _dl_catch_error + 100
0x83b4f24 dlerror_run + 32
0x83b50d2 __libc_dlopen_mode + 31
0x83af753 __nss_lookup_function + 611
0x83af852 __nss_lookup + 38
0x83d2b92 __nss_hosts_lookup + 98
0x83b0c96 gethostbyaddr_r + 518
0x83b09f5 gethostbyaddr + 141
0x80e182b _Z14ip_to_hostnameP7in_addrPj + 327
0x80fcb41 _Z16check_connectionP3THD + 343
0x80fd4dd handle_one_connection + 375
0x8376974 pthread_start_thread + 480
0x83aebca clone + 106

Note that despite -all-static the resulting binary is not actually static and depends on external libnss. strace output:

set_thread_area({entry_number:6, base_addr:0x8953580, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, use
getpid()                                = 22466
rt_sigprocmask(SIG_SETMASK, [HUP INT QUIT PIPE ALRM TERM TSTP RTMIN], NULL, 8) = 0
sched_setscheduler(22466, SCHED_OTHER, { 0 }) = 0
time(NULL)                              = 1168946526
rt_sigprocmask(SIG_UNBLOCK, [], [HUP INT QUIT PIPE ALRM TERM TSTP RTMIN], 8) = 0
getpeername(30, {sa_family=AF_INET, sin_port=htons(45193), sin_addr=inet_addr("192.168.0.101")}, [16]) = 0
fstat64(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(136, 10), ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb58cf000
socket(PF_FILE, SOCK_STREAM, 0)         = 31
fcntl64(31, F_GETFL)                    = 0x2 (flags O_RDWR)
fcntl64(31, F_SETFL, O_RDWR|O_NONBLOCK) = 0
connect(31, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
close(31)                               = 0
socket(PF_FILE, SOCK_STREAM, 0)         = 31
fcntl64(31, F_GETFL)                    = 0x2 (flags O_RDWR)
fcntl64(31, F_SETFL, O_RDWR|O_NONBLOCK) = 0
connect(31, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
close(31)                               = 0
open("/etc/nsswitch.conf", O_RDONLY)    = 31
fstat64(31, {st_mode=S_IFREG|0644, st_size=503, ...}) = 0
mmap2(NULL, 131072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb58af000
read(31, "# /etc/nsswitch.conf:\n# $Header:"..., 131072) = 503
read(31, "", 131072)                    = 0
close(31)                               = 0
munmap(0xb58af000, 131072)              = 0
open("/etc/ld.so.cache", O_RDONLY)      = 31
fstat64(31, {st_mode=S_IFREG|0644, st_size=100816, ...}) = 0
mmap2(NULL, 100816, PROT_READ, MAP_PRIVATE, 31, 0) = 0xb58b6000
close(31)                               = 0
open("/lib/libnss_files.so.2", O_RDONLY) = 31
read(31, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0@\33\0\000"..., 512) = 512
fstat64(31, {st_mode=S_IFREG|0755, st_size=35284, ...}) = 0
mmap2(NULL, 37512, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 31, 0) = 0xb58ac000
madvise(0xb58ac000, 37512, MADV_SEQUENTIAL|0x1) = 0
mmap2(0xb58b4000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 31, 0x7) = 0xb58b4000
close(31)                               = 0
open("/lib/tls/libc.so.6", O_RDONLY)    = 31
read(31, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\266O\1"..., 512) = 512
fstat64(31, {st_mode=S_IFREG|0755, st_size=1190424, ...}) = 0
mmap2(NULL, 1133788, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 31, 0) = 0xb5797000
madvise(0xb5797000, 1133788, MADV_SEQUENTIAL|0x1) = 0
mmap2(0xb58a6000, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 31, 0x10e) = 0xb58a6000
mmap2(0xb58aa000, 7388, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb58aa000
close(31)                               = 0
open("/lib/ld-linux.so.2", O_RDONLY)    = 31
read(31, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\320\7\0"..., 512) = 512
fstat64(31, {st_mode=S_IFREG|0755, st_size=104031, ...}) = 0
mmap2(NULL, 91360, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 31, 0) = 0xb69ca000
madvise(0xb69ca000, 91360, MADV_SEQUENTIAL|0x1) = 0
mmap2(0xb69df000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 31, 0x14) = 0xb69df000
close(31)                               = 0
mprotect(0xb69df000, 4096, PROT_READ)   = 0
mprotect(0xb58a6000, 4096, PROT_READ)   = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
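
A quick way to confirm from such a trace that a "static" binary still loads shared objects at runtime is to filter the successful open() calls. The following is a minimal sketch, assuming strace output in the format shown above (no PID prefixes from -f); the function name is illustrative.

```python
import re

# Matches strace lines such as: open("/lib/tls/libc.so.6", O_RDONLY) = 31
OPEN_RE = re.compile(r'^open\("([^"]+)",[^)]*\)\s*=\s*(-?\d+)')
# Shared-object file names: libfoo.so, libfoo.so.2, ld-linux.so.2, ...
SO_RE = re.compile(r'\.so(\.\d+)*$')


def shared_objects_opened(strace_lines):
    """Return paths of shared objects that were opened successfully."""
    paths = []
    for line in strace_lines:
        match = OPEN_RE.match(line.strip())
        if not match:
            continue
        path, result = match.group(1), int(match.group(2))
        if result >= 0 and SO_RE.search(path):
            paths.append(path)
    return paths


# Lines copied from the strace output above.
sample = [
    'open("/etc/nsswitch.conf", O_RDONLY)    = 31',
    'open("/etc/ld.so.cache", O_RDONLY)      = 31',
    'open("/lib/libnss_files.so.2", O_RDONLY) = 31',
    'open("/lib/tls/libc.so.6", O_RDONLY)    = 31',
    'open("/lib/ld-linux.so.2", O_RDONLY)    = 31',
]
print(shared_objects_opened(sample))
```

Any path under /lib/tls in the result indicates that a second libc was pulled in by the dynamic linker.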
[6 Apr 2007 17:28] Alexey Kopytov
The reason for the crash is a conflict between LinuxThreads and NPTL. When building with -all-static, some (or all) Linux distributions default to LinuxThreads instead of NPTL. The problems start when a statically built binary uses some libnss functions (and is therefore not actually static). libnss does not deal with threads itself, but it calls libc functions, which are loaded by the dynamic linker from an NPTL version of libc.

Here is the snippet from the strace output:

open("/lib/libnss_files.so.2", O_RDONLY) = 31
read(31, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0@\33\0\000"..., 512) = 512
fstat64(31, {st_mode=S_IFREG|0755, st_size=35284, ...}) = 0
mmap2(NULL, 37512, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 31, 0) = 0xb58ac000
madvise(0xb58ac000, 37512, MADV_SEQUENTIAL|0x1) = 0
mmap2(0xb58b4000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 31, 0x7) = 0xb58b4000
close(31)                               = 0
open("/lib/tls/libc.so.6", O_RDONLY)    = 31

On my system, /lib/libc.so.6 is a LinuxThreads (LT) version of libc, and /lib/tls/libc.so.6 is an NPTL one:

$ eu-readelf -n /lib/libc.so.6

Note segment of 32 bytes at offset 0x194:
  Owner          Data size  Type
  GNU                   16  VERSION
    OS: Linux, ABI: 2.4.1

$ eu-readelf -n /lib/tls/libc.so.6

Note segment of 32 bytes at offset 0x194:
  Owner          Data size  Type
  GNU                   16  VERSION
    OS: Linux, ABI: 2.6.9

I see two possible ways to fix this:

1. When building a static binary, force linking with NPTL on those platforms that support it.

I have tried adding an explicit "-L /usr/lib/tls" to LDFLAGS, and the resulting binary does not crash.

2. Force using LinuxThreads at runtime for statically linked binaries.

Exporting "LD_ASSUME_KERNEL=2.4.1" prevents the dynamic linker from using /lib/tls/* libraries, so the crash does not occur.
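
The effect of LD_ASSUME_KERNEL can be modelled with the ABI notes shown above. The sketch below is a simplified model, not the dynamic linker's actual logic: it assumes a library is eligible only when the ABI version in its note does not exceed the assumed kernel version. The function names are illustrative.

```python
import re


def parse_abi(readelf_output):
    """Extract the ABI version from `eu-readelf -n` output as a tuple, or None."""
    match = re.search(r'ABI:\s*(\d+)\.(\d+)\.(\d+)', readelf_output)
    return tuple(int(part) for part in match.groups()) if match else None


def loadable(lib_abi, assumed_kernel):
    """Simplified model: a library is eligible only if its ABI note
    does not require a newer kernel than the assumed one."""
    return lib_abi <= assumed_kernel


lt_note = "    OS: Linux, ABI: 2.4.1"    # /lib/libc.so.6
nptl_note = "    OS: Linux, ABI: 2.6.9"  # /lib/tls/libc.so.6
assumed = (2, 4, 1)                      # LD_ASSUME_KERNEL=2.4.1

print(loadable(parse_abi(lt_note), assumed))    # True
print(loadable(parse_abi(nptl_note), assumed))  # False
```

Under this model the NPTL libc in /lib/tls is filtered out, so libnss resolves its libc calls against the LinuxThreads build, matching what the static binary was linked with.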
[22 May 2007 16:40] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/27147

ChangeSet@1.2494, 2007-05-22 20:39:44+04:00, kaa@polly.local +1 -0
  Fix for bug #24611 "mysqld crashes when connecting from remote host, and compiled from source".
  
  On some Linux distributions with both LinuxThreads and NPTL glibc versions available, statically built binaries can crash because the linker defaults to LinuxThreads when linking statically, but calls to external libraries (like libnss) are resolved to NPTL versions.
  
  Since there is nothing we can do in the code to work around that, just give the user advice on how to fix it if a crash happens on such a binary/OS combination.
[25 May 2007 16:54] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/27352

ChangeSet@1.2494, 2007-05-25 20:52:01+04:00, kaa@polly.local +1 -0
  Fix for bug #24611 "mysqld crashes when connecting from remote host, and compiled from source".
  
  On some Linux distributions with both LinuxThreads and NPTL glibc versions available, statically built binaries can crash because the linker defaults to LinuxThreads when linking statically, but calls to external libraries (like libnss) are resolved to NPTL versions.
  
  Since there is nothing we can do in the code to work around that, just give the user advice on how to fix it if a crash happens on such a binary/OS combination.
[31 May 2007 13:46] Nicklas Bondesson
I have tried adding an explicit "-L/lib/tls" to LDFLAGS and then recompiling MySQL (5.0.41). The server does not report a crash in the logs; however, it fails to start.

See notes on this bug: http://bugs.mysql.com/bug.php?id=27168

What can I do next to resolve this issue?
[31 May 2007 14:23] Alexey Kopytov
Nicklas,

The "-L/lib/tls" trick does not work on all Linux distributions. Gentoo allows static linking with NPTL, Debian-based distributions do not. That's why the suggested workaround (see the patch for this bug) mentions only LD_ASSUME_KERNEL. Try "export LD_ASSUME_KERNEL=2.4.1" before starting the server.
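
If the server is started from a wrapper script rather than an interactive shell, the same setting can be applied to the child process only. This is a minimal sketch; env_for_static_mysqld is a hypothetical helper name, and the variable value is the one suggested above.

```python
import os


def env_for_static_mysqld(base_env=None, assumed_kernel="2.4.1"):
    """Return a copy of the environment with LD_ASSUME_KERNEL set.

    Hypothetical helper: the caller's environment is left untouched, so
    only the launched server sees the override.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["LD_ASSUME_KERNEL"] = assumed_kernel
    return env
```

The returned mapping can be passed as the env argument of subprocess.Popen when launching bin/mysqld_safe, which keeps the override out of the rest of the session.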
[1 Jun 2007 18:47] Nicklas Bondesson
Thanks for your reply.

Correct me if I'm wrong here, but setting this env variable switches to using LinuxThreads and not NPTL. Since I'm on a 2.6.x kernel NPTL should be used, since the kernel is "optimized" for this threading model.

If --with-mysqld-ldflags=-all-static is left out, I get a working binary. But I have read that compiling MySQL statically will produce a 10%+ faster binary. Is this actually true in every case?

I have tried compiling without --with-mysqld-ldflags=-all-static and exporting LD_ASSUME_KERNEL="2.4.1"; the server still crashes under Debian (etch). It uses LinuxThreads (I see multiple processes with ps -ef). The same thing happens if I don't set LD_ASSUME_KERNEL.

My second attempt was to statically build MySQL (5.0.41) on a Debian (sarge) installation. This went fine; it went on using LinuxThreads (ps -ef gives multiple processes).

Something must have changed between the two different versions of libc, 2.3.2 on sarge and 2.3.6 on etch.

Any new ideas?

Thanks
Nicklas
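
Besides counting processes in ps -ef, glibc can report which threading library a dynamically linked process is using. The sketch below reads that value through Python's os.confstr. Note the assumptions: it describes the Python process itself, not a statically linked mysqld, and the sample strings in the comments are illustrative formats rather than output from the systems in this report.

```python
import os


def classify_threading(version_string):
    """Map a GNU_LIBPTHREAD_VERSION-style string to a threading model name."""
    text = (version_string or "").lower()
    if text.startswith("nptl"):
        return "NPTL"
    if text.startswith("linuxthreads"):
        return "LinuxThreads"
    return "unknown"


def current_threading_library():
    """Threading library of the running (dynamically linked) process."""
    try:
        return classify_threading(os.confstr("CS_GNU_LIBPTHREAD_VERSION"))
    except (AttributeError, ValueError, OSError):
        # Not a glibc system, or the name is not supported here.
        return "unknown"


# Illustrative inputs only:
#   classify_threading("NPTL 2.3.6")        -> "NPTL"
#   classify_threading("linuxthreads-0.10") -> "LinuxThreads"
```

Running current_threading_library() with and without LD_ASSUME_KERNEL set in the environment is one way to see which libc variant the dynamic linker selects on a given distribution.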
[6 Jun 2007 16:55] Bugs System
Pushed into 5.1.20-beta
[6 Jun 2007 16:58] Bugs System
Pushed into 5.0.44
[8 Jun 2007 14:11] Trudy Pelzer
Restoring original tagged bug priority.
[12 Jun 2007 18:35] Paul DuBois
Noted in 5.0.44, 5.1.20 changelogs.