Bug #24611 | mysqld crashes when connecting from remote host, and compiled from source | ||
---|---|---|---|
Submitted: | 27 Nov 2006 9:02 | Modified: | 12 Jun 2007 18:35 |
Reporter: | Morgan Tocker | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Compiling | Severity: | S1 (Critical) |
Version: | 5.0.27 | OS: | Linux (Linux, Ubuntu) |
Assigned to: | Alexey Kopytov | CPU Architecture: | Any |
[27 Nov 2006 9:02]
Morgan Tocker
[27 Nov 2006 9:37]
MySQL Verification Team
Just noting I've seen this crash a few times on my SUSE 9.3, but typically only under higher load.
sbester@linux:~> getconf GNU_LIBC_VERSION
glibc 2.3.4
sbester@linux:~> uname -a
Linux linux 2.6.11.4-21.7-smp #1 SMP Thu Jun 2 14:23:14 UTC 2005 i686 i686 i386 GNU/Linux
[16 Jan 2007 11:25]
Alexey Kopytov
gdb fails to work with binaries compiled with -all-static. Here is the resolved stacktrace on my machine:
0x80db41e handle_segfault + 440
0x837b03d __pthread_sighandler + 105
(nil)
0xb5757d1e _end + -1390444258
0xb5757df1 _end + -1390444047
0x83d9a90 call_init + 204
0x83d9b2f _dl_init + 108
0x83b3793 dl_open_worker + 653
0x83d9911 _dl_catch_error + 100
0x83b3d1f _dl_open + 149
0x83b4f67 do_dlopen + 29
0x83d9911 _dl_catch_error + 100
0x83b4f24 dlerror_run + 32
0x83b50d2 __libc_dlopen_mode + 31
0x83af753 __nss_lookup_function + 611
0x83af852 __nss_lookup + 38
0x83d2b92 __nss_hosts_lookup + 98
0x83b0c96 gethostbyaddr_r + 518
0x83b09f5 gethostbyaddr + 141
0x80e182b _Z14ip_to_hostnameP7in_addrPj + 327
0x80fcb41 _Z16check_connectionP3THD + 343
0x80fd4dd handle_one_connection + 375
0x8376974 pthread_start_thread + 480
0x83aebca clone + 106

Note that despite -all-static the resulting binary is not actually static and depends on external libnss.

strace output:
set_thread_area({entry_number:6, base_addr:0x8953580, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, use
getpid() = 22466
rt_sigprocmask(SIG_SETMASK, [HUP INT QUIT PIPE ALRM TERM TSTP RTMIN], NULL, 8) = 0
sched_setscheduler(22466, SCHED_OTHER, { 0 }) = 0
time(NULL) = 1168946526
rt_sigprocmask(SIG_UNBLOCK, [], [HUP INT QUIT PIPE ALRM TERM TSTP RTMIN], 8) = 0
getpeername(30, {sa_family=AF_INET, sin_port=htons(45193), sin_addr=inet_addr("192.168.0.101")}, [16]) = 0
fstat64(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(136, 10), ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb58cf000
socket(PF_FILE, SOCK_STREAM, 0) = 31
fcntl64(31, F_GETFL) = 0x2 (flags O_RDWR)
fcntl64(31, F_SETFL, O_RDWR|O_NONBLOCK) = 0
connect(31, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
close(31) = 0
socket(PF_FILE, SOCK_STREAM, 0) = 31
fcntl64(31, F_GETFL) = 0x2 (flags O_RDWR)
fcntl64(31, F_SETFL, O_RDWR|O_NONBLOCK) = 0
connect(31, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
close(31) = 0
open("/etc/nsswitch.conf", O_RDONLY) = 31
fstat64(31, {st_mode=S_IFREG|0644, st_size=503, ...}) = 0
mmap2(NULL, 131072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb58af000
read(31, "# /etc/nsswitch.conf:\n# $Header:"..., 131072) = 503
read(31, "", 131072) = 0
close(31) = 0
munmap(0xb58af000, 131072) = 0
open("/etc/ld.so.cache", O_RDONLY) = 31
fstat64(31, {st_mode=S_IFREG|0644, st_size=100816, ...}) = 0
mmap2(NULL, 100816, PROT_READ, MAP_PRIVATE, 31, 0) = 0xb58b6000
close(31) = 0
open("/lib/libnss_files.so.2", O_RDONLY) = 31
read(31, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0@\33\0\000"..., 512) = 512
fstat64(31, {st_mode=S_IFREG|0755, st_size=35284, ...}) = 0
mmap2(NULL, 37512, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 31, 0) = 0xb58ac000
madvise(0xb58ac000, 37512, MADV_SEQUENTIAL|0x1) = 0
mmap2(0xb58b4000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 31, 0x7) = 0xb58b4000
close(31) = 0
open("/lib/tls/libc.so.6", O_RDONLY) = 31
read(31, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\266O\1"..., 512) = 512
fstat64(31, {st_mode=S_IFREG|0755, st_size=1190424, ...}) = 0
mmap2(NULL, 1133788, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 31, 0) = 0xb5797000
madvise(0xb5797000, 1133788, MADV_SEQUENTIAL|0x1) = 0
mmap2(0xb58a6000, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 31, 0x10e) = 0xb58a6000
mmap2(0xb58aa000, 7388, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb58aa000
close(31) = 0
open("/lib/ld-linux.so.2", O_RDONLY) = 31
read(31, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\320\7\0"..., 512) = 512
fstat64(31, {st_mode=S_IFREG|0755, st_size=104031, ...}) = 0
mmap2(NULL, 91360, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 31, 0) = 0xb69ca000
madvise(0xb69ca000, 91360, MADV_SEQUENTIAL|0x1) = 0
mmap2(0xb69df000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 31, 0x14) = 0xb69df000
close(31) = 0
mprotect(0xb69df000, 4096, PROT_READ) = 0
mprotect(0xb58a6000, 4096, PROT_READ) = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
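
The observation that the "static" binary still opens shared objects at runtime can be checked mechanically against an strace log. The following is only a rough sketch: the function name `opened_shared_objects` and the regular expression are illustrative assumptions, not part of MySQL or strace, and the sample text is copied from the strace output above.

```python
import re

# Matches successful open() calls on shared objects, for example:
#   open("/lib/libnss_files.so.2", O_RDONLY) = 31
# Failed calls ("= -1 ENOENT ...") and non-library files are skipped.
OPEN_SO_RE = re.compile(r'open\("([^"]+\.so(?:\.\d+)*)",\s*[^)]*\)\s*=\s*\d+')


def opened_shared_objects(strace_text):
    """Return the shared-object paths opened successfully, in log order."""
    return [match.group(1) for match in OPEN_SO_RE.finditer(strace_text)]


# Sample lines taken from the strace output in this comment.
sample = (
    'open("/etc/nsswitch.conf", O_RDONLY) = 31\n'
    'open("/etc/ld.so.cache", O_RDONLY) = 31\n'
    'open("/lib/libnss_files.so.2", O_RDONLY) = 31\n'
    'open("/lib/tls/libc.so.6", O_RDONLY) = 31\n'
    'open("/lib/ld-linux.so.2", O_RDONLY) = 31\n'
)

libs = opened_shared_objects(sample)
from_tls = [path for path in libs if "/tls/" in path]
print(libs)
print(from_tls)
```

Any non-empty result for a binary built with -all-static suggests that it is not fully static, and entries under a tls directory are the ones worth a closer look.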
[6 Apr 2007 17:28]
Alexey Kopytov
The reason for the crash is a conflict between LinuxThreads and NPTL. When building with -all-static, some (or all) Linux distributions default to LinuxThreads instead of NPTL. The problems start when a statically built binary uses some libnss functions (and is therefore not actually static). libnss does not deal with threads itself, but it calls libc functions, which the dynamic linker loads from an NPTL version of libc. Here is the relevant snippet from the strace output:
open("/lib/libnss_files.so.2", O_RDONLY) = 31
read(31, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0@\33\0\000"..., 512) = 512
fstat64(31, {st_mode=S_IFREG|0755, st_size=35284, ...}) = 0
mmap2(NULL, 37512, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 31, 0) = 0xb58ac000
madvise(0xb58ac000, 37512, MADV_SEQUENTIAL|0x1) = 0
mmap2(0xb58b4000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 31, 0x7) = 0xb58b4000
close(31) = 0
open("/lib/tls/libc.so.6", O_RDONLY) = 31

On my system /lib/libc.so.6 is a LinuxThreads (LT) version of libc, and /lib/tls/libc.so.6 is an NPTL one:
$ eu-readelf -n /lib/libc.so.6
Note segment of 32 bytes at offset 0x194:
  Owner  Data size  Type
  GNU    16         VERSION
    OS: Linux, ABI: 2.4.1
$ eu-readelf -n /lib/tls/libc.so.6
Note segment of 32 bytes at offset 0x194:
  Owner  Data size  Type
  GNU    16         VERSION
    OS: Linux, ABI: 2.6.9

I see two possible ways to fix this:
1. When building a static binary, force linking with NPTL on those platforms that support it. I have tried adding an explicit "-L /usr/lib/tls" to LDFLAGS, and the resulting binary does not crash.
2. Force the use of LinuxThreads at runtime for statically linked binaries. Exporting "LD_ASSUME_KERNEL=2.4.1" prevents the dynamic linker from using /lib/tls/* libraries, so the crash does not occur.
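
To illustrate why the second option works, here is a simplified model of the comparison involved: the dynamic linker skips a library whose ABI note asks for a newer kernel than the one named in LD_ASSUME_KERNEL. This is a sketch under that assumption, not glibc's actual code; the helper names `abi_version` and `loadable_under` are hypothetical, and the note text is the eu-readelf output quoted above.

```python
import re

# Extracts the version from a line such as "OS: Linux, ABI: 2.6.9".
ABI_RE = re.compile(r"ABI:\s*(\d+(?:\.\d+)*)")


def parse_version(text):
    """Turn "2.4.1" into the comparable tuple (2, 4, 1)."""
    return tuple(int(part) for part in text.split("."))


def abi_version(note_text):
    """Return the ABI version from eu-readelf -n output, or None if absent."""
    match = ABI_RE.search(note_text)
    return parse_version(match.group(1)) if match else None


def loadable_under(assumed_kernel, note_text):
    """Simplified model: a library is usable if its ABI note does not
    require a newer kernel than the assumed one. A library without an
    ABI note is treated as usable (an assumption of this sketch)."""
    abi = abi_version(note_text)
    return abi is None or abi <= parse_version(assumed_kernel)


lt_note = "GNU 16 VERSION OS: Linux, ABI: 2.4.1"    # /lib/libc.so.6
nptl_note = "GNU 16 VERSION OS: Linux, ABI: 2.6.9"  # /lib/tls/libc.so.6

print(loadable_under("2.4.1", lt_note))
print(loadable_under("2.4.1", nptl_note))
```

Under this model, an assumed kernel of 2.4.1 accepts the LinuxThreads libc and rejects the one in /lib/tls, which matches the behaviour described for the workaround.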
[22 May 2007 16:40]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/27147 ChangeSet@1.2494, 2007-05-22 20:39:44+04:00, kaa@polly.local +1 -0 Fix for bug #24611 "mysqld crashes when connecting from remote host, and compiled from source". On some Linux distributions with both LinuxThreads and NPTL glibc versions available, statically built binaries can crash, because linker defaults to LinuxThreads when linking statically, but calls to external libraries (like libnss) are resolved to NPTL versions. Since there is nothing we can do in the code to work that around, just give user an advice on how to fix that, if a crash happened on such a binary/OS combination.
[25 May 2007 16:54]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/27352 ChangeSet@1.2494, 2007-05-25 20:52:01+04:00, kaa@polly.local +1 -0 Fix for bug #24611 "mysqld crashes when connecting from remote host, and compiled from source". On some Linux distributions with both LinuxThreads and NPTL glibc versions available, statically built binaries can crash, because linker defaults to LinuxThreads when linking statically, but calls to external libraries (like libnss) are resolved to NPTL versions. Since there is nothing we can do in the code to work that around, just give user an advice on how to fix that, if a crash happened on such a binary/OS combination.
[31 May 2007 13:46]
Nicklas Bondesson
I have tried adding an explicit "-L/lib/tls" to LDFLAGS and then recompiling mysql (5.0.41). The server does not report a crash in the logs; however, it fails to start. See the notes on this bug: http://bugs.mysql.com/bug.php?id=27168 What can I do next to resolve this issue?
[31 May 2007 14:23]
Alexey Kopytov
Nicklas, the "-L/lib/tls" trick does not work on all Linux distributions. Gentoo allows static linking with NPTL; Debian-based distributions do not. That's why the suggested workaround (see the patch for this bug) mentions only LD_ASSUME_KERNEL. Try "export LD_ASSUME_KERNEL=2.4.1" before starting the server.
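
For anyone who starts the server from a wrapper script rather than an interactive shell, the same workaround could look roughly like the sketch below. The function names are hypothetical, and the command passed to `start_with_assumed_kernel` is whatever is normally used to start mysqld on the system in question; nothing here is taken from the patch itself.

```python
import os
import subprocess


def env_with_assumed_kernel(version="2.4.1", base=None):
    """Copy an environment and set LD_ASSUME_KERNEL in the copy only."""
    env = dict(os.environ if base is None else base)
    env["LD_ASSUME_KERNEL"] = version
    return env


def start_with_assumed_kernel(argv, version="2.4.1"):
    """Start argv (a hypothetical server command line) with the variable
    set for the child process, leaving the caller's environment alone."""
    return subprocess.Popen(argv, env=env_with_assumed_kernel(version))


# Build an example environment without launching anything.
demo_env = env_with_assumed_kernel(base={"PATH": "/usr/bin"})
print(demo_env["LD_ASSUME_KERNEL"])
```

Setting the variable only for the child keeps it out of unrelated processes. On a system whose libraries all require a newer kernel ABI, the same setting may stop dynamically linked programs from starting at all, so it is best limited to the affected binary.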
[1 Jun 2007 18:47]
Nicklas Bondesson
Thanks for your reply. Correct me if I'm wrong here, but setting this environment variable switches to LinuxThreads instead of NPTL. Since I'm on a 2.6.x kernel, NPTL should be used, because the kernel is "optimized" for this threading model. If --with-mysqld-ldflags=-all-static is left out, I get a working binary. But I have read that compiling mysql statically will produce a 10%+ faster binary. Is this actually true in every case?

I have tried compiling without --with-mysqld-ldflags=-all-static and exporting LD_ASSUME_KERNEL="2.4.1"; the server still crashes under Debian (etch). It uses LinuxThreads (I see multiple processes with ps -ef). The same thing happens if I don't set LD_ASSUME_KERNEL.

My second attempt was to statically build mysql (5.0.41) on a Debian (sarge) installation. This went fine, and it went on using LinuxThreads (ps -ef shows multiple processes). Something must have changed between the two versions of libc, 2.3.2 on sarge and 2.3.6 on etch. Any new ideas?

Thanks
Nicklas
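
Counting processes in ps -ef is an indirect way to tell the threading models apart. As a complement, glibc can be asked which C library and thread library versions it provides, much like the getconf call earlier in this report. The sketch below uses Python's os.confstr for that; the helper name `glibc_info` is hypothetical. It reports on the libc loaded by the process running the script, not on a statically linked mysqld, so treat the result as a hint about the system's dynamic libraries only.

```python
import os


def glibc_info():
    """Return the glibc and thread library version strings, or None for
    each value that the platform does not provide."""
    info = {}
    for name in ("CS_GNU_LIBC_VERSION", "CS_GNU_LIBPTHREAD_VERSION"):
        try:
            info[name] = os.confstr(name)
        except (ValueError, OSError, AttributeError):
            # Unknown name, lookup failure, or no confstr on this platform.
            info[name] = None
    return info


result = glibc_info()
for name, value in sorted(result.items()):
    print(name, "=", value)
```

Running it once normally and once with LD_ASSUME_KERNEL set would show whether the variable changes the thread library that the dynamic linker selects on a given distribution.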
[6 Jun 2007 16:55]
Bugs System
Pushed into 5.1.20-beta
[6 Jun 2007 16:58]
Bugs System
Pushed into 5.0.44
[8 Jun 2007 14:11]
Trudy Pelzer
Restoring original tagged bug priority.
[12 Jun 2007 18:35]
Paul DuBois
Noted in 5.0.44, 5.1.20 changelogs.