Bug #4872 Can't start mysqld as root when using multiple nss sources
Submitted: 3 Aug 2004 14:53
Modified: 25 Feb 2005 19:31
Reporter: Alexandre Boeglin
Status: Closed
Category: MySQL Server
Severity: S2 (Serious)
Version: 4.0.18, verified on 4.1.5
OS: Linux (or actually glibc?)
Assigned to: Jim Winstead
CPU Architecture: Any

[3 Aug 2004 14:53] Alexandre Boeglin
Description:
System is an updated (2004-08-03) Mandrake 10.0.

 [alex@dls alex]$ uname -a
Linux dls.nexedi.org 2.6.3-15mdk #1 Fri Jul 2 22:09:29 MDT 2004 i686 unknown unknown GNU/Linux
[alex@dls alex]$ rpm -q MySQL-Max
MySQL-Max-4.0.18-1.1.100mdk
[alex@dls alex]$ ldd /usr/sbin/mysqld-max
        linux-gate.so.1 =>  (0xffffe000)
        librt.so.1 => /lib/tls/librt.so.1 (0x4002b000)
        libdl.so.2 => /lib/libdl.so.2 (0x40040000)
        libssl.so.0.9.7 => /usr/lib/libssl.so.0.9.7 (0x40043000)
        libcrypto.so.0.9.7 => /usr/lib/libcrypto.so.0.9.7 (0x40075000)
        libz.so.1 => /lib/libz.so.1 (0x40177000)
        libcrypt.so.1 => /lib/libcrypt.so.1 (0x40188000)
        libnsl.so.1 => /lib/libnsl.so.1 (0x401b5000)
        libpthread.so.0 => /lib/tls/libpthread.so.0 (0x401c9000)
        libstdc++.so.5 => /usr/lib/libstdc++.so.5 (0x401d9000)
        libm.so.6 => /lib/tls/libm.so.6 (0x40299000)
        libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x402bc000)
        libc.so.6 => /lib/tls/libc.so.6 (0x402c5000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)

Here is an extract from my /etc/nsswitch.conf :
passwd:     files ldap
shadow:     files ldap
group:      files ldap

I have system users (including mysql) in /etc/passwd and normal users in a working ldap directory.

Running "getent passwd" lists both system and normal users, and every other part of my system works like a charm, so I assume the bug is in MySQL.
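
To check which sources a given NSS database consults, and in what order, the relevant lines of /etc/nsswitch.conf can be parsed directly. The following is a minimal Python sketch; the function name nss_sources and the SAMPLE text are illustrative assumptions, not part of any MySQL or glibc tooling.

```python
import re


def nss_sources(nsswitch_text, database):
    """Return the sources configured for one NSS database, in lookup order."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop comments
        if ":" not in line:
            continue
        name, _, rest = line.partition(":")
        if name.strip() == database:
            # Remove action items such as [NOTFOUND=return]
            rest = re.sub(r"\[[^\]]*\]", " ", rest)
            return rest.split()
    return []


# Same three lines as the extract in this report.
SAMPLE = """\
passwd:     files ldap
shadow:     files ldap
group:      files ldap
"""

if __name__ == "__main__":
    for db in ("passwd", "shadow", "group"):
        print(db, nss_sources(SAMPLE, db))
```

For the extract above, the passwd database yields "files" followed by "ldap", which is the order in which glibc consults them.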

When starting mysqld as root and watching it with strace, I can clearly see it looking up NSS data, first in files, then in LDAP. But then mysqld calls geteuid32(), which returns 0 instead of the uid of the mysql user, and it exits with signal 11.

When starting mysqld as root with nss_ldap disabled, or when starting it while logged in as mysql, there is no problem.

Of course, I'm available to provide further information or run tests.

How to repeat:
Start mysqld as root:

# strace /usr/sbin/mysqld-max -u mysql
[...] (mainly opening libraries, and looking for nss infos)
getpid()                                = 24651
geteuid32()                             = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
write(2, "mysqld got signal 11;\nThis could"..., 248mysqld got signal 11;
[...] (displaying crash message, and exiting)

Suggested fix:
For the moment, I added "sudo -u mysql" to the line in my init script that launches mysqld_safe, but the NSS code needs to be fixed.
[20 Aug 2004 12:20] Hartmut Holzgraefe
I couldn't reproduce the problem, but my setup is slightly different.

I tested 4.0.20 on SuSE 9.0 with nsswitch settings

  passwd: files nis
  shadow: files nis 
  group:  files nis

Could you try 4.0.20 to see if the problem persists?
[20 Aug 2004 14:45] Alexandre Boeglin
Okay, I tried with Mandrake Cooker packages :
MySQL-common-4.0.20-3mdk.i586.rpm, MySQL-Max-4.0.20-3mdk.i586.rpm.

I got exactly the same result.

By the way, I don't think your NSS setup will be taken into account if you don't have NIS-specific fields in /etc/passwd and /etc/group. So, to reproduce this bug, you may have to set up a real LDAP or NIS server.

Regards,
Alex
[20 Aug 2004 17:10] Hartmut Holzgraefe
Can you add your complete strace log so that I can compare it to mine?

Btw: the geteuid32() call happens *before* setuid(...) in my strace,
so it is expected to return 0
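
The ordering matters here: a process that drops privileges must still be root while it looks up the target account and sets its groups, so a geteuid() result of 0 at that point is normal. Below is a rough Python sketch of a common drop sequence. It is not MySQL's code; drop_privileges, RealAPI, DryRunAPI and the uid/gid value 500 are hypothetical names and values chosen so the sequence can be run without root.

```python
import os
import pwd
from types import SimpleNamespace


class RealAPI:
    """Thin wrapper over the real calls (these need root to succeed)."""
    geteuid = staticmethod(os.geteuid)
    getpwnam = staticmethod(pwd.getpwnam)
    initgroups = staticmethod(os.initgroups)
    setgid = staticmethod(os.setgid)
    setuid = staticmethod(os.setuid)


class DryRunAPI:
    """Hypothetical stand-in that records calls instead of changing ids."""

    def __init__(self, euid=0):
        self.euid = euid
        self.calls = []

    def geteuid(self):
        self.calls.append("geteuid")
        return self.euid

    def getpwnam(self, name):
        self.calls.append("getpwnam")
        return SimpleNamespace(pw_uid=500, pw_gid=500)  # made-up ids

    def initgroups(self, name, gid):
        self.calls.append("initgroups")

    def setgid(self, gid):
        self.calls.append("setgid")

    def setuid(self, uid):
        self.calls.append("setuid")
        self.euid = uid


def drop_privileges(user, api):
    """Switch to `user` if running as root; return True if ids were changed."""
    if api.geteuid() != 0:
        return False                      # not root: nothing to drop
    entry = api.getpwnam(user)            # NSS lookup of the account
    api.initgroups(user, entry.pw_gid)    # supplementary groups, also via NSS
    api.setgid(entry.pw_gid)
    api.setuid(entry.pw_uid)              # last, because it gives up root
    return True


if __name__ == "__main__":
    api = DryRunAPI(euid=0)
    drop_privileges("mysql", api)
    print(api.calls)
```

Passing RealAPI instead of a DryRunAPI instance would perform the real calls, which only succeed when the process starts as root.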
[24 Aug 2004 14:35] Alexandre Boeglin
strace output of the command "strace mysqld-max -u mysql"

Attachment: strace.log.bz2 (application/x-bzip, text), 4.28 KiB.

[24 Aug 2004 14:36] Alexandre Boeglin
Okay, I attached a file containing the strace log.

Regards, Alex.
[28 Sep 2004 10:04] Hartmut Holzgraefe
OK, I have now verified the crash on some of my local server installations.

A running LDAP server is not needed to reproduce the bug: the server binaries that show the problem crash as soon as LDAP is added to the NSS configuration.

I have no idea yet which difference between my installations causes only some of them to crash. In particular, I haven't yet been able to create a debug binary that crashes, so I can't single-step through the code to identify the actual cause of the crash.

I'm trying out different combinations of configure options right now to create a debuggable binary that actually crashes. As soon as I have succeeded in that, it should be easy to identify and fix the cause of the crash.
[17 Oct 2004 22:42] Hartmut Holzgraefe
I have now been able to produce a crashing debug binary using the following configure line on freshly unpacked 4.1.5 source:

CC='gcc'  CFLAGS='-O1 -g'  CXX='gcc'  CXXFLAGS='-O1 -g'  LDFLAGS=''  ASFLAGS='' ./configure '--prefix=/usr/local/mysql' '--enable-assembler' '--with-extra-charsets=complex' '--enable-thread-safe-client' '--with-readline' '--enable-local-infile' --with-debug '--with-mysqld-ldflags=-all-static' '--with-client-ldflags=-all-static'

All my recent test builds crashed when configured as static binaries with the '-all-static' ldflags options, but I'm pretty sure I had dynamically linked builds crashing in the past, too (sorry, I lost the logs I took for these older builds).

The actual crash happens when set_user() in sql/mysqld.cc calls the libc function initgroups(). The parameters passed to initgroups() look perfectly valid.

So the actual problem is either within glibc, libnss_ldap, libldap or (IMHO most likely) related to NSS shared library handling.

gdb isn't able to create a backtrace after the crash.
[17 Oct 2004 22:52] Hartmut Holzgraefe
Some Google results regarding "nss initgroups segfault":

This one supports my theory regarding static builds:
http://lists.gnu.org/archive/html/bug-parted/2001-08/msg00116.html

Zope seems to be suffering from this, too
http://gossamer-threads.com/lists/zope/users/173947?search_string=initgroups;#173947
http://mail.zope.org/pipermail/zope-collector-monitor/2004-August/003985.html

A similar problem with uClibc; maybe the same code is in glibc?
http://www.uclibc.org/lists/uclibc/2002-July/003998.html

The glibc bugzilla didn't show any entries when searching for initgroups.
[17 Oct 2004 23:08] Hartmut Holzgraefe
I've now tried the other NSS methods available on my system and none of them crashed, so it seems to be an LDAP-only problem.

As a first workaround I would suggest temporarily changing the segfault signal handler when calling initgroups(), so that we can at least bail out with a meaningful error message that recommends either dropping ldap from /etc/nsswitch.conf or starting as user 'mysql' instead of root right away.
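
The idea of containing the crash and reporting something useful can be illustrated in Python, with one deliberate substitution: instead of swapping the SIGSEGV handler in-process (which is natural in C but not safe inside the Python interpreter), the sketch runs the risky call in a forked child and inspects how the child ended. The names crashes_in_child and explain are hypothetical, and this is not the patch that was applied to MySQL.

```python
import os
import signal


def crashes_in_child(func):
    """Run func() in a forked child; return the fatal signal number or None."""
    pid = os.fork()
    if pid == 0:
        # Child: make sure SIGSEGV has its default, fatal behaviour.
        signal.signal(signal.SIGSEGV, signal.SIG_DFL)
        try:
            func()
        except BaseException:
            os._exit(1)      # ordinary failure, not a crash
        os._exit(0)
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return os.WTERMSIG(status)
    return None


def explain(signum):
    """Turn a fatal signal from the probe into a hint for the administrator."""
    if signum == signal.SIGSEGV:
        return ("group lookup crashed: drop ldap from /etc/nsswitch.conf "
                "or start as user 'mysql' instead of root")
    return None


if __name__ == "__main__":
    # Simulate the crash by having the child send SIGSEGV to itself.
    result = crashes_in_child(lambda: os.kill(os.getpid(), signal.SIGSEGV))
    print(explain(result))
```

This only illustrates the containment idea. The crash described in this report occurs inside the mysqld binary itself, so a probe run from a different program would not necessarily trigger it.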

Btw: Our x86_64 startup crash problems might be related to this, too.
Having a real error message for this would help to verify this, too.
[18 Oct 2004 10:27] Ingo Strüwing
Some time ago I found a similar problem on Debian. It turned out that a 'db' entry in /etc/nsswitch.conf activates an NSS library that contains Sleepycat's Berkeley DB, which is also contained in MySQL Max. The two versions of Berkeley DB in the same executable interfered with each other. But I do not see an exact match in this case, as it happens only with 'ldap' and not with 'db'. It would be nice to have a stack backtrace; it might suggest what is going on.
[24 Nov 2004 17:23] Lenz Grimmer
BUG#3037 was marked as a duplicate of this bug.
[16 Feb 2005 20:07] Jim Winstead
This is a bug in glibc's NSS support when linked statically. It can be avoided by not linking statically, by not having LDAP in nsswitch.conf, or by using a newer glibc with nscd.

The patch outputs a message to this effect when a segfault occurs during the call to initgroups().
[23 Feb 2005 0:51] Jim Winstead
Fix pushed, will be in 4.1.11.
[25 Feb 2005 19:31] Paul DuBois
Noted in 4.1.11 changelog.
[1 Dec 2005 11:47] Bernardo Innocenti
4.0.26 still seems to be affected.
Is it possible to backport the fix to the 4.0 branch?