Bug #29283 Ndb_cluster_connection seg faults
Submitted: 21 Jun 2007 19:26
Modified: 5 May 2009 7:39
Reporter: richard horan
Status: Closed
Category: MySQL Cluster: NDB API
Severity: S2 (Serious)
Version: 5.1.22rc, ndb-6.2.15
OS: Other (AIX 5.3.0, OpenSolaris/x86_64 with Sun CC)
Assigned to: Hartmut Holzgraefe
CPU Architecture: Any
Tags: mysqlcom-5.1.16-ndb-6.2.0 ndbapi, Ndb_cluster_connection

[21 Jun 2007 19:26] richard horan
Description:
Ndb_cluster_connection cluster_connection;

this line segfaults using Visual Age Compiler 8.0 on AIX 5.3

below is a stack trace:

Segmentation fault in LogHandlerList::add(LogHandler*) at line 44 in file ""
could not read "LogHandlerList.cpp"
(dbx) t
LogHandlerList::add(LogHandler*)(this = (nil), pNewHandler = 0x000000011067c670), line 44 in "LogHandlerList.cpp"
Logger.Logger::addHandler(LogHandler*)(this = 0x00000001100095a0, pHandler = 0x000000011067c670), line 171 in "Logger.cpp"
Logger::createConsoleHandler()(this = 0x00000001100095a0), line 79 in "Logger.cpp"
Ndb_cluster_connection_impl::Ndb_cluster_connection_impl(const char*)(this = 0x000000011067c450, connect_string = "192.4.108.52"), line 282 in "ndb_cluster_connection.cpp"
testclient.Ndb_cluster_connection::Ndb_cluster_connection(const char*)(this = 0x000000011011d4a0, connect_string = "192.4.108.52"), line 49 in "ndb_cluster_connection.cpp"
__sinit80000000_x_2fpurifyplus_2fmysql_2ftestprogram_2ftestclient_2ecpp()(), line 39 in "testclient.cpp"
__C_runtime_startup() at 0x100000e90
(dbx) x
  $r0:0x000000005b5b5b5b  $stkp:0x0ffffffffffff4b0   $toc:0x0000000110125118  
  $r3:0x000000011067cb50    $r4:0x0000000000000000    $r5:0x0000000000000000  
  $r6:0x0000000000000000    $r7:0x0000000000000000    $r8:0x0000000000000000  
  $r9:0x0000000000000000   $r10:0x0000000000000000   $r11:0x0000000000000000  
 $r12:0x00000001000d566c   $r13:0x00000001106765b8   $r14:0x0000000000000001  
 $r15:0x0ffffffffffffc78   $r16:0x0000000000000000   $r17:0x000000011011f928  
 $r18:0x0000000000000000   $r19:0x000000010015cebc   $r20:0x0ffffffffffff9c0  
 $r21:0x00000000000000a0   $r22:0xffffffff800003ff   $r23:0x0000000080000000  
 $r24:0x0ffffffffffff920   $r25:0x09001000a0004648   $r26:0x0000000000000001  
 $r27:0x000000010015cf9c   $r28:0x0000000110675d50   $r29:0x0000000110009588  
 $r30:0x0000000100160d78   $r31:0x0000000100169fc8  
 $iar:0x00000001000d56bc   $msr:0xa00000000000d0b2    $cr:0x24444228  
$link:0x00000001000d566c   $ctr:0x090000000004b7d0   $xer:0x2000000b  

          Condition status = 0:e 1:g 2:g 3:g 4:g 5:e 6:e 7:l 
        [unset $noflregs to view floating point registers]
        [unset $novregs to view vector registers]
in LogHandlerList::add(LogHandler*) at line 44 in file ""
0x1000d56bc (LogHandlerList::add(LogHandler*)+0x74) f8640008        std   r3,0x8(r4)
(dbx) 

How to repeat:
It seems the easiest way to reproduce this is to declare

Ndb_cluster_connection cluster_connection;

as a global variable.

Compile with VACC and statically link in the ndb libs.
[22 Jun 2007 12:42] richard horan
The problem is that the code was creating an Ndb_cluster_connection before calling ndb_init(). Therefore I think the stack was getting corrupted.
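
To illustrate the ordering issue: in C++ the constructor of a global object runs during static initialization, before main() is entered, so an initialization call placed in main() cannot precede it. The following is a minimal sketch with hypothetical stand-ins (fake_init and FakeConnection are not NDB API names); it shows only the ordering, not the actual crash.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; these are not NDB API names.

// Constant-initialized flag, so it is already false before any constructor runs.
static bool g_runtime_ready = false;

// Plays the role of ndb_init(): meant to run before any connection exists.
void fake_init() { g_runtime_ready = true; }

// Plays the role of Ndb_cluster_connection: records whether the runtime
// had been initialized at the moment of construction.
struct FakeConnection {
    bool runtime_was_ready;
    FakeConnection() : runtime_was_ready(g_runtime_ready) {}
};

// Global object: its constructor runs before main() is entered, so it can
// never observe the effect of a fake_init() call made inside main().
FakeConnection g_conn;

// Creating the object after initialization sees the initialized runtime.
bool connection_after_init_sees_runtime() {
    fake_init();
    FakeConnection local_conn;
    assert(local_conn.runtime_was_ready);
    return local_conn.runtime_was_ready;
}
```

Called from main(), connection_after_init_sees_runtime() returns true, while g_conn.runtime_was_ready stays false because g_conn was constructed earlier.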
[22 Aug 2007 9:39] Eri Koira
Please check this out:
http://forums.mysql.com/read.php?25,158541,168767#msg-168767

Maybe a bug after all?
[1 Dec 2007 13:30] Hartmut Holzgraefe
The following simple code already fails on OpenSolaris x86_64 with Sun CC, whether ndb_init() is called or commented out:

#include <NdbApi.hpp>

int main(int argc, char **argv)
{
        Ndb_cluster_connection *conn;

        if (ndb_init()) exit(3);

        conn = new Ndb_cluster_connection("localhost");

        exit(0);
}

the dbx backtrace for the crash looks like:

t@1 (l@1) program terminated by signal SEGV (no mapping at the fault address)
0xfecf57b5: add+0x002d: cmpl     %ebx,0x00000004(%ecx)
Current function is main
    9           conn = new Ndb_cluster_connection("localhost");
(dbx) where
current thread: t@1
  [1] LogHandlerList::add(0x0, 0x810e048, 0x1, 0xfecf498a), at 0xfecf57b5 
  [2] Logger::addHandler(0xfed33eb0, 0x810e048, 0x1, 0xfecf4552), at 0xfecf49d7 
  [3] Logger::createConsoleHandler(0xfed33eb0, 0xfed305bc, 0x8047c7c, 0xfecc886c), at 0xfecf45b8 
  [4] Ndb_cluster_connection_impl::Ndb_cluster_connection_impl(0x806c230, 0x805a344, 0x1, 0xfecc82aa), at 0xfecc8881 
  [5] Ndb_cluster_connection::Ndb_cluster_connection(0x80b41c8, 0x805a344), at 0xfecc82d3 
=>[6] main(argc = 1, argv = 0x8047d14), line 9 in "connect.cc"

I'm adding the sample project source in the file section
[1 Dec 2007 13:33] Hartmut Holzgraefe
test project

Attachment: ndbapi_connect-0.1.tar.gz (application/x-gzip, text), 81.24 KiB.

[1 Mar 2008 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[5 Sep 2008 10:36] Hartmut Holzgraefe
Can still easily verify this on OpenSolaris with SunStudio ...
[5 Sep 2008 10:43] Hartmut Holzgraefe
updated test file

Attachment: connect-0.2.tar.gz (application/x-gzip, text), 307.08 KiB.

[5 Sep 2008 10:48] Hartmut Holzgraefe
How to reproduce:

On a 64bit x86 OpenSolaris (i was using the 2008.5 live CD to install)
with packages "sunstudio" and "SUNWgmake" installed

- tar -xvzf connect-0.2.tar.gz   # from the "Files" tab on this bug

- cd connect-0.2

- ./configure --with-mysql=/usr/local/mysql  
  # installed mysql ndb-6.2.15 was built from source with
  # ./configure  --with-plugins=max --prefix=/usr/local/mysql

- make 

- ./connect
[5 Sep 2008 10:50] Hartmut Holzgraefe
Running ./connect will segfault, dbx backtrace is:

$ dbx connect core
[...]
t@1 (l@1) program terminated by signal SEGV (no mapping at the fault address)
0xfee84b05: add+0x002d:	cmpl     %ebx,0x00000004(%ecx)
Current function is example_init
   92   	conn = new Ndb_cluster_connection(connectstr);
(dbx) where
current thread: t@1
  [1] LogHandlerList::add(0x0, 0x80890e0, 0x1, 0xfee83cba), at 0xfee84b05 
  [2] Logger::addHandler(0xfeec4e50, 0x80890e0, 0x1, 0xfee83882), at 0xfee83d20 
  [3] Logger::createConsoleHandler(0xfeec4e50, 0xfeec1370, 0x80474ec, 0xfee55485), at 0xfee838e8 
  [4] Ndb_cluster_connection_impl::Ndb_cluster_connection_impl(0x8088fe0, 0x8058cbc, 0x0, 0xfee54e56), at 0xfee5549a 
  [5] Ndb_cluster_connection::Ndb_cluster_connection(0x8070a90, 0x8058cbc), at 0xfee54e81 
=>[6] example_init(pargc = 0x80479d0, pargv = 0x80479d4), line 92 in "connect.cc"
  [7] main(argc = 1, argv = 0x8047a00), line 131 in "connect.cc"
[5 Sep 2008 10:57] Hartmut Holzgraefe
$ uname -a
SunOS opensolaris 5.11 snv_86 i86pc i386 i86pc Solaris

$ CC -V
CC: Sun Ceres C++ 5.9 SunOS_i386 2008/04/04

$ file ./connect
./connect: ELF 32-bit LSB executable 80386 Version 1 [FPU], dynamically linked, not stripped
[5 Sep 2008 14:54] Hartmut Holzgraefe
backtrace from debug build with source line numbers

t@1 (l@1) program terminated by signal SEGV (no mapping at the fault address)
Current function is LogHandlerList::add (optimized)
   42     if (m_pHeadNode == NULL) 
(dbx) where
current thread: t@1
=>[1] LogHandlerList::add(this = ???, pNewHandler = ???) (optimized), at 0xfee403a1 (line ~42) in "LogHandlerList.cpp"
  [2] Logger::addHandler(this = ???, pHandler = ???) (optimized), at 0xfee3f0f4 (line ~171) in "Logger.cpp"
  [3] Logger::createConsoleHandler(this = ???) (optimized), at 0xfee3ec68 (line ~79) in "Logger.cpp"
  [4] Ndb_cluster_connection_impl::Ndb_cluster_connection_impl(this = ???, connect_string = ???, main_connection = ???) (optimized), at 0xfee00903 (line ~292) in "ndb_cluster_connection.cpp"
  [5] Ndb_cluster_connection::Ndb_cluster_connection(this = ???, connect_string = ???) (optimized), at 0xfee00091 (line ~49) in "ndb_cluster_connection.cpp"
  [6] main(argc = 1, argv = 0x8047a58), line 33 in "csc28587.cc"
[9 Oct 2008 6:03] Hartmut Holzgraefe
The problem seems to be related to whether libndbclient is linked in statically or dynamically ... (that's also why mysqld is not affected as it links it in statically).

It looks as if the SunStudio toolchain either does not add initialization code for global objects to the shared library's startup code, or extra compile/link options are needed to make sure the startup code is actually executed when the shared library is loaded at runtime ...

After adding an abort() call to the Logger::Logger() constructor, I got the following backtrace on Linux with dynamic linking of libndbclient.so:

#0  0x00007fb04c633095 in raise () from /lib/libc.so.6
#1  0x00007fb04c634af0 in abort () from /lib/libc.so.6
#2  0x00007fb04d85627b in Logger (this=0x7fb04daca1b8) at Logger.cpp:49
#3  0x00007fb04d840ceb in EventLogger (this=0x7fb04daca1a0)
    at EventLogger.cpp:1122
#4  0x00007fb04d829d0f in __static_initialization_and_destruction_0 (
    __initialize_p=<value optimized out>, __priority=2140)
    at ndb_cluster_connection.cpp:35
#5  0x00007fb04d829d3f in global constructors keyed to g_eventLogger ()
    at ndb_cluster_connection.cpp:787
#6  0x00007fb04d867042 in __do_global_ctors_aux ()
   from /usr/local/mysql-ndb-6.2.15/lib/mysql/libndbclient.so.3
#7  0x00007fb04d7d044b in _init ()
   from /usr/local/mysql-ndb-6.2.15/lib/mysql/libndbclient.so.3

So the library's _init() handler calls its __do_global_ctors_aux() handler, which then initializes the global

  EventLogger g_eventLogger;

declared in storage/ndb/src/ndbapi/ndb_cluster_connection.cpp
on line 35.

Adding the same abort on OpenSolaris shows that when linking
libndbclient.so dynamically the constructor of this global
object is never called; it is only called when linking in the
static libndbclient.a.

After removing the abort() from the constructor again and 
testing a statically linked NDBAPI test program all works
fine.
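
The null first argument to LogHandlerList::add in the backtraces is consistent with a Logger whose constructor never ran. Below is a minimal sketch of that failure mode, assuming (this is not confirmed from the NDB sources here) that the logger allocates its handler list in its constructor. RawLogger, construct, destroy and add_handler are hypothetical names, and the constructor is modelled as an explicit function so that the skipped case can be shown without undefined behaviour.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a handler list; ids stand in for LogHandler pointers.
struct HandlerList {
    std::vector<int> handlers;
};

// Plain aggregate: a global instance is zero-initialized at load time,
// so `list` is a null pointer until construct() has been called.
struct RawLogger {
    HandlerList* list;
};

// Models the constructor that is supposed to run when the library is loaded.
void construct(RawLogger& logger) { logger.list = new HandlerList(); }

// Models the destructor.
void destroy(RawLogger& logger) {
    delete logger.list;
    logger.list = NULL;
}

// Returns false instead of using a null list. The backtraces in this report
// show a fault inside LogHandlerList::add at the corresponding point.
bool add_handler(RawLogger& logger, int handler_id) {
    if (logger.list == NULL) return false;
    logger.list->handlers.push_back(handler_id);
    return true;
}

// Global with static storage duration and no dynamic initializer.
RawLogger g_logger;
```

Before construct(g_logger) is called, add_handler(g_logger, 1) returns false; afterwards it returns true. This mirrors the difference seen between the dynamically and statically linked test programs.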
[9 Oct 2008 6:37] Hartmut Holzgraefe
There is some interesting information about this problem in

  http://www.fpx.de/fp/Software/tcl-c++/tcl-c++.html#CONSTR

It doesn't provide a solution for SunStudio compilers though ...
[9 Oct 2008 8:46] Hartmut Holzgraefe
Setting $LD to "CC" when configuring/compiling the server seems to do the trick when using Sun compilers ... (no idea about how to tweak this on AIX which the bug was originally reported against though)

See also 

http://docs.sun.com/app/docs/doc/819-5267/bkamn?a=view#bkamq

"You should use CC -G to build a dynamic library. When you use ld (the link-editor) or cc (the C compiler) to build a dynamic library, exceptions might not work and the global variables that are defined in the library are not initialized."
[9 Oct 2008 8:49] Jonas Oreland
Can you test the latest 6.2/6.3?
ndbj had the same problem, so I think we removed *all* global objects from ndbapi.

I don't remember whether we did it in 6.2 or 6.3, though
(I think we did it in 6.2).

/Jonas
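
One common way to avoid depending on load-time global constructors is to construct the object on first use through an accessor. A minimal sketch follows; EventLog and event_log() are hypothetical names, and this is not necessarily how the global objects were removed from ndbapi.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical logger type, for illustration only.
class EventLog {
public:
    EventLog() : constructed_(true) {}
    void add(const std::string& message) { messages_.push_back(message); }
    std::size_t size() const { return messages_.size(); }
    bool constructed() const { return constructed_; }

private:
    bool constructed_;
    std::vector<std::string> messages_;
};

// Instead of a namespace-scope `EventLog g_event_log;`, whose constructor
// depends on the library's load-time initialization being run, the object is
// a function-local static: it is constructed the first time the accessor
// is called, and every later call returns the same instance.
EventLog& event_log() {
    static EventLog instance;
    return instance;
}
```

Callers write event_log().add("...") rather than touching a global directly. Since C++11 the initialization of a function-local static is thread-safe; with older compilers that is not guaranteed.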
[30 Dec 2008 18:30] John David Duncan
With MySQL Cluster 6.3.20, on OpenSolaris, using the Sun Studio 12 compiler, I do not see this problem anymore.  So I think it is indeed fixed in 6.3.20.
[16 Jan 2009 9:14] Hartmut Holzgraefe
Fixed in ndb-6.3 since at least 6.3.17, maybe earlier.

Last time i checked ndb-6.2 it was still there, need to retest with current 6.2.x ...
[5 May 2009 7:39] Hartmut Holzgraefe
Can't reproduce in 6.2 anymore either, closing ...