Bug #36745 falcon crash on solaris
Submitted: 15 May 2008 20:52 Modified: 30 Sep 2008 19:31
Reporter: Neel Nadgir
Status: Closed
Category: MySQL Server: Falcon storage engine
Severity: S1 (Critical)
Version: 6.0.5
OS: Solaris
Assigned to: Olav Sandstå
CPU Architecture: Any
Tags: SPARC 64bit

[15 May 2008 20:52] Neel Nadgir
Description:
I get the following crash while running mysql_install_db with falcon.

              libc.so.1`_lwp_kill+0x4
              mysqld`Error::error(const char*, ...)+0x50
              mysqld`SyncObject::~SyncObject()+0x28
              mysqld`_GLOBAL__D__ZN10ConnectionC2EP13Confi.. +0x14
              mysqld`__do_global_dtors_aux+0x48
              mysqld`_fini+0x4
              libc.so.1`_exithandle+0x40
              libc.so.1`exit+0x4
              mysqld`handle_segfault+0x3f0
              libc.so.1`__sighndlr+0xc
              libc.so.1`call_user_handler+0x42c
              libc.so.1`sigacthandler+0x4c
              mysqld`SymbolManager::getSymbol(const char*)+0x1c0
              mysqld`RoleModel::RoleModel(Database*)+0x6c
              mysqld`Database::start()+0x4c
              mysqld`Database::createDatabase(const char*)+0x6c
              mysqld`Connection::createDatabase(const char.. +0x140
              mysqld`StorageDatabase::createDatabase()+0x74
              mysqld`StorageHandler::initialize()+0xf4
              mysqld`StorageInterface::falcon_init(void*)+0x224

I am using 6.0.5 compiled with gcc 4.0.4

mysqlbug reports..
>Release:	mysql-6.0.5-alpha-pb87 (Source distribution)

>C compiler:    sparc-sun-solaris2.10-gcc (GCC) 4.0.4 (gccfss)
>C++ compiler:  sparc-sun-solaris2.10-g++ (GCC) 4.0.4 (gccfss)
>Environment:
	<machine, os, target, libraries (multiple lines)>
System: SunOS frost 5.11 snv_88 sun4v sparc SUNW,Sun-Fire-T200
Architecture: sun4

Some paths:  /usr/bin/perl /usr/bin/make /usr/bin/gmake /tank/ted/tools/gcc/bin/gcc /ws/onnv-tools/SUNWspro/SS11/bin/cc
GCC: Using built-in specs.
Target: sparc-sun-solaris2.10
Configured with: /net/clpt-v490-1/export/data/bldmstr/20070711_mars_gcc/src/configure --prefix=/usr/sfw --enable-shared --with-system-zlib --enable-checking=release --disable-libmudflap --enable-languages=c,c++ --enable-version-specific-runtime-libs --with-cpu=v9 --with-ld=/usr/ccs/bin/ld --without-gnu-ld
Thread model: posix
gcc version 4.0.4 (gccfss)
Compilation info (call): CC='/tank/ted/tools/gcc/bin/gcc'  CFLAGS='-m64 -xmemalign=8i -O3 '  CXX='/tank/ted/tools/gcc/bin/g++'  CXXFLAGS='-m64 -xmemalign=8i -O3'  LDFLAGS=''  ASFLAGS=''
Compilation info (used): CC='/tank/ted/tools/gcc/bin/gcc'  CFLAGS=' -m64 -xmemalign=8i -O3    -DHAVE_RWLOCK_T -DUNIV_SOLARIS'  CXX='/tank/ted/tools/gcc/bin/g++'  CXXFLAGS=' -m64 -xmemalign=8i -O3   -fno-implicit-templates -fno-exceptions -fno-rtti -DHAVE_RWLOCK_T '  LDFLAGS=' '  ASFLAGS=''
LIBC: 
lrwxrwxrwx   1 root     root           9 Apr 25 15:10 /lib/libc.so -> libc.so.1
-rwxr-xr-x   1 root     bin      1825004 Apr 19 12:13 /lib/libc.so.1
lrwxrwxrwx   1 root     root          19 Apr 25 15:10 /usr/lib/libc.so -> ../../lib/libc.so.1
lrwxrwxrwx   1 root     root          19 Apr 25 15:10 /usr/lib/libc.so.1 -> ../../lib/libc.so.1
Configure command: ./configure '--prefix=/tank/ted/tools/mysql-6.0.5-falcon' '--enable-dtrace' '--with-plugin-falcon' '--with-plugin-innobase' '--with-readline' 'CC=/tank/ted/tools/gcc/bin/gcc' 'CFLAGS=-m64 -xmemalign=8i -O3 ' 'CXXFLAGS=-m64 -xmemalign=8i -O3' 'CXX=/tank/ted/tools/gcc/bin/g++'

The following dtrace script was used to catch the crash
#!/usr/sbin/dtrace -qws

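/* Fires in the context of the process that sends SIGABRT. mysqld raises
   the signal on itself here, so pid refers to mysqld. */
proc:::signal-send
/args[2] == SIGABRT/
{
  /* Freeze the process so its state can still be inspected. */
  stop();

  /* Record the user stack, then run the proc tools on the stopped process. */
  ustack();
  system("/usr/bin/pstack %d", pid);
  system("/usr/bin/pmap %d", pid);
  system("/usr/bin/pldd %d", pid);
  system("/usr/bin/ptree %d", pid);
  system("/usr/bin/pfiles %d", pid);

  /* Resume the process and stop tracing. */
  system("/usr/bin/prun %d", pid);
  exit(0);
}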

How to repeat:
Just running mysql_install_db causes this crash.
[20 May 2008 13:48] Sveta Smirnova
Thank you for the report.

I cannot repeat the described behavior in my environment.

Can you repeat the crash if you run mysqld as `./libexec/mysqld --bootstrap --basedir=. --datadir=./data --log-warnings=0 --loose-skip-innodb --loose-skip-ndbcluster --max_allowed_packet=8M --net_buffer_length=16K`, without the help of the mysql_install_db script?
[20 May 2008 18:23] Neel Nadgir
I tried as you suggested and still got the same crash (same call stack).
===

ted@frost ~/tools/mysql-6.0.5-falcon$ ./libexec/mysqld --bootstrap --basedir=. --datadir=/data/mysql-falcon --log-warnings=0 --loose-skip-innodb --loose-skip-ndbcluster --max_allowed_packet=8M --net_buffer_length=16K
080520 11:21:17 - mysqld got signal 10 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=8388600
read_buffer_size=131072
max_used_connections=0
max_threads=151
threads_connected=0
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 338287 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Fatal signal 6 while backtracing
===
[21 May 2008 16:36] Kevin Lewis
Olav Sandstå, please work with PAE to isolate this.  It could be an endian issue.
[22 May 2008 22:18] Neel Nadgir
It turns out that this is a memory alignment issue. I fixed the crash by
making the following changes:

1. Change the size of DeferredIndex::initialSpace from 500 to 512. This ensures that
   on 64-bit platforms, allocations from it are aligned on 8-byte boundaries.
2. In SymbolManager.cpp, there is some magic to round up the pointer
   (e.g. line 127: symbol = (Symbol*) ((IPTR)(next + 3) & ~3); ).
   This rounds only to a 4-byte boundary. I suggest this (and similar
   occurrences) be changed to use the ROUNDUP() macro in Engine.h,
   i.e. symbol = (Symbol*) ROUNDUP(next, sizeof (void *))
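
To illustrate the difference between the two rounding idioms, here is a minimal sketch. The helper names (round_up, round_up_4, pointer_aligned) are made up for this example and are not Falcon's ROUNDUP() macro, whose exact definition is not shown in this report.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not Falcon's ROUNDUP macro): round addr up to the
// next multiple of align. align must be a power of two.
inline std::uintptr_t round_up(std::uintptr_t addr, std::uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

// The idiom quoted from SymbolManager.cpp: always rounds up to 4 bytes,
// regardless of the pointer size of the build.
inline std::uintptr_t round_up_4(std::uintptr_t addr)
{
    return (addr + 3) & ~static_cast<std::uintptr_t>(3);
}

// True if addr is a multiple of the pointer size of this build.
inline bool pointer_aligned(std::uintptr_t addr)
{
    return addr % sizeof(void *) == 0;
}
```

On a build where sizeof(void *) is 8, round_up_4(0x1001) gives 0x1004, which is not a multiple of 8, while round_up(0x1001, sizeof(void *)) gives 0x1008. Storing a pointer-sized member at the former address is the kind of access that raises a bus error on SPARC.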

If this is going to be part of the on-disk format, we need to be more careful.
I do not know whether it is a design goal of Falcon/MySQL that databases
created with 64-bit MySQL are accessible via 32-bit MySQL.
[22 May 2008 23:07] Olav Sandstå
Hi Neel,

Thanks for looking into the cause for this problem and for providing a fix to it. I will work to get your fix checked in to the Falcon repository. 

Regards,
Olav
[28 May 2008 20:39] Jim Starkey
The issue in SymbolManager is real and should be fixed.  The comment about DeferredIndex initial allocation, however, is not.  All blocks allocated from MemMgr are on 8 byte or greater boundaries.

And neither the SymbolManager nor DeferredIndex has anything to do with the ODS.
[28 May 2008 20:59] Neel Nadgir
The issue with DeferredIndex::initialSpace is that currentHunkOffset
is initialized to 500:

currentHunkOffset = sizeof(initialSpace); in DeferredIndex::initializeSpace

For example, if we allocate 8 bytes, DeferredIndex::alloc() will return
base + (currentHunkOffset - 8),
i.e. base + 500 - 8,
i.e. base + 492, which is not 8-byte aligned,
and thus the crash.
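
The arithmetic can be checked with a simplified model of an area that hands out space from its end downward. This is only a sketch: the Hunk type and its members are illustrative names, not Falcon's actual code, and the base address is assumed to be 8-byte aligned.

```cpp
#include <cassert>
#include <cstddef>

// Simplified model (illustrative names, not Falcon's code) of a space that
// is allocated from the end toward the start.
struct Hunk
{
    std::size_t currentOffset;   // starts at the total size of the space

    explicit Hunk(std::size_t size) : currentOffset(size) {}

    // Returns the offset of the new block relative to the base address.
    std::size_t alloc(std::size_t length)
    {
        assert(length <= currentOffset);
        currentOffset -= length;
        return currentOffset;
    }
};

// With an 8-byte-aligned base, the block is aligned only if its offset is.
inline bool aligned8(std::size_t offset)
{
    return offset % 8 == 0;
}
```

With Hunk(500), the first alloc(8) returns offset 492, and 492 % 8 is 4, so the block is misaligned. With Hunk(512), the same call returns 504, which is a multiple of 8. Note that in this model a size of 512 keeps blocks aligned only as long as every requested length is itself a multiple of 8.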
[4 Jun 2008 13:03] Olav Sandstå
Just for the record: here is the same stack trace as reported by Neel but with symbol information:

Running: mysqld --bootstrap --basedir=. --datadir=./data --log-warnings=0 --loose-skip-innodb --loose-skip-ndbcluster --max_allowed_packet=8M --net_buffer_length=16K 
(process id 13650)
Reading libc_psr.so.1
080604 12:50:25 [ERROR] Can't find messagefile '/home/os136802/mysql/develop/repo/mysql-6.0-falcon-sunstudio/share/mysql/english/errmsg.sys'
t@1 (l@1) signal BUS (invalid address alignment) in SymbolManager::getSymbol at line 144 in file "SymbolManager.cpp"
  144           symbol->collision = hashTable [slot];
(dbx) where
current thread: t@1
=>[1] SymbolManager::getSymbol(this = 0x10119e1f0, string = 0x100b0c001 "SYSTEM"), line 144 in "SymbolManager.cpp"
  [2] Database::getSymbol(this = 0x10101ae70, string = 0x100b0c001 "SYSTEM"), line 1908 in "Database.cpp"
  [3] RoleModel::RoleModel(this = 0x1011a0620, db = 0x10101ae70), line 109 in "RoleModel.cpp"
  [4] Database::start(this = 0x10101ae70), line 508 in "Database.cpp"
  [5] Database::createDatabase(this = 0x10101ae70, filename = 0xffffffff7fffe088 "falcon_master.fts"), line 641 in "Database.cpp"
  [6] Connection::createDatabase(this = 0x10121a410, dbName = 0x10121a144 "FALCON_MASTER", fileName = 0x10121a174 "falcon_master.fts", account = 0x100b11cd4 "mysql", password = 0x100b11cda "mysql", threads = 0x10121a1a0), line 1066 in "Connection.cpp"
  [7] StorageDatabase::createDatabase(this = 0x101219fd8), line 158 in "StorageDatabase.cpp"
  [8] StorageHandler::initialize(this = 0x10101a328), line 996 in "StorageHandler.cpp"
  [9] StorageInterface::falcon_init(p = 0x101c38730), line 207 in "ha_falcon.cpp"
  [10] ha_initialize_handlerton(plugin = 0x101c31b58), line 428 in "handler.cc"
  [11] plugin_initialize(plugin = 0x101c31b58), line 1011 in "sql_plugin.cc"
  [12] plugin_init(argc = 0x100d9dbe8, argv = 0x10141d828, flags = 2), line 1217 in "sql_plugin.cc"
  [13] init_server_components(), line 3975 in "mysqld.cc"
  [14] main(argc = 9, argv = 0xffffffff7ffff288), line 4407 in "mysqld.cc"
[6 Jun 2008 12:34] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/47530

2692 Olav Sandstaa	2008-06-06
      Fix to one of the memory alignment issues in Bug#36745 falcon crash on solaris
            
      The problem was a memory alignment issue in SymbolManager.cpp that caused 
      a crash when running in 64 bit mode on Solaris on SPARC. Fixed this 
      by ensuring the symbol objects are allocated at 8 byte boundaries
      when compiling for 64 bit systems.
[7 Jun 2008 18:09] Olav Sandstå
Here is the call stack for the second BUS (invalid address alignment) error:

t@1 (l@1) signal BUS (invalid address alignment) in DeferredIndex::addNode at line 196 in file "DeferredIndex.cpp"
  196                           leaf->nodes[leaf->count++] = node;
(dbx) where
current thread: t@1
=>[1] DeferredIndex::addNode(this = 0x10122cde8, indexKey = 0xffffffff7fffa6c8, recordNumber = 0), line 196 in "DeferredIndex.cpp"
  [2] Index::insert(this = 0x10121fa60, key = 0xffffffff7fffa6c8, recordNumber = 0, transaction = 0x101228810), line 212 in "Index.cpp"
  [3] Index::insert(this = 0x10121fa60, record = 0x10495b000, transaction = 0x101228810), line 206 in "Index.cpp"
  [4] Table::insertIndexes(this = 0x1011b9f60, transaction = 0x101228810, record = 0x10495b000), line 1218 in "Table.cpp"
  [5] Table::insert(this = 0x1011b9f60, transaction = 0x101228810, count = 8, fieldVector = 0x10121d760, values = 0x10121d510), line 358 in "Table.cpp"
  [6] NInsert::evalStatement(this = 0x10122c320, statement = 0x10121cfc8), line 143 in "NInsert.cpp"
  [7] Nfs::Statement::start(this = 0x10121cfc8, node = 0x10122c320), line 487 in "Statement.cpp"
  [8] PreparedStatement::executeUpdate(this = 0x10121cfc8), line 86 in "PreparedStatement.cpp"
  [9] Table::save(this = 0x1011b9f60), line 258 in "Table.cpp"
  [10] Database::createDatabase(this = 0x10101be20, filename = 0xffffffff7fffdf98 "falcon_master.fts"), line 658 in "Database.cpp"
  [11] Connection::createDatabase(this = 0x10121b3c0, dbName = 0x10121b0f4 "FALCON_MASTER", fileName = 0x10121b124 "falcon_master.fts", account = 0x100afc26c "mysql", password = 0x100afc272 "mysql", threads = 0x10121b150), line 1066 in "Connection.cpp"
  [12] StorageDatabase::createDatabase(this = 0x10121af88), line 158 in "StorageDatabase.cpp"
  [13] StorageHandler::initialize(this = 0x10101b2d8), line 996 in "StorageHandler.cpp"
  [14] StorageInterface::falcon_init(p = 0x101c396e0), line 207 in "ha_falcon.cpp"
  [15] ha_initialize_handlerton(plugin = 0x101c32b08), line 428 in "handler.cc"
  [16] plugin_initialize(plugin = 0x101c32b08), line 1011 in "sql_plugin.cc"
  [17] plugin_init(argc = 0x100d9eb98, argv = 0x10141e7d8, flags = 2), line 1217 in "sql_plugin.cc"
  [18] init_server_components(), line 3975 in "mysqld.cc"
  [19] main(argc = 9, argv = 0xffffffff7ffff198), line 4407 in "mysqld.cc"
[8 Jun 2008 6:53] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/47579

2694 Olav Sandstaa	2008-06-08
      Fix to the second of the memory alignment issues in Bug#36745 falcon crash on solaris
      
      The problem that caused crashes on SPARC running in 64 bit mode was due to a memory alignment issue
      when "nodes" were allocated in the DeferredIndex::initialSpace. Memory is allocated in the
      initialSpace starting from the end of the memory area. Since the end of the initialSpace was not
      aligned on an 8 byte boundary, memory allocated from it was not aligned either, which caused
      the crash on Solaris when running in 64 bit mode on SPARC. Fixed this by ensuring that the size
      of the initialSpace is "8 byte memory aligned" (increased the size of it from 500 bytes to 512 bytes).
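
One way to keep such a size from drifting back to an unaligned value is a compile-time check. The following is a sketch only, assuming a C++11 compiler; DeferredIndexSketch is a made-up stand-in, and only the member name initialSpace and the size 512 come from the commit message above.

```cpp
#include <cstddef>

// Illustrative stand-in for the class described above. Only the member
// name and its size are taken from the commit message; the rest is assumed.
struct DeferredIndexSketch
{
    // alignas makes the start of the buffer 8-byte aligned as well.
    alignas(8) unsigned char initialSpace[512];
};

// Fails to compile if someone changes the size to a non-multiple of 8,
// for example back to 500.
static_assert(sizeof(DeferredIndexSketch::initialSpace) % 8 == 0,
              "initialSpace size must be a multiple of 8");
```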
[30 Sep 2008 19:31] Jon Stephens
Documented in the 6.0.6 changelog as follows:

        mysql_install_db from a Falcon-enabled build crashed on Solaris/SPARC.