Bug #25409 embedded server crashes on startup
Submitted: 4 Jan 2007 11:32 Modified: 10 Apr 2007 16:57
Reporter: Ian Greenhoe
Status: Can't repeat
Category: MySQL Server: Embedded Library (libmysqld) Severity: S1 (Critical)
Version: 5.1 OS: Linux (Debian, SuSE)
Assigned to: Damien Katz CPU Architecture: Any

[4 Jan 2007 11:32] Ian Greenhoe
Description:
In recent copies of 5.1 from both the mysql-5.1 and mysql-5.1-maint repositories, the mysqltest-embedded program crashes shortly after startup.

[FWIW: this appears to be separate from Bug 18518, insofar as the patch that fixed 18518 was in all of the trees that have failed.  However, it might have the same root cause.]

So far, this has been observed on three different machines, with multiple completely separate builds from freshly cloned trees.  All three machines are i386 architecture.  Two of the machines are running Debian Etch, and one is running SuSE.

I have run the debugger on this multiple times, and it has failed in the same place every time (on different builds, including builds made and run on different machines).  A partial stack trace follows:

#0  0x0838433d in innobase_release_stat_resources (trx=0x459cd9b3)
    at ha_innodb.cc:373
#1  0x0837a1aa in innobase_release_temporary_latches (hton=0x895e850, 
    thd=0x8fbd300) at ha_innodb.cc:405
#2  0x08442aaf in release_temporary_latches (thd=0x8fbd300, plugin=0x895dc4c, 
    unused=0x0) at handler.cc:1193
#3  0x08281c61 in plugin_foreach_with_mask (thd=0x8fbd300, 
    func=0x8442a7a <release_temporary_latches>, type=1, state_mask=4294967287, 
    arg=0x0) at sql_plugin.cc:990
#4  0x08442ae7 in ha_release_temporary_latches (thd=0x8fbd300)
    at handler.cc:1201
#5  0x082597ef in select_send::send_data (this=0x8fc0200, items=@0x8fbd66c)
    at sql_class.cc:1045
#6  0x082a1e82 in end_send (join=0x8fc0210, join_tab=0x8fc3510, 
    end_of_records=false) at sql_select.cc:11286
...

The most obvious "can't happen" failure seems to be in frame 1 (innobase_release_temporary_latches @ ha_innodb.cc:405):

402		trx = (trx_t*) thd->ha_data[hton->slot];
403	
404		if (trx) {
405			innobase_release_stat_resources(trx);
406		}

Here is what trx and thd->ha_data[hton->slot] look like after the assignment on line 402:
(gdb) print trx
$1 = (trx_t *) 0x459cd9b3
(gdb) print thd->ha_data[hton->slot]
$2 = (void *) 0x0

These particular values were observed after the server crashed.  However, I have also single-stepped (by line) from line 402 to line 405, and trx already has a bogus value before the crash.  Further, I strongly believe that trx has the same bogus value on every run.

How to repeat:
Clone a copy of an up-to-date mysql-5.1 tree.
Build it.
cd into the mysql-test dir.

To watch it crash:
./mtr --embedded

You will get an "Error 139" with very little other info from the mtr script or the server.

I've been running the debugger with:
cd into the mysql-test dir.
./mtr --embedded --start-and-stop
gdb ../libmysqld/examples/mysqltest-embedded
r \
 --no-defaults --silent --skip-safemalloc --tmpdir=var/tmp\
 --character-sets-dir=../sql/share/charsets --logdir=var/log\
 --socket=var/tmp/master.sock --port=9306 --database=test --user=root\
 --password= --timer-file=var/log/timer --server-arg=--no-defaults\
 --server-arg=--console --server-arg=--basedir=.\
 --server-arg=--character-sets-dir=../sql/share/charsets\
 --server-arg=--log-bin-trust-function-creators\
 --server-arg=--default-character-set=latin1\
 --server-arg=--language=../sql/share/english --server-arg=--tmpdir=var/tmp\
 --server-arg=--log-bin=var/log/master-bin\
 --server-arg=--pid-file=var/run/master.pid --server-arg=--port=9306\
 --server-arg=--server-id=1 --server-arg=--socket=var/tmp/master.sock\
 --server-arg=--innodb_data_file_path=ibdata1:10M:autoextend\
 --server-arg=--local-infile --server-arg=--datadir=var/master-data\
 --server-arg=--skip-ndbcluster --server-arg=--log=\
 --server-arg=--plugin_dir=../storage/example/.libs\
 --server-arg=--key_buffer_size=1M --server-arg=--sort_buffer=256K\
 --server-arg=--max_heap_table_size=1M --server-arg=--binlog_cache_size=32768\
 --server-arg=--innodb_lock_wait_timeout=1 --server-arg=--core-file\
 --server-arg=--open-files-limit=1024 --test-file t/innodb.test --result-file\
 r/innodb.result

These arguments were grabbed from the mtr script (using the script debug) and stripped of absolute paths. (The paths are relative to the mysql-test dir.)
[4 Jan 2007 14:41] MySQL Verification Team
Thank you for the bug report. Verified on Fedora Core 6.
[8 Jan 2007 18:00] Ian Greenhoe
The problem is that the THD class has a different size in the InnoDB code than outside of it.  (There may be other instances of this as well.)  This may be caused by incorrect #defines while compiling.
[10 Jan 2007 14:39] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/17850

ChangeSet@1.2375, 2007-01-10 06:38:26-08:00, igreenhoe@ra.greendragongames.com +1 -0
  Fix for bug #25409, "embedded server crashes on startup".
  
  This was caused by several interlocking problems:
  
  1) The actual cause of the crash was that THD was compiled with different sets of
  parameters in different .o files in the libmysqld.a library (specifically, ha_innodb.cc
  was compiled without EMBEDDED_SERVER, causing considerable problems.)  Due to this,
  when the program was examined under gdb, nonsensical results occurred:  It appeared
  that a variable was being (consistently) assigned a specific garbage value.  Further,
  a printf in the code showed different results than the gdb "p" command on the same
  variables that were being assigned from. 
  
  2) Next problem was that ha_innodb.cc was not being compiled in the libmysqld dir.
  
  3) Next, there were multiple ha_innodb.o objects in the libmysqld.a library; this
  allowed the linker to choose a random object.  This was true for almost all of
  the object files in the libmysqld directory (the only object files not affected were
  lib_sql.cc, emb_qcache.cc, and libmysqld.c)
  
  4) The last problem:
  **********************************************************************
  *** RECENT VERSIONS OF THE GNU CORE UTILITIES, SUCH AS LS AND SORT ***
  *** HAVE A SORTING ORDER THAT IS DEPENDENT ON THE LOCALE!          ***
  **********************************************************************
  This caused apparently random versions of the various .o files to be placed
  in the resulting library.  Setting the locale to "C" restored the expected
  behavior.  In the bash shell, you can set the locale to "C" via the command:
  export LANG=C
  
  It is unknown at this time if this change in the behavior of sort and ls will
  cause other problems in the code or compilation process.
[3 Feb 2007 21:08] Sergei Golubchik
The patch is wrong. The solution was implemented for Bug#23369. Apparently it was broken recently, and that is the bug that should be fixed.
[10 Apr 2007 16:57] Timothy Smith
Several testers, myself included, are unable to repeat this at this time.  A number of embedded fixes have been added since this bug was originally verified.  Setting to Can't repeat.