Bug #40397 Server does not handle failure to initialize storage engine
Submitted: 29 Oct 2008 17:14 Modified: 2 Jul 15:38
Reporter: John H. Embretsen
Status: Can't repeat
Category:Server: SE API Severity:S1 (Critical)
Version:6.0.8,5.4 OS:Any
Assigned to: Alexey Botchkov Target Version:5.4+
Tags: test failure, F_SERVER, pb2
Triage: Triaged: D1 (Critical) / R2 (Low) / E2 (Low)

[29 Oct 2008 17:14] John H. Embretsen
Description:
The MySQL Server does not gracefully handle failures to load/initialize certain storage
engines (plugins) such as Falcon. 

For example, in the generated test "falcon_ddl" in Pushbuild 2, a forced recovery attempt
fails (Bug#39789 "Falcon recovery failure after several CREATE + DROP TABLESPACE"). This
causes Falcon not to load properly, and finally the entire server crashes with the
following stack trace / output:

# 02:24:18 081028  2:24:18 [ERROR] Falcon: Recovery failed: can't open file
"/export/home/pb2/test/sb_1-103036-1225155866.65/mysql-6.0.8-alpha-linux-i686-test/vardirs/master-data_recovery/
w . TABLESPACE ": No such file or directory (2)
# 02:24:18 081028  2:24:18 [ERROR] Plugin 'Falcon' init function returned error.
# 02:24:18 081028  2:24:18 [ERROR] Plugin 'Falcon' registration as a STORAGE ENGINE
failed.
# 02:24:18 InnoDB: The log sequence number in ibdata files does not match
# 02:24:18 InnoDB: the log sequence number in the ib_logfiles!
# 02:24:18 081028  2:24:18  InnoDB: Database was not shut down normally!
# 02:24:18 InnoDB: Starting crash recovery.
# 02:24:18 InnoDB: Reading tablespace information from the .ibd files...
# 02:24:18 InnoDB: Restoring possible half-written data pages from the doublewrite
# 02:24:18 InnoDB: buffer...
# 02:24:18 081028  2:24:18  InnoDB: Started; log sequence number 0 46745
# 02:24:18 081028  2:24:18 - mysqld got signal 11 ;
# 02:24:18 This could be because you hit a bug. It is also possible that this binary
# 02:24:18 or one of the libraries it was linked against is corrupt, improperly built,
# 02:24:18 or misconfigured. This error can also be caused by malfunctioning hardware.
# 02:24:18 We will try our best to scrape up some info that will hopefully help diagnose
# 02:24:18 the problem, but since we have already crashed, something is definitely wrong
# 02:24:18 and this may fail.
# 02:24:18 
# 02:24:18 key_buffer_size=8388600
# 02:24:18 read_buffer_size=131072
# 02:24:18 max_used_connections=0
# 02:24:18 max_threads=151
# 02:24:18 thread_count=0
# 02:24:18 connection_count=0
# 02:24:18 It is possible that mysqld could use up to 
# 02:24:18 key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 337799
K
# 02:24:18 bytes of memory
# 02:24:18 Hope that's ok; if not, decrease some variables in the equation.
# 02:24:18 
# 02:24:18 thd: 0xa7da808
# 02:24:18 Attempting backtrace. You can use the following information to find out
# 02:24:18 where mysqld died. If you see no messages after this, something went
# 02:24:18 terribly wrong...
# 02:24:18 stack_bottom = 0xbffa63a0 thread_stack 0x30c00
# 02:24:18
/export/home/pb2/test/sb_1-103036-1225155866.65/mysql-6.0.8-alpha-linux-i686-test/libexec/mysqld(my_print_stacktrace+0x26)
[0x87f503b]
# 02:24:18
/export/home/pb2/test/sb_1-103036-1225155866.65/mysql-6.0.8-alpha-linux-i686-test/libexec/mysqld(handle_segfault+0x2cd)
[0x82b1239]
# 02:24:18 [0xa52420]
# 02:24:18
/export/home/pb2/test/sb_1-103036-1225155866.65/mysql-6.0.8-alpha-linux-i686-test/libexec/mysqld(plugin_lock(THD*,
st_plugin_int***)+0x7f) [0x8498735]
# 02:24:18
/export/home/pb2/test/sb_1-103036-1225155866.65/mysql-6.0.8-alpha-linux-i686-test/libexec/mysqld(ha_lock_engine(THD*,
handlerton*)+0x2f) [0x83e9625]
# 02:24:18
/export/home/pb2/test/sb_1-103036-1225155866.65/mysql-6.0.8-alpha-linux-i686-test/libexec/mysqld(ha_resolve_by_legacy_type(THD*,
legacy_db_type)+0x52) [0x83e9c1a]
# 02:24:18
/export/home/pb2/test/sb_1-103036-1225155866.65/mysql-6.0.8-alpha-linux-i686-test/libexec/mysqld(ha_checktype(THD*,
legacy_db_type, bool, bool)+0x24) [0x83e9c72]
# 02:24:18
/export/home/pb2/test/sb_1-103036-1225155866.65/mysql-6.0.8-alpha-linux-i686-test/libexec/mysqld
[0x831c379]
# 02:24:18
/export/home/pb2/test/sb_1-103036-1225155866.65/mysql-6.0.8-alpha-linux-i686-test/libexec/mysqld(open_table_def(THD*,
TABLE_SHARE*, unsigned int)+0x429) [0x831f69d]
# 02:24:18
/export/home/pb2/test/sb_1-103036-1225155866.65/mysql-6.0.8-alpha-linux-i686-test/libexec/mysqld(get_table_share(THD*,
TABLE_LIST*, char*, unsigned int, unsigned int, int*)+0x163) [0x83100b1]
# 02:24:18
/export/home/pb2/test/sb_1-103036-1225155866.65/mysql-6.0.8-alpha-linux-i686-test/libexec/mysqld
[0x83103b0]
# 02:24:18
/export/home/pb2/test/sb_1-103036-1225155866.65/mysql-6.0.8-alpha-linux-i686-test/libexec/mysqld(open_table(THD*,
TABLE_LIST*, st_mem_root*, enum_open_table_action*, unsigned int)+0x9f8) [0x8311776]
# 02:24:18
/export/home/pb2/test/sb_1-103036-1225155866.65/mysql-6.0.8-alpha-linux-i686-test/libexec/mysqld(open_tables(THD*,
TABLE_LIST**, unsigned int*, unsigned int)+0x3d2) [0x83128f6]
# 02:24:18
/export/home/pb2/test/sb_1-103036-1225155866.65/mysql-6.0.8-alpha-linux-i686-test/libexec/mysqld(open_and_lock_tables_derived(THD*,
TABLE_LIST*, bool, unsigned int)+0x77) [0x8312feb]
# 02:24:18
/export/home/pb2/test/sb_1-103036-1225155866.65/mysql-6.0.8-alpha-linux-i686-test/libexec/mysqld(simple_open_n_lock_tables(THD*,
TABLE_LIST*)+0x28) [0x8386a58]
# 02:24:18
/export/home/pb2/test/sb_1-103036-1225155866.65/mysql-6.0.8-alpha-linux-i686-test/libexec/mysqld(acl_reload(THD*)+0x1b7)
[0x83998e5]
# 02:24:18
/export/home/pb2/test/sb_1-103036-1225155866.65/mysql-6.0.8-alpha-linux-i686-test/libexec/mysqld(acl_init(bool)+0x14c)
[0x8399cfa]
# 02:24:18
/export/home/pb2/test/sb_1-103036-1225155866.65/mysql-6.0.8-alpha-linux-i686-test/libexec/mysqld(main+0x27b)
[0x82b4e1b]
# 02:24:18 /lib/libc.so.6(__libc_start_main+0xdc) [0x736dec]
# 02:24:18
/export/home/pb2/test/sb_1-103036-1225155866.65/mysql-6.0.8-alpha-linux-i686-test/libexec/mysqld(__gxx_personality_v0+0x3c9)
[0x81ce421]
# 02:24:18 Trying to get some variables.
# 02:24:18 Some pointers may be invalid and cause the dump to abort...
# 02:24:18 thd->query at (nil) is an invalid pointer
# 02:24:18 thd->thread_id=0
# 02:24:18 thd->killed=NOT_KILLED
# 02:24:18 The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
# 02:24:18 information that should help you find out what is causing the crash.
# 02:24:18 Writing a core file
sh: line 1: 27864 Segmentation fault      (core dumped)
/export/home/pb2/test/sb_1-103036-1225155866.65/mysql-6.0.8-alpha-linux-i686-test/libexec/mysqld
--no-defaults --core-file --loose-console --loose-falcon-debug-mask=65535
--language=/export/home/pb2/test/sb_1-103036-1225155866.65/mysql-6.0.8-alpha-linux-i686-test/share/mysql/english/
--datadir="/export/home/pb2/test/sb_1-103036-1225155866.65/mysql-6.0.8-alpha-linux-i686-test/vardirs/master-data_recovery"
--socket="/export/home/pb2/test/sb_1-103036-1225155866.65/tmp/WCybuKHnUn/master.sock"
--port=19306 2>&1
# 02:24:18 Shutting down the recovered server.
DBI connect('host=127.0.0.1:port=19306:user=root:database=test','',...) failed: Can't
connect to MySQL server on '127.0.0.1' (111) at lib/GenTest/Reporter/Recovery.pm line 75
# 02:24:18 Recovery has failed.
# 02:24:18 Test completed with failure status 104.

How to repeat:
As of 2008-10-29:

 - Branch or checkout the current mysql-6.0-falcon-team bzr branch (parent branches will
probably yield the same results) and build MySQL.
 - Branch or checkout the current mysql-test-extra-6.0 branch, or obtain the Random Query
Generator (gentest) by other means (e.g. MySQL Forge or Launchpad).
 - cd to your test-extra branch.
 - In the following command let --basedir point to your MySQL build.
 - Run the following command (the below example is in Unix format):

perl runall.pl \
--basedir=/home/tmp/mysql/bzr-repos/mysql-6.0-falcon-team \
--reporters=Deadlock,ErrorLog,Backtrace,Recovery,WinPackage \
--mysqld=--loose-falcon-lock-wait-timeout=1 \
--mysqld=--loose-falcon-debug-mask=2 \
--mysqld=--log-output=none \
--mysqld=--log=off \
--mysqld=--skip-safemalloc \
--duration=1200 \
--engine=Falcon \
--grammar=conf/falcon_ddl.yy \
--queries=100000

The test will run for a few minutes before attempting a forced recovery, and the crash
will most likely occur.
[29 Oct 2008 17:14] John H. Embretsen
This test succeeds against 6.0.7 (tested on Linux, using
mysql-6.0.7-alpha-linux-x86_64-glibc23.tar.gz).

For comparison, with revision olav@sun.com-20081015201310-1eprw26cyb9lrq8v (2008-10-15)
of the mysql-6.0-falcon-team branch, the server does *not* crash in this scenario,
although Falcon recovery fails due to the same bug as in the current revision:

# 00:57:35 081016  0:57:35 [ERROR] Falcon: Recovery failed: can't open file
"/export/home/pb2/test/sb_1-84900-1224110472.12/mysql-6.0.8-alpha-linux-i686-test/vardirs/master-data_recovery/
j . TABLESPACE ": No such file or directory (2)
# 00:57:35 081016  0:57:35 [ERROR] Plugin 'Falcon' init function returned error.
# 00:57:35 081016  0:57:35 [ERROR] Plugin 'Falcon' registration as a STORAGE ENGINE
failed.
# 00:57:35 InnoDB: The log sequence number in ibdata files does not match
# 00:57:35 InnoDB: the log sequence number in the ib_logfiles!
# 00:57:35 081016  0:57:35  InnoDB: Database was not shut down normally!
# 00:57:35 InnoDB: Starting crash recovery.
# 00:57:35 InnoDB: Reading tablespace information from the .ibd files...
# 00:57:35 InnoDB: Restoring possible half-written data pages from the doublewrite
# 00:57:35 InnoDB: buffer...
# 00:57:35 081016  0:57:35  InnoDB: Started; log sequence number 0 46655
# 00:57:35 081016  0:57:35 [Note] Event Scheduler: Loaded 0 events
# 00:57:35 081016  0:57:35 [Note]
/export/home/pb2/test/sb_1-84900-1224110472.12/mysql-6.0.8-alpha-linux-i686-test/libexec/mysqld:
ready for connections.
# 00:57:35 Shutting down the recovered server.
# 00:57:36 Recovery has failed.
# 00:57:36 Test completed with failure status 104.

Thus, it seems this is a regression introduced relatively recently.
[1 Nov 2008 19:36] Sveta Smirnova
Thank you for the report.

Verified as described.
[1 Jul 19:23] Sveta Smirnova
Bug is not repeatable in my environment with today 6.0-falcon-team sources.