Bug #33665 Falcon + event scheduler + restarts = crash
Submitted: 3 Jan 2008 18:04 Modified: 5 May 2008 18:35
Reporter: Philip Stoev Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Falcon storage engine Severity:S1 (Critical)
Version:6.0.4-p2 OS:Any
Assigned to: Jim Starkey CPU Architecture:Any

[3 Jan 2008 18:04] Philip Stoev
Description:
A peculiar combination of Falcon, event scheduler and restarts causes a crash in Falcon.

0x824a181 handle_segfault + 441
0xa16402 (?)
0x83fce5d Error::error(char const*, ...) + 73
0x83fcf66 Error::assertionFailed(char const*, int) + 22
0x844ab5e Section::getSectionPage(Dbb*, int, int, LockType, unsigned int) + 938
0x844ce00 Section::getSectionPage(int, LockType, unsigned int) + 184
0x844ca1a Section::redoRecordLocatorPage(int, int, bool) + 58
0x84832c1 SRLRecordLocator::pass2() + 101
0x844fe22 SerialLog::recover() + 1094
0x83ec677 Database::openDatabase(char const*) + 367
0x83e8fad Connection::getDatabase(char const*, char const*, Threads*) + 193
0x83e68c2 Connection::openDatabase(char const*, char const*, char const*, char const*, char const*, Threads*) + 74
0x83bb3bf StorageDatabase::getOpenConnection() + 63
0x83bf164 StorageHandler::initialize() + 108
0x83be8b5 StorageHandler::createTable(char const*, char const*, bool) + 241
0x83b1bae StorageInterface::create(char const*, st_table*, st_ha_create_information*) + 94
0x830c3fa ha_create_table(THD*, char const*, char const*, char const*, st_ha_create_information*, bool) + 190
0x82e212b rea_create_table(THD*, char const*, char const*, char const*, st_ha_create_information*, List<Create_field>&, unsigned int, st_key*, handler*) + 167
0x831aa03 mysql_create_table_no_lock(THD*, char const*, char const*, st_ha_create_information*, Alter_info*, bool, unsigned int) + 1139
0x831aff9 mysql_create_table(THD*, char const*, char const*, st_ha_create_information*, Alter_info*, bool, unsigned int) + 465
0x8255695 mysql_execute_command(THD*) + 1357
0x825b5a5 mysql_parse(THD*, char const*, unsigned int, char const**) + 269
0x82540bd dispatch_command(enum_server_command, THD*, char*, unsigned int) + 909
0x8253d00 do_command(THD*) + 176
0x825254f handle_one_connection + 287

The problematic query is:

create table t1 (id int not null, str char(10), unique(str))

After restart, tablespace is OK, and the same query works.

How to repeat:
Please run:

perl mysql-test-run.pl --mysqld=--default-storage-engine=falcon \ main.events_restart_phase1 main.events_restart_phase2 \ main.events_restart_phase3 main.explain

All those four tests must run together in order to produce the right mix of restarts, however the crash is 100% reproducible.

Please let me know if a simplified test case is required for fixing this bug, and I will work to provide it.

You can run mysql-test-run.pl with the --script-debug option to see the individual server restarts.
[17 Jan 2008 11:37] Philip Stoev
Setting to verified, since a reasonably simple repeatable test case is available. Setting lead, please triage.
[17 Jan 2008 15:36] Kevin Lewis
Phillip, The call stack indicates that this is a recovery failure.  Can you same of the falcon database so that we can re-run the recovery?
[17 Jan 2008 15:59] Philip Stoev
Corrupted tablespace for bug #33665

Attachment: bug33665.zip (application/x-zip-compressed, text), 312.75 KiB.

[17 Jan 2008 16:01] Philip Stoev
Hello, I just attached the corrupted tablespace. You can mount it and run a CREATE TABLE ENGINE=Falcon and the crash will occur.

This table space is produced using the following command:

perl mysql-test-run.pl --mysqld=--default-storage-engine=falcon \
events_restart_phase1 events_restart_phase2 events_restart_phase3
[17 Jan 2008 21:54] Kevin Lewis
Possible 'double recovery' problem.
[23 Jan 2008 14:51] Kevin Lewis
Possibly a double recovery problem
[23 Jan 2008 21:38] Kevin Lewis
Jim Starkey pushed a change to Falcon that requires it ro wait for a Page Cache Flush at the end of recovery.  It used to do this, but when the pool of IO threads were added, the recovery process just called for a flush to happen and then returned.  Now the flush waits for all the IO threads to complete.

When recovery does not wait for the flush to complete, new blocks can be written to the serial log before all the pages from the previous recovery can be written to disk.  These blocks indicate that recovery should start after the previous one.  If a shutdown happens before these pages are fully written, The following recovery will not start early enough.

This very likely solves the double recovery problems exhibited by Bugs 32992, 33608 & 33665.  I am putting these bugs into Patch Pending so that this fix can be verified, or not.
[11 Feb 2008 20:48] Kevin Lewis
Patch is in mysql-6.0-falcon-team and mysql-6.0-release version 6.0.4
[5 May 2008 18:35] Paul DuBois
Noted in 6.0.4 changelog.

Incomplete Falcon recovery after server restarts eventually resulted
in tablespace corruption.