Bug #32921 Falcon; Intermittent failure during server restart, often falcon_bug_size_16
Submitted: 3 Dec 2007 3:16 Modified: 5 May 2008 18:21
Reporter: Kevin Lewis Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Falcon storage engine Severity:S3 (Non-critical)
Version: OS:Any
Assigned to: Jim Starkey CPU Architecture:Any

[3 Dec 2007 3:16] Kevin Lewis
Description:
Hakan wrote; falcon_page_size_16 fails consistently with the optimized build on Pushbuild. If we have a bug report for the failing tests, then we can disable it. I tested the test suite with optimized build of Falcon now. Either falcon_page_size_16 or falcon_bug_28095_II are failing with
   failed: 2013: Lost connection to MySQL server during query

The problem with falcon_page_size_16 is after the restart of Falcon
while creating a table:

(gdb) bt
#0  0x00002b304407dfcb in raise () from /lib/libpthread.so.0
#1  0x000000000086d7ab in Error::error (string=<value optimized out>)
    at Error.cpp:92
#2  0x000000000084cd58 in Cache::fetchPage (this=0x2aaaaacafeb8, 
    dbb=0x2aaaaacbb948, pageNumber=9, pageType=PAGE_sections, 
    lockType=Exclusive) at Cache.cpp:292
#3  0x00000000008a29bb in Section::deleteSectionLevel (dbb=0x5736, 
    pageNumber=22353, transId=0) at Section.cpp:950
#4  0x00000000008645b3 in Dbb::deleteSection (this=0x5736,
sectionId=1114, 
    transId=4) at Dbb.cpp:596
#5  0x00000000008ae0d3 in SerialLog::recover (this=0x2aaaaac3da80)
    at SerialLog.cpp:353
#6  0x0000000000860387 in Database::openDatabase (this=0x2aaaaaaaeba0, 
    filename=0x440853a0 "/dev/shm/var/master-data/falcon_master.fts")
    at Database.cpp:721
#7  0x00000000008554c7 in Connection::getDatabase (this=0x2aaaaacaf628, 
    dbName=0x2aaaaacaf32c "FALCON_MASTER", 
    dbFileName=0x440853a0 "/dev/shm/var/master-data/falcon_master.fts", 
    threads=0x2aaaaacaf388) at Connection.cpp:1653
#8  0x0000000000857562 in Connection::openDatabase
(this=0x2aaaaacaf628, 
    dbName=0x2aaaaacaf32c "FALCON_MASTER", filename=<value optimized
out>, 
    account=0xb53891 "mysql", password=0xb53891 "mysql", address=0x0, 
    parent=0x2aaaaacaf388) at Connection.cpp:931
#9  0x0000000000827790 in StorageDatabase::getOpenConnection (
    this=0x2aaaaacaf1b0) at StorageDatabase.cpp:133
#10 0x000000000082a0b6 in StorageHandler::initialize
(this=0x2aaaaaaae040)
    at StorageHandler.cpp:893
#11 0x000000000082a765 in StorageHandler::createTable (this=0x5736, 
    pathname=0x44086c90 "./page_16/t1", tableSpaceName=0x0,
tempTable=false)
    at StorageHandler.cpp:585
#12 0x00000000008239d6 in StorageInterface::create (this=0x1af3810, 
    mySqlName=0x44086c90 "./page_16/t1", form=0x44085bb0,
info=0x44088060)
    at ha_falcon.cpp:660
#13 0x0000000000767c41 in ha_create_table (thd=0x19874c0, 
    path=<value optimized out>, db=0x1af0630 "page_16", 
    table_name=0x1af02b0 "t1", create_info=0x44088060, 
    update_create_info=false) at handler.cc:2701
#14 0x0000000000730c19 in rea_create_table (thd=0x19874c0, 
    path=0x44086c90 "./page_16/t1", db=0x1af0630 "page_16", 
    table_name=0x1af02b0 "t1", create_info=0x44088060, 
    create_fields=@0x44088178, keys=0, key_info=0x1af0a48,
file=0x1af07b0)
    at unireg.cc:473
#15 0x0000000000777c85 in mysql_create_table_no_lock (thd=0x19874c0, 
    db=0x1af0630 "page_16", table_name=0x1af02b0 "t1",
create_info=0x44088060, 
    alter_info=0x44088130, internal_tmp_table=false,
select_field_count=0)
    at sql_table.cc:3471
---Type <return> to continue, or q <return> to quit---
#16 0x0000000000778045 in mysql_create_table (thd=0x19874c0, 
    db=0x1af0630 "page_16", table_name=0x1af02b0 "t1",
create_info=0x44088060, 
    alter_info=0x44088130, internal_tmp_table=224, select_field_count=0)
    at sql_table.cc:3578
#17 0x00000000006966c0 in mysql_execute_command (thd=0x19874c0)
    at sql_parse.cc:2236
#18 0x0000000000696ceb in mysql_parse (thd=0x19874c0, 
    inBuf=0x1af0210 "CREATE TABLE t1 (a int)", length=23, 
    found_semicolon=0x44089070) at sql_parse.cc:5415

How to repeat:
This lost connection can be seen in pushbuild usually during falcon_page_size_16.

Jim wrote;
I suspect that the fix will be easy, but this looks like one very tough nut to crack.  It can be reproduced only by running 40 tests, stopping the server, restarting with recovery, stopping the server again, and restarting a second time with recovery.  And then it only happens one time out of three.

I'd like to promise a quick resolution, but I just can't.
[3 Dec 2007 16:07] Ann Harrison
This was a pre-existing bug in recovery - timing sensitive.  Something happened to make it appear.  There's nothing specific about 16K pages.
[3 Dec 2007 16:10] Jim Starkey
>The fix is pushed.
>
>The bug was a failure to rebuild a Section page when required during
>recovery.  The bug has been there indefinitely.  Something changed in
>the timing and sequence of the test suite that brought it out.  And
>there was probably no way we could have caught the bug by systematic
>testing.
>
>I caught the bug by doing a full database validation following recovery
>that would force a bugcheck on any database corruption.  While this
>validated the diagnosis and fix, it should also give us a greater
>confidence that the recovery is robust and tested.
>
>I will add a configuration parameter post beta to force post-recovery
>validation.  Since recovery is automatic on every server start and
>validation is potentially quite expensive, I don't want to make it the
>default.  But we should always use it when testing.
>
[21 Feb 2008 16:15] Kevin Lewis
Patch is in mysql-6.0-release version 6.0.4
[5 May 2008 18:21] Paul DuBois
Noted in 6.0.4 changelog.

Falcon failed to rebuild a Section page during recovery, causing a
server crash.