Bug #39706 Falcon recovery assertion in Cache::fetchPage for simple scenarios
Submitted: 28 Sep 2008 8:45 Modified: 9 Jan 2009 13:59
Reporter: Hakan Küçükyılmaz Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Falcon storage engine Severity:S3 (Non-critical)
Version:6.0-falcon, 6.0-falcon-team OS:Any
Assigned to: Christopher Powers CPU Architecture:Any
Tags: F_PAGE IO

[28 Sep 2008 8:45] Hakan Küçükyılmaz
Description:
Crash recovery does not work for simple case.

How to repeat:
CREATE TABLE t1 (a int) Engine Falcon;
INSERT INTO t1 VALUES (2), (3), (5);

--
-- Kill mysqld (like kill -9 <pid_of_mysqld>)
-- Restart mysqld
-- If you don't get an assertion, then kill and restart mysqld again
--

sql/mysqld(my_print_stacktrace+0x2e)[0xa5e378]
sql/mysqld(handle_segfault+0x25c)[0x66eacd]
/lib/libpthread.so.0[0x7f32e218ea90]
/lib/libpthread.so.0(raise+0x2b)[0x7f32e218e94b]
sql/mysqld(Error::debugBreak()+0xe)[0x8b2e48]
sql/mysqld(Error::error(char const*, ...)+0xea)[0x8b2f38]
sql/mysqld(Cache::fetchPage(Dbb*, int, PageType, LockType)+0x2c1)[0x895541]
sql/mysqld(Dbb::fetchPage(int, PageType, LockType)+0x24)[0x8aacc6]
sql/mysqld(Dbb::handoffPage(Bdb*, int, PageType, LockType)+0x1f)[0x8ab049]
sql/mysqld(Section::findNextRecord(int, int, Stream*)+0xec)[0x8e7fc4]
sql/mysqld(Section::findNextRecord(int, int, Stream*)+0x2ab)[0x8e8183]
sql/mysqld(Section::findNextRecord(int, Stream*)+0x16)[0x8e81ec]
sql/mysqld(Dbb::findNextRecord(Section*, int, Stream*)+0x11)[0x8aa551]
sql/mysqld(TableSpaceManager::bootstrap(int)+0x30a)[0x882d7c]
sql/mysqld(Database::openDatabase(char const*)+0x25a)[0x8a7016]
sql/mysqld(Connection::getDatabase(char const*, char const*, Threads*)+0x109)[0x89ce2d]
sql/mysqld(Connection::openDatabase(char const*, char const*, char const*, char const*, char const*, Threads*)+0x120)[0x89e3e2]
sql/mysqld(StorageDatabase::getOpenConnection()+0x63)[0x86daeb]
sql/mysqld(StorageHandler::initialize()+0x83)[0x86f8b5]
sql/mysqld(StorageInterface::falcon_init(void*)+0x22a)[0x867e9c]
sql/mysqld(ha_initialize_handlerton(st_plugin_int*)+0x7f)[0x7624e6]
sql/mysqld[0x7f8bb7]
sql/mysqld(plugin_init(int*, char**, int)+0x88b)[0x7fb01c]
sql/mysqld[0x6716bc]
sql/mysqld(main+0x35a)[0x671f83]
/lib/libc.so.6(__libc_start_main+0xe6)[0x7f32e0c311a6]
sql/mysqld(__gxx_personality_v0+0x271)[0x5afb99]
[Falcon] Error: page 103/0 wrong page type, expected 7 got 2
[28 Sep 2008 9:10] Philip Stoev
This bug happens frequently in PushBuild2. I am glad that a very simple test case is also able to demonstrate it.
[28 Sep 2008 9:15] Valeriy Kravchuk
Can not repeat this with 6.0.8 built from mysql-6.0 bzr tree today (debug binaries).
[28 Sep 2008 9:24] Hakan Küçükyılmaz
I started mysqld like this:

MYSQLD_CONFIG_BASE="--no-defaults \
  --socket=/tmp/mysql.sock"
MYSQLD_CONFIG_GENERAL="$MYSQLD_CONFIG_BASE \
  --basedir=/home/hakan/work/mysql/mysql-6.0-falcon-team \
  --datadir=/tmp/foo \
  --tmpdir=/tmp \
  --language=/home/hakan/work/mysql/mysql-6.0-falcon-team/sql/share/english \
  --skip-grant-tables \
  --query_cache_size=0 \
  --query_cache_type=0 \
  --skip-innodb \
  --log-error"

sql/mysqld $MYSQLD_CONFIG_GENERAL
[1 Oct 2008 1:34] MySQL Verification Team
I couldn't repeat too on Windows with several attempts:

c:\dbs>c:\dbs\6.0\bin\mysqld --defaults-file=c:\dbs\6.0\my.ini --standalone --console
080930 22:32:24  InnoDB: Started; log sequence number 0 46409
080930 22:32:24 [Note] Recovering after a crash using server60
080930 22:32:24 [Note] Starting crash recovery...
080930 22:32:24 [Note] Crash recovery finished.
080930 22:32:24 [Note] Event Scheduler: Loaded 0 events
080930 22:32:24 [Note] c:\dbs\6.0\bin\mysqld: ready for connections.
Version: '6.0.8-alpha-nt-debug-log'  socket: ''  port: 3600  Source distribution
[1 Oct 2008 5:35] Hakan Küçükyılmaz
Just tried on Mac OS X/Intel with latest
   - mysql-6.0-falcon-team tree, and
   - BUILD/compile-pentium-debug-max-no-ndb

CREATE SCHEMA test; USE SCHEMA test; CREATE TABLE t1 (a int) Engine Falcon; INSERT INTO t1 VALUES (2), (3), (5);

kill -9 <pid_of_mysqld>

restart mysqld

Program received signal SIGABRT, Aborted.
0x9003d66c in kill ()
(gdb) bt
#0  0x9003d66c in kill ()
#1  0x9010e8cf in raise ()
#2  0x00351ca0 in Error::debugBreak () at Error.cpp:94
#3  0x00351d2e in Error::error (string=0x70819c "assertion (%s) failed at line %d in file %s\n") at Error.cpp:71
#4  0x00351de5 in Error::assertionFailed (text=0x7139d0 "bdb", fileName=0x713a10 "IndexRootPage.cpp", line=951) at Error.cpp:78
#5  0x003cc7cf in IndexRootPage::indexMerge (dbb=0x8808760, indexId=31, logRecord=0xbfffe624, transId=0) at IndexRootPage.cpp:951
#6  0x0040d390 in SRLUpdateIndex::execute (this=0xbfffe624) at SRLUpdateIndex.cpp:179
#7  0x0040d471 in SRLUpdateIndex::redo (this=0xbfffe624) at SRLUpdateIndex.cpp:149
#8  0x0041edd1 in SerialLog::recover (this=0x6e716b0) at SerialLog.cpp:363
#9  0x003a6a59 in Database::openDatabase (this=0x6d05670, filename=0xbfffe9bc "/private/tmp/foo/falcon_master.fts") at Database.cpp:729
#10 0x0039affd in Connection::getDatabase (this=0x8808540, dbName=0x8808254 "FALCON_MASTER", dbFileName=0xbfffe9bc "/private/tmp/foo/falcon_master.fts", threads=0x88082b0) at Connection.cpp:1651
#11 0x0039b1aa in Connection::openDatabase (this=0x8808540, dbName=0x8808254 "FALCON_MASTER", filename=0x8808284 "falcon_master.fts", account=0x6ca7e0 "mysql", password=0x6ca7e0 "mysql", address=0x0, parent=0x88082b0) at Connection.cpp:934
#12 0x0035c350 in StorageDatabase::getOpenConnection (this=0x8808128) at StorageDatabase.cpp:136
#13 0x0035f2d1 in StorageHandler::initialize (this=0x6d05020) at StorageHandler.cpp:995
#14 0x003460f2 in StorageInterface::falcon_init (p=0x6c02838) at ha_falcon.cpp:237
#15 0x001f1183 in ha_initialize_handlerton (plugin=0x8019668) at handler.cc:427
#16 0x002a860e in plugin_initialize (plugin=0x8019668) at sql_plugin.cc:1011
#17 0x002af168 in plugin_init (argc=0x9dbfbc, argv=0x6c009a8, flags=2) at sql_plugin.cc:1218
#18 0x000c34a8 in init_server_components () at mysqld.cc:4088
#19 0x000c86bd in main (argc=12, argv=0xbffff190) at mysqld.cc:4570
[1 Oct 2008 21:04] Philip Stoev
----- Original Message ----- 
From: "Jim Starkey" <jstarkey@nimbusdb.com>
To: "Falcon" <falcon-private@mysql.com>
Sent: Wednesday, October 01, 2008 9:58 PM
Subject: Why Falcon Doesn't Work Any More

> The problem is simple and straight forward.  Cache.cpp is totally 
> broken.  More specifically, the I/O thread (and architecture) is totally 
> broken.
> 
> As I left it, Cache::ioThread was like this:
> 
>        for (;;)
>            {
>            int32 pageNumber = flushBitmap->nextSet(0);
>            int count;
>            Dbb *dbb;
>           
>            if (pageNumber >= 0)
>                {
>                int    slot = pageNumber % hashSize;
>                bool hit = false;
>                Bdb *bdbList = NULL;
>                UCHAR *p = buffer;
>                sync.lock(Shared);
>               
>                // Look for a page to flush.  Then get all his friends
>               
>                for (Bdb *bdb = hashTable[slot]; bdb; bdb = bdb->hash)
>                    if (bdb->pageNumber == pageNumber && bdb->flushIt &&
>    bdb->isDirty)
> 
> The simple explanation is this.  When a checkpoint was required, a dirty 
> page bitmap was produced.  The bitmap, however, didn't contain the table 
> space id, so ioThread had to loop through the collisions in the hash 
> table to find the dirty buffer(s).
> 
> The code is now:
> 
>        for (;;)
>            {
>            int32 pageNumber = flushBitmap->nextSet(0);
>            int count;
> 
>            if (pageNumber >= 0)
>                {
>                Bdb *bdb;
>                Dbb *dbb;
>                int    slot = PAGENUM_2_SLOT(pageNumber);
>                bool hit = false;
>                Bdb *bdbList = NULL;
>                UCHAR *p = buffer;
> 
>                // Look for the page to flush.
>                bdb = lockFindBdbIncrementUseCount(pageNumber, slot);
>                if (bdb && bdb->flushIt && bdb->isDirty)
> 
> This only handles the first Bdb in a collision chain.  In almost all 
> circumstances, this will be the falcon_user tablespace.  Consequently, 
> dirty pages in falcon_master are never written.
> 
> There is no doubt in my mind that the dirty buffer that Kelly saw at the 
> end of bdb aging chain was the direct result of this problem.
[2 Oct 2008 22:17] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/55152

2848 Christopher Powers	2008-10-02
      Bug#39706 "Falcon recovery assertion in Cache::fetchPage for simple scenarios"
      
      Reverted Cache.* to the revision preceding the 2008-09-02 updates for WL#4479, 4480 and 4481
[2 Oct 2008 22:22] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/55153

2848 Christopher Powers	2008-10-02
      Bug#39706 "Falcon recovery assertion in Cache::fetchPage for simple scenarios"
      
      Reverted Cache.* to the revision preceding the 2008-09-02 updates for WL#4479, 4480 and 4481
[2 Oct 2008 23:11] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/55156

2849 Christopher Powers	2008-10-02
      Bug#39706, "Falcon recovery assertion in Cache::fetchPage for simple scenarios"
      
      Reapply fix from Bug#39692, "Falcon allocates SectorCache even if falcon_use_sectorcache is OFF"
[9 Jan 2009 13:59] MC Brown
A note has been added to the 6.0.8 changelog: 

Recovery of Falcon tables could crash because of an invalid or unrecognised tablespace ID.