MySQL Bugs: #42560: Falcon recovery error: page 411048/1 wrong page type, expected 5 got 7

Bug #42560	Falcon recovery error: page 411048/1 wrong page type, expected 5 got 7
Submitted:	2 Feb 2009 21:58	Modified:	15 May 2009 15:56
Reporter:	Olav Sandstå	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Falcon storage engine	Severity:	S2 (Serious)
Version:	6.0.10-alpha	OS:	Any
Assigned to:	Vladislav Vaintroub	CPU Architecture:	Any
Tags:	F_RECOVERY

Description:
During recovery of a fairly large table containing about 2 million records, Falcon hits the following assert after about 10-15 minutes:

New table space FALCON_USER, id 1, type 0, filename falcon_user.fts
New table space FALCON_TEMPORARY, id 2, type 0, filename falcon_temporary.fts
[Falcon] Error: page 411048/1 wrong page type, expected 5 got 7

Call stack from the core file:

  [12] Error::error(string = 0x15c913f "page %d/%d wrong page type, expected %d got %d\n", ...), line 71 in "Error.cpp"
  [13] Cache::fetchPage(this = 0x1eb5620, dbb = 0x1eb9b20, pageNumber = 411048, pageType = PAGE_btree, lockType = Exclusive), line 322 in "Cache.cpp"
  [14] Dbb::fetchPage(this = 0x1eb9b20, pageNumber = 411048, pageType = PAGE_btree, lockType = Exclusive), line 226 in "Dbb.cpp"
  [15] Dbb::handoffPage(this = 0x1eb9b20, bdb = 0x4faab28, pageNumber = 411048, pageType = PAGE_btree, lockType = Exclusive), line 275 in "Dbb.cpp"
  [16] IndexRootPage::findInsertionLeaf(dbb = 0x1eb9b20, indexId = 1, indexKey = 0xfffffd7fffdf0500, recordNumber = 1500237, transId = 0), line 311 in "IndexRootPage.cpp"
  [17] IndexRootPage::addIndexEntry(dbb = 0x1eb9b20, indexId = 1, key = 0xfffffd7fffdfaf90, recordNumber = 1500237, transId = 0), line 133 in "IndexRootPage.cpp"
  [18] IndexRootPage::indexMerge(dbb = 0x1eb9b20, indexId = 1, logRecord = 0xfffffd7fffdfe360, transId = 0), line 1009 in "IndexRootPage.cpp"
  [19] SRLUpdateIndex::execute(this = 0xfffffd7fffdfe360), line 217 in "SRLUpdateIndex.cpp"
  [20] SRLUpdateIndex::redo(this = 0xfffffd7fffdfe360), line 187 in "SRLUpdateIndex.cpp"
  [21] SerialLog::recover(this = 0x1e250f0), line 403 in "SerialLog.cpp"
  [22] Database::openDatabase(this = 0x1c801b8, filename = 0xfffffd7fffdfe8dc "/export/home/tmp/mysqldb/falcon_master.fts"), line 751 in "Database.cpp"
  [23] Connection::getDatabase(this = 0x1eb4d90, dbName = 0x1eb4a04 "FALCON_MASTER", dbFileName = 0xfffffd7fffdfe8dc "/export/home/tmp/mysqldb/falcon_master.fts", threads = 0x1eb4a70), line 1650 in "Connection.cpp"
  [24] Connection::openDatabase(this = 0x1eb4d90, dbName = 0x1eb4a04 "FALCON_MASTER", filename = 0x1eb4a3c "falcon_master.fts", account = 0x15c2f4f "mysql", password = 0x15c2f55 "mysql", address = (nil), parent = 0x1eb4a70), line 933 in "Connection.cpp"
  [25] StorageDatabase::getOpenConnection(this = 0x1eb4890), line 136 in "StorageDatabase.cpp"
  [26] StorageHandler::initialize(this = 0x1c7f668), line 987 in "StorageHandler.cpp"
  [27] StorageInterface::falcon_init(p = 0x1c012d0), line 245 in "ha_falcon.cpp"
  [28] ha_initialize_handlerton(plugin = 0x21b2e58), line 441 in "handler.cc"
  [29] plugin_initialize(plugin = 0x21b2e58), line 1007 in "sql_plugin.cc"
  [30] plugin_init(argc = 0x18eed58, argv = 0x1c00388, flags = 0), line 1216 in "sql_plugin.cc"
  [31] init_server_components(), line 4125 in "mysqld.cc"
  [32] main(argc = 12, argv = 0xfffffd7fffdff3a8), line 4636 in "mysqld.cc"

How to repeat:
I have seen this crash only once. What I did at that time was to:

1. Start Falcon. I used the following configuration for Falcon:

--falcon_page_cache_size=500M --falcon_record_memory_max=1000M --falcon_serial_log_dir=/export/home2/tmp/mysql-log/falcon --falcon-debug-mask=2

2. Run a Java client that do the following:

a: Creates a simple table like this:

    CREATE TABLE TESTDATA (priKey int NOT NULL,data VARCHAR(1000), PRIMARY KEY(priKey)) ENGINE=Falcon

b. Inserts records of about 1000 byte into this table.

c. Commits once for every 100.000 inserted records.

3. After about 2.000.000 records was inserted I did a:

    kill -9 ....

4. Falcon get restarted by mysqld_safe

5. Wait for about 10-15 minutes until the assert was hit.

Suggested fix:
Recovery should not fail (and if it fails it better do it immediately instead of waiting for 15 minutes :-) )

Java client that was used for generating load on the database

Attachment: LoadData.java (text/x-java), 4.49 KiB.

c:\dbs>c:\dbs\6.0\bin\mysqld --defaults-file=c:\dbs\6.0\my.ini --standalone --console
SSL error: Unable to get certificate from 'C:/dbs/5.0/server-cert.pem'
090202 21:31:37 [Warning] Failed to setup SSL
090202 21:31:38  InnoDB: Started; log sequence number 0 48155
[Falcon] Error: assertion (!(serialLog->recovering && (serialLog->recoveryPhase ==2))) failed at line 86 in file .\Pa

090202 21:32:55 - mysqld got exception 0x80000003 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=8384512
read_buffer_size=131072
max_used_connections=0
max_threads=151
thread_count=0
connection_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 338230 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
00000001407AC43A    mysqld.exe!Error::debugBreak()[error.cpp:92]
00000001407AC308    mysqld.exe!Error::error()[error.cpp:73]
00000001407AC3F8    mysqld.exe!Error::assertionFailed()[error.cpp:79]
00000001408B1EBB    mysqld.exe!PageInventoryPage::allocPage()[pageinventorypage.cpp:86]
000000014084497F    mysqld.exe!Dbb::allocPage()[dbb.cpp:249]
000000014084D9F4    mysqld.exe!Section::getSectionPage()[section.cpp:273]
0000000140851CEB    mysqld.exe!Section::getSectionPage()[section.cpp:1416]
00000001408516A8    mysqld.exe!Section::redoRecordLocatorPage()[section.cpp:1278]
0000000140847B0D    mysqld.exe!Dbb::redoRecordLocatorPage()[dbb.cpp:1233]
00000001408BE4CB    mysqld.exe!SRLRecordLocator::pass2()[srlrecordlocator.cpp:87]
0000000140875BD1    mysqld.exe!SerialLog::recover()[seriallog.cpp:351]
00000001407ED2F2    mysqld.exe!Database::openDatabase()[database.cpp:733]
00000001407CC20B    mysqld.exe!Connection::getDatabase()[connection.cpp:1652]
00000001407C9624    mysqld.exe!Connection::openDatabase()[connection.cpp:934]
00000001407C1739    mysqld.exe!StorageDatabase::getOpenConnection()[storagedatabase.cpp:137]
00000001407AAFCE    mysqld.exe!StorageHandler::initialize()[storagehandler.cpp:986]
0000000140799A7B    mysqld.exe!StorageInterface::falcon_init()[ha_falcon.cpp:245]
00000001400805FF    mysqld.exe!ha_initialize_handlerton()[handler.cc:450]
0000000140339CAC    mysqld.exe!plugin_initialize()[sql_plugin.cc:1007]
0000000140339ACD    mysqld.exe!plugin_init()[sql_plugin.cc:1216]
00000001401CC8FB    mysqld.exe!init_server_components()[mysqld.cc:4130]
00000001401CA549    mysqld.exe!win_main()[mysqld.cc:4639]
00000001401CD624    mysqld.exe!mysql_service()[mysqld.cc:4803]
00000001401CDA8E    mysqld.exe!main()[mysqld.cc:4976]
00000001409DDF9C    mysqld.exe!__tmainCRTStartup()[crt0.c:327]
00000001409DDD6E    mysqld.exe!mainCRTStartup()[crt0.c:196]
000000007763495D    kernel32.dll!BaseThreadInitThunk()
0000000077768791    ntdll.dll!RtlUserThreadStart()
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.

Thank you for the bug report.

To reproduce, clone from the mysql-test-extra-6.0 tree and then

$ cd mysql-test-extra-6.0/test-extra/gentest
$ perl pb2gentest.pl /build/bzr/6.0-falcon-team /tmp/varidr aa falcon_ddl

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/65789

3010 Vladislav Vaintroub	2009-02-10
      Bug #42560 Falcon recovery error:  wrong page type, 
      expected 5 got <something>
      
      Problem:
      during index creation root page number is not logged. 
      The page is not recreated on recovery which may result 
      in corrupted index.
      
      
      Fix:
      - Log index root page number and recreate index root
      page in recovery of SRLCreateIndex. Do it in phase 2
      of recovery, instead of "redo" phase.
      (phase2 is responsible for recreating "physical" objects
      , e.g pages)
      
      
      - Log index root page number when creating index and 
      recreate this page on recovery. Refactor code so 
      createIndexRoot can be called inside and outside recovery.
      The code is more compact and better readable.
      
      -  Simplify redoIndexPage.
      Do not care about left and right neighbour pages.
      they are logged separately during splits, and there is no 
      need for careful trialReads.
      
      - Cleanup: 
       remove misleadingly named  IndexPage::createNewLevel(), 
         substitute with initRootPage().
      
       remove setIndexRoot(), its single usage was within 
       redoDeleteIndex()

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/65810

3010 Vladislav Vaintroub	2009-02-10
      Bug #42560 Falcon recovery error:  wrong page type, 
      expected 5 got <something>
      
      Problem:
      during index creation root page number is not logged. 
      The page is not recreated on recovery which may result 
      in corrupted index.
      
      
      Fix:
      - Log index root page number and recreate index root
      page in recovery of SRLCreateIndex. Do it in phase 2
      of recovery, instead of "redo" phase.
      (phase2 is responsible for recreating "physical" objects
      , e.g pages)
      
      
      - Log index root page number when creating index and 
      recreate this page on recovery. Refactor code so 
      createIndexRoot can be called inside and outside recovery.
      The code is more compact and better readable.
      
      -  Simplify redoIndexPage.
      Do not care about left and right neighbour pages.
      they are logged separately during splits, and there is no 
      need for careful trialReads.
      
      - Cleanup: 
       remove misleadingly named  IndexPage::createNewLevel(), 
         substitute with initRootPage().
      
       remove setIndexRoot(), its single usage was within 
       redoDeleteIndex()

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/65812

3010 Vladislav Vaintroub	2009-02-10
      Bug #42560 Falcon recovery error:  wrong page type, 
      expected 5 got <something>
      
      Problem:
      during index creation root page number is not logged. 
      The page is not recreated on recovery which may result 
      in corrupted index.
      
      
      Fix:
      - Log index root page number and recreate index root
      page in recovery of SRLCreateIndex. Do it in phase 2
      of recovery, instead of "redo" phase.
      (phase2 is responsible for recreating "physical" objects
      , e.g pages)
      
      
      - Log index root page number when creating index and 
      recreate this page on recovery. Refactor code so 
      createIndexRoot can be called inside and outside recovery.
      The code is more compact and better readable.
      
      -  Simplify redoIndexPage.
      Do not care about left and right neighbour pages.
      they are logged separately during splits, and there is no 
      need for careful trialReads.
      
      - Cleanup: 
       remove misleadingly named  IndexPage::createNewLevel(), 
         substitute with initRootPage().
      
       remove setIndexRoot(), its single usage was within 
       redoDeleteIndex()

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/65824

3010 Vladislav Vaintroub	2009-02-10
      Bug #42560 Falcon recovery error:  wrong page type, 
      expected 5 got <something>
      
      Problem:
      during index creation root page number is not logged. 
      The page is not recreated on recovery which may result 
      in corrupted index.
      
      
      Fix:
      - Log index root page number and recreate index root
      page in recovery of SRLCreateIndex. Do it in phase 2
      of recovery, instead of "redo" phase.
      (phase2 is responsible for recreating "physical" objects
      , e.g pages)
      
      - Also, SRLDeleteIndex will be executed in phase 2 of recovery
      
      - Log index root page number when creating index and 
      recreate this page on recovery. Refactor code so 
      createIndexRoot can be called inside and outside recovery.
      The code is more compact and better readable.
      
      -  Simplify redoIndexPage.
      Do not care about left and right neighbour pages.
      they are logged separately during splits, and there is no 
      need for careful trialReads.
      
      - Cleanup: 
       remove misleadingly named  IndexPage::createNewLevel(), 
         substitute with initRootPage().
      
       remove setIndexRoot(), its single usage was within 
       redoDeleteIndex()

Change looks good to me.  

In addition, Jim Starkey made these comments via email;
-------------------------------
While I heartedly applaud refactoring, combining refactoring with semantic change is rather difficult to follow.  It would be easier to follow if the refactor were pushed as a separate change.

That aside, it looks fine.  Including the root index page number in SRLCreateIndex is completely sound (I wonder why I didn't do it in the first place?), as is always recreating the root page during redo of SRLCreateIndex.  I also agree that this requires moving the re-creation from phase 3 to phase 2 of recovery.  And I'm complete OK with the refactoring.

As for the details, it's hard to follow given the size of change.  But it's theoretically sound and I have no doubt that Vlad will test it and fix any cases that might have slipped through the cracks. 
-------------------------------

Pushed into 6.0.11-alpha (revid:alik@sun.com-20090302140208-lfdejjbcyezlhhjt) (version source revid:vvaintroub@mysql.com-20090210225004-d09yy9inre0laepg) (merge vers: 6.0.10-alpha) (pib:6)

A note has been added to the 6.0.11 changelog: 

Recovery of a Falcon table with a large number of rows can cause a failure in the page type written for the internal FALCON_USER and FALCON_TEMPORARY tablespaces.