Bug #42745 Exception: can't find table space during recovery
Submitted: 11 Feb 2009 0:41 Modified: 15 May 2009 15:59
Reporter: Vladislav Vaintroub
Status: Closed
Category:MySQL Server: Falcon storage engine Severity:S3 (Non-critical)
Version:6.0 OS:Any
Assigned to: Vladislav Vaintroub CPU Architecture:Any
Tags: F_RECOVERY
Triage: Triaged: D1 (Critical)

[11 Feb 2009 0:41] Vladislav Vaintroub
Description:
Recovery after the random query generator test falcon_ddl, which makes heavy use of tablespaces, crashes with

Exception: can't find table space during recovery

Callstack:

>	mysqld.exe!_CxxThrowException(void * pExceptionObject=0x000000000012dcf8, const _s__ThrowInfo * pThrowInfo=0x000000014001be18)  Line 112	C++
 	mysqld.exe!TableSpaceManager::getTableSpace(int id=445)  Line 253	C++
 	mysqld.exe!SerialLog::getDbb(int tableSpaceId=445)  Line 1533 + 0x15 bytes	C++
 	mysqld.exe!SerialLog::bumpPageIncarnation(int pageNumber=2, int tableSpaceId=445, int state=1)  Line 1208 + 0xe bytes	C++
 	mysqld.exe!SRLSectionPage::pass2()  Line 109 + 0x25 bytes	C++
 	mysqld.exe!SerialLog::recover()  Line 370 + 0x15 bytes	C++
 	mysqld.exe!Database::openDatabase(const char * filename=0x000000000012ece0)  Line 753 + 0x14 bytes	C++
 	mysqld.exe!Connection::getDatabase(const char * dbName=0x00000000029603ac, const char * dbFileName=0x000000000012ece0, Threads * threads=0x0000000002960418)  Line 1651	C++
 	mysqld.exe!Connection::openDatabase(const char * dbName=0x00000000029603ac, const char * filename=0x00000000029603e4, const char * account=0x000000013fec6004, const char * password=0x000000013fec5ffc, const char * address=0x0000000000000000, Threads * parent=0x0000000002960418)  Line 933 + 0x22 bytes	C++
 	mysqld.exe!StorageDatabase::getOpenConnection()  Line 137	C++
 	mysqld.exe!StorageHandler::initialize()  Line 987 + 0x14 bytes	C++
 	mysqld.exe!StorageInterface::falcon_init(void * p=0x00000000023c7e90)  Line 257 + 0xc bytes	C++
 	mysqld.exe!ha_initialize_handlerton(st_plugin_int * plugin=0x00000000023c6170)  Line 450 + 0x14 bytes	C++
 	mysqld.exe!plugin_initialize(st_plugin_int * plugin=0x00000000023c6170)  Line 1008 + 0x1b bytes	C++
 	mysqld.exe!plugin_init(int * argc=0x0000000140263888, char * * argv=0x0000000001e77640, int flags=0)  Line 1217 + 0xc bytes	C++
 	mysqld.exe!init_server_components()  Line 4134 + 0x6b bytes	C++
 	mysqld.exe!win_main(int argc=26, char * * argv=0x0000000001e73bf0)  Line 4643 + 0x5 bytes	C++
 	mysqld.exe!mysql_service(void * p=0x0000000000000000)  Line 4807	C++
 	mysqld.exe!main(int argc=26, char * * argv=0x0000000001e73bf0)  Line 4980	C++
 	mysqld.exe!__tmainCRTStartup()  Line 266 + 0x19 bytes	C
 	mysqld.exe!mainCRTStartup()  Line 182	C
 	kernel32.dll!BaseThreadInitThunk()  + 0xd bytes	
 	ntdll.dll!RtlUserThreadStart()  + 0x21 bytes

The problem is that Falcon can have "orphaned" tablespaces.
The information about tablespaces is read in TableSpace::bootstrap() at startup and is taken from system tables. If the database crashes, the on-disk pages may not be up to date and can be missing recently created tablespaces.

Yet the serial log can still contain entries with tablespace IDs that were not found during bootstrap. If such an entry is found, recovery crashes.

How to repeat:
Run falcon_ddl long enough to hit the crash.

Suggested fix:
I'm not sure whether it is the best solution, but it could make sense to
handle lost tablespaces as if they were dropped, e.g. skip them during phases 2 and 3 of recovery.

Generally, I think storing tablespace info in system tables is not the right idea. The whole recovery depends on this info, and the current handling, which reads outdated or inconsistent database pages, does not sound encouraging.
Yet I believe it is too late to change this now, so we need to provide a hack to work around possible problems.
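
To illustrate the suggestion above, here is a minimal sketch of treating tablespaces that bootstrap did not find as dropped. It is a toy model, not Falcon code: LogRecord, RecoveryState, redoRecord and recover are hypothetical names invented for this example, and real recovery works on serial log records rather than on a vector.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical, simplified stand-ins for Falcon's structures (assumptions).
struct LogRecord {
    int tableSpaceId;
    int pageNumber;
};

struct RecoveryState {
    std::set<int> knownTableSpaces;    // what bootstrap found in the system tables
    std::set<int> droppedTableSpaces;  // tablespaces whose records are skipped
    std::vector<LogRecord> applied;    // records that were actually redone
};

// Redo one record. A tablespace unknown to bootstrap is marked as dropped,
// so this record and all later records for it are skipped instead of throwing.
inline void redoRecord(RecoveryState &state, const LogRecord &record)
{
    if (state.droppedTableSpaces.count(record.tableSpaceId))
        return;

    if (!state.knownTableSpaces.count(record.tableSpaceId)) {
        state.droppedTableSpaces.insert(record.tableSpaceId);
        return;
    }

    state.applied.push_back(record);
}

// Run one recovery pass over the whole log.
inline RecoveryState recover(const std::set<int> &bootstrapped,
                             const std::vector<LogRecord> &log)
{
    RecoveryState state;
    state.knownTableSpaces = bootstrapped;

    for (const LogRecord &record : log)
        redoRecord(state, record);

    return state;
}
```

The trade-off is the one described above: the database stays consistent, but anything that lived only in the orphaned tablespace is lost.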
[11 Feb 2009 0:55] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/65829

3011 Vladislav Vaintroub	2009-02-11
      Bug#42745 : Exception: can't find table space during recovery
      In rare circumstances, information about a newly created tablespace that is stored in the Falcon system tables may not yet have been flushed to disk. Recovery that processes a log record referencing such a tablespace will fail.
      
      Solution:
      Skip all updates to such tablespaces. This makes the database consistent after recovery, even if some tables created right before the crash are missing.
      
      
      This patch also includes a fix for Bug#42743 (Falcon fails to recover; Test tablespace file is not open when doing a fetchPage)
[13 Feb 2009 19:45] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/66295

3018 Vladislav Vaintroub	2009-02-13
      Bug#42745: TableSpace not found during recovery.
      
      The reason for the error is that the tablespace was not recreated during recovery, even though the serial log contained enough information to do so.
      
      What happens:
      - tablespace info is not on disk (in the Falcon system tables) at the start of recovery
      - the SRLCreateTableSpace record is found before the checkpoint record in the serial log and is ignored.
      
      Fix:
      Remove the post-flush check from SRLCreateTableSpace::redo() and recreate the tablespace if it is not already present.
      modified:
        storage/falcon/SRLCreateTableSpace.cpp
        storage/falcon/SerialLog.cpp

=== modified file 'storage/falcon/SRLCreateTableSpace.cpp'
--- a/storage/falcon/SRLCreateTableSpace.cpp	2009-01-28 23:57:54 +0000
+++ b/storage/falcon/SRLCreateTableSpace.cpp	2009-02-13 19:45:32 +0000
@@ -90,12 +90,12 @@ void SRLCreateTableSpace::pass1()
 
 void SRLCreateTableSpace::pass2()
 {
-	if (control->isPostFlush())
-		{
-		TableSpaceInit tsInit;
-		tsInit.comment		= comment;
-		log->database->tableSpaceManager->redoCreateTableSpace(tableSpaceId, nameLength, name, filenameLength, filename, type, &tsInit);
-		}
+	if (log->database->tableSpaceManager->findTableSpace(tableSpaceId))
+		return;
+
+	TableSpaceInit tsInit;
+	tsInit.comment		= comment;
+	log->database->tableSpaceManager->redoCreateTableSpace(tableSpaceId, nameLength, name, filenameLength, filename, type, &tsInit);
 }
 
 void SRLCreateTableSpace::commit()

=== modified file 'storage/falcon/SerialLog.cpp'
--- a/storage/falcon/SerialLog.cpp	2009-02-11 00:54:25 +0000
+++ b/storage/falcon/SerialLog.cpp	2009-02-13 19:45:32 +0000
@@ -365,22 +365,7 @@ void SerialLog::recover()
 			Log::log("Processed: %8ld\n", recordCount);
 			
 		if (!isTableSpaceDropped(record->tableSpaceId) || record->type == srlDropTableSpace)
-			try 
-				{
-				record->pass2();
-				}
-			catch(SQLException &e)
-				{
-				// We can have missing tablespaces at this stage.
-				//(missing in the system table at the time of crash
-				// and not found by bootstrap). Handle them as dropped
-				// until someone comes up with a better idea
-				if (e.getSqlcode() == TABLESPACE_NOT_EXIST_ERROR)
-					{
-					Log::log("Cannot find tablespace %d",record->tableSpaceId);
-					setTableSpaceDropped(record->tableSpaceId);
-					}
-				}
+			record->pass2();
 		}
 
 	Log::log("Processed: %8ld\n", recordCount);

[18 Feb 2009 17:42] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/66783

3026 Vladislav Vaintroub	2009-02-18
      Bug #42745 Exception: can't find table space during recovery
      Bug #41837 Falcon recovery error: page 102/0 wrong page type, 
      expected 7 got 0 
      
      
      Problem: The Falcon internal table system.tablespaces can be inconsistent
      when mysqld has crashed or was killed.
      
      This (possibly inconsistent) table was previously read on Falcon startup
      in TableSpaceManager::bootstrap() to give recovery enough information
      to associate tablespace IDs in log records with actual tablespace
      files.
      
      But since the table may contain outdated information, it was still
      possible to have a lost tablespace ID without any corresponding file.
      Recovery would then stop.
      
      Another problem with TableSpace::bootstrap() is that it can attempt to
      read past the end of falcon_master.fts. Typically this happens when a
      section page references a page that had not yet been flushed to disk at
      the moment of the crash (Bug#41837).
      
      Solution:
      Avoid reading system.tablespaces from disk whenever possible. Instead,
      whenever a tablespace is added or deleted, the current state (the list
      of all existing tablespaces) is written to the serial log in an
      SRLTableSpaces record. If recovery finds SRLTableSpaces records during
      the first pass, the last one is used to reconstruct the before-crash
      state.
      
      If recovery does not find an SRLTableSpaces record, it still reads
      system.tablespaces with TableSpaceManager::bootstrap() as before. In that
      case reading from disk is safe, because tablespaces were not modified
      since the last checkpoint.
      added:
        storage/falcon/SRLTableSpaces.cpp
        storage/falcon/SRLTableSpaces.h
      modified:
        storage/falcon/CMakeLists.txt
        storage/falcon/Database.cpp
        storage/falcon/Makefile.am
        storage/falcon/SRLVersion.h
        storage/falcon/SerialLog.cpp
        storage/falcon/SerialLogControl.cpp
        storage/falcon/SerialLogControl.h
        storage/falcon/SerialLogRecord.h
        storage/falcon/TableSpaceManager.cpp
        storage/falcon/TableSpaceManager.h
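
The approach in this commit message can be sketched roughly as follows. This is a simplified illustration and not the committed code: LogEntry, TableSpaceMap and resolveTableSpaces are hypothetical names, and the real SRLTableSpaces record layout is not shown in this report.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical model (assumption): tablespace id -> tablespace file name.
using TableSpaceMap = std::map<int, std::string>;

struct LogEntry {
    bool isSnapshot;         // true for an SRLTableSpaces-style record
    TableSpaceMap snapshot;  // full list of tablespaces when the record was written
};

// First pass of recovery: if the log contains any snapshot records, the
// last one wins. Otherwise fall back to the state read from disk, which is
// safe because no tablespace was created or dropped since the checkpoint.
inline TableSpaceMap resolveTableSpaces(const std::vector<LogEntry> &log,
                                        const TableSpaceMap &onDisk)
{
    const TableSpaceMap *latest = nullptr;

    for (const LogEntry &entry : log)
        if (entry.isSnapshot)
            latest = &entry.snapshot;

    return latest ? *latest : onDisk;
}
```

Writing the complete list on every change, rather than a delta, means recovery only needs the most recent record and never has to replay earlier ones.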
[18 Feb 2009 22:22] Kevin Lewis
Code reviewed by Kevin Lewis and Jim Starkey
[2 Mar 2009 14:12] Bugs System
Pushed into 6.0.11-alpha (revid:alik@sun.com-20090302140208-lfdejjbcyezlhhjt) (version source revid:vvaintroub@mysql.com-20090218174140-unfccjescdawur5g) (merge vers: 6.0.10-alpha) (pib:6)
[15 May 2009 15:59] MC Brown
An entry has been added to the 6.0.11 changelog:

Recovery of Falcon tables could fail with an error indicating that a wrong page type was identified in the Falcon serial log.