MySQL Bugs: #36396: Assertion in IO::pread (on closed tablespace file)

Bug #36396	Assertion in IO::pread (on closed tablespace file)
Submitted:	29 Apr 2008 13:02	Modified:	9 Oct 2008 12:06
Reporter:	Vladislav Vaintroub	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Falcon storage engine	Severity:	S3 (Non-critical)
Version:		OS:	Any
Assigned to:	Vladislav Vaintroub	CPU Architecture:	Any

Description:
During a test involving tablespaces, got assertion on IO::longSeek on the
tablespace file. Debugging shows, that here 2 gopher threads are involved, one doing dropTableSpace/closeFile and another deleteIndex/pread on the same file.

I managed to get stack snapshots of both threads right after dropTablespace closes the file,  while other thread is doing seek on that file

Thread 1:

mysqld.exe!IO::closeFile()  Line 343 /*fileId == 32*/
mysqld.exe!Dbb::close()  Line 1137
mysqld.exe!Dbb::dropDatabase()  Line 1216
mysqld.exe!TableSpace::dropTableSpace()  Line 148
mysqld.exe!TableSpaceManager::expungeTableSpace(int tableSpaceId=3)  Line 388
mysqld.exe!SRLDropTableSpace::commit()  Line 73	
mysqld.exe!SerialLogTransaction::commit()  Line 93
mysqld.exe!SerialLogTransaction::doAction()  Line 158
mysqld.exe!Gopher::gopherThread()  Line 71

Thread 2:

 mysqld.exe!_errno() 
 mysqld.exe!_lseeki64(int fh=32, __int64 pos=16117760, int mthd=0)  Line 87
 mysqld.exe!IO::longSeek(__int64 offset=16117760)  Line 360 
 mysqld.exe!IO::pread(__int64 offset=16117760, int length=4096, unsigned char * buffer=0x0000000004ea7000)
 mysqld.exe!IO::readPage(Bdb * bdb=0x00000000048db978)  Line 238  
 mysqld.exe!Cache::fetchPage(Dbb * dbb=0x0000000004ad89e0, int pageNumber=3935, PageType pageType=PAGE_any, LockType lockType=Exclusive)  Line 261
 mysqld.exe!Dbb::fetchPage(int pageNumber=3935, PageType pageType=PAGE_any, LockType lockType=Exclusive)  Line 220
 mysqld.exe!IndexRootPage::deleteIndex(Dbb * dbb=0x0000000004ad89e0, int indexId=0, unsigned int transId=95)  Line 700 
 mysqld.exe!SRLDeleteIndex::commit()  Line 123
 mysqld.exe!SerialLogTransaction::commit()
 mysqld.exe!SerialLogTransaction::doAction() 
 mysqld.exe!Gopher::gopherThread() 

How to repeat:

I use falcon_bug_36294 test case for it.

Not sure it is simply repeatable, as get it crashing on this place only when running test case under debugger like this:

1. C:\bk\mysql-6.0-falcon-team\mysql-test> start perl mysql-test-run.pl --start-and-exit

2. Attach the debugger to mysqld process (optionally set breakpoint on IO::closeFile)

3. C:\bk\mysql-6.0-falcon-team\mysql-test>..\client\debug\mysqltest.exe -uroot --port=9306 test< suite\falcon_team\t\falcon_bug_36294.test

Suggested fix:
Avoid the race?

Vlad, Jim suggests that a RecoveryObjects *recoveryTablespace; is needed in SerialLog class.  But there still may be race condition.

After discussion, we decided to use a syncGopher SyncObject so that only one gopher thread will do a dropTablespace at a time.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/47582

2685 Vladislav Vaintroub	2008-06-09
      Bug#36396 - Assertion in IO::pread (on closed tablespace file)
      The problem was that gopher  processing "drop tablespace"
      was faster then the gopher procesing preceeding "drop table".
      As a result, tablespace file was closed and removed, while
      "DROP TABLE" still tried to access it.
      
      Solution is to synchronize gopher threads in this case, i.e
      all while transaction containing DROP TABLESPACE is  being
      processed, all other gophers wait until it completes.
      Additionally, a race condition during  CREATE TABLESPACE has
      been observed and fixed in this patch. It can happen that
      DROP TABLESPACE was completed, but the file is still there
      because commit record was not yet processed by a gopher thread.
      This resulted into infrequent errors on CREATE with "file already
      exists" message.
      
      The solution to this problem is to track number of pending
      DROP TABLESPACE commands.During CREATE TABLESPACE, if the file
      exists and there are pending DROPs, wait for a short time (10 sec)
      in a loop  for file to be deleted.
      
      No testcase is provided with this patch. Had no luck writing
      mysql-test-run script reproducing these race conditions.

> From: Kevin Lewis [mailto:klewis@mysql.com]
> So if I understand it, what used to happen is that there were
> infrequent
> errors on CREATE with "file already exists" message. 

Exactly, we've seen this messages sometimes happen on pushbuild, typically
after DROP TABLESPACE IF EXISTS foo engine=falcon; CREATE TABLESPACE foo
engine=falcon sequence.

> Now, it may get the same message, but only after waiting 10 seconds.
Given the infrequency
> of the problem.  And the fact that the failed create tablespace can be
retried with probably success.  I agree that simple is better.

Yes. Also, polling/waiting  happens only, if there is at least one "drop
tablespace" in progress. It does not always take 10 seconds.The polling
frequency is 1 sec and in all configuration I tried client thread waited for
at most 1 sec .Except for one case, when I deliberately suspended DROR
TABLESPACE gopher using debugger and had the "file already exists" message
in client thread after 10 seconds wait.

Fix is in version 6.0.6

Documented in the 6.0.6 changelog as follows:

        Dropping a Falcon tablespace concurrently with dropping a table using
        that tablespace caused the server to crash.