Bug #38186 falcon_bug_31295 fails on pushbuild
Submitted: 16 Jul 2008 22:04 Modified: 9 Oct 2008 13:02
Reporter: Vladislav Vaintroub Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Falcon storage engine Severity:S3 (Non-critical)
Version:6.0 OS:Any
Assigned to: Vladislav Vaintroub
Tags: pushbuild, test failure
Triage: D2 (Serious)

[16 Jul 2008 22:04] Vladislav Vaintroub
Description:
falcon_bug_31295  fails on pushbuild with "Tablespace already exists".

mysqltest: At line 20: query 'CREATE TABLESPACE ts1 ADD DATAFILE 'ts1.fts' ENGINE $engine' failed: 1718: Tablespace data file 'ts1.fts' already exists

The result from queries just before the failure was:
*** Bug #31295 ***
SET @@storage_engine = 'Falcon';
DROP TABLE IF EXISTS t1;
DROP TABLESPACE ts1 ENGINE 'Falcon';
CREATE TABLESPACE ts1 ADD DATAFILE 'ts1.fts' ENGINE 'Falcon';

How to repeat:
Look on pushbuild
[17 Jul 2008 7:21] John Embretsen
This has happended before in Pushbuild:
2008-06-30 17:10:00	bzr_mysql-6.0-falcon	36 (Kevin_Lewis__klewis-20080630171000-1), 'vm-win2003-32-a' Win32 VS2003 -max-nt.

I have also seen this happen intermittently on a non-Pushbuild Linux host, when running the full falcon suite.
[17 Jul 2008 20:39] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/49982

2752 Vladislav Vaintroub	2008-07-17
      Bug#38186 : CREATE TABLESPACE fails sometimes if called directly after 
      DROP TABLESPACE on the same file
      
      The problem is that tablespace files are not deleted directly on client COMMIT.
      Deletion is done later, by background(Gopher) thread. Thus the file may still 
      exist when "create tablespace" is called.
      
      The solution is to to modify CREATE TABLESPACE and if the file is there and 
      there are pending  DROP operation, wait for some seconds until the file is deleted.
      
      An attempt to solve this problem was done previously, but waiting for deletion 
      was put on the wrong place. Client thread was waiting for gopher in 
      TablespaceManager::dropTablespace, which  can caused  a "soft-deadlock" :
      - Client thread was holding syncSysDDL lock and was waiting for Gopher 
        to delete file
      - Gopher did not run, was stuck in thread->sleep and waited for Scheduler 
        to wakeup
      - Scheduler was doing scavenge and was stuck waiting for syncSysDDL lock 
        in Database:updateCardinalities()
      
      This soft-deadlock was resolved after 10 seconds, client gave up and 
      returned with "tablespace file already exists".
      
      The refinement to this solution is given here . Client waits for 
      file to be deleted in StorageHandler::createTablespace while no Falcon locks 
      are held. Thus the soft-deadlock described is not possible with this solution
[17 Jul 2008 20:58] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/49984

2752 Vladislav Vaintroub	2008-07-17
      Bug#38186 : CREATE TABLESPACE fails sometimes if called directly after 
      DROP TABLESPACE on the same file
      
      The problem is that tablespace files are not deleted directly on client COMMIT.
      Deletion is done later, by background(Gopher) thread. Thus the file may still 
      exist when "create tablespace" is called.
      
      The solution is to to modify CREATE TABLESPACE and if the file is there and 
      there are pending  DROP operation, wait for some seconds until the file is deleted.
      
      An attempt to solve this problem was done previously, but waiting for deletion 
      was put on the wrong place. Client thread was waiting for gopher in 
      TablespaceManager::dropTablespace, which  can caused  a "soft-deadlock" :
      - Client thread was holding syncSysDDL lock and was waiting for Gopher 
        to delete file
      - Gopher did not run, was stuck in thread->sleep and waited for Scheduler 
        to wakeup
      - Scheduler was doing scavenge and was stuck waiting for syncSysDDL lock 
        in Database:updateCardinalities()
      
      This soft-deadlock was resolved after 10 seconds, client gave up and 
      returned with "tablespace file already exists".
      
      The refinement to this solution is given here . Client waits for 
      file to be deleted in StorageHandler::createTablespace while no Falcon locks 
      are held. Thus the soft-deadlock described is not possible with this solution
[7 Aug 2008 18:14] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/51131

2769 Vladislav Vaintroub	2008-08-07
      Bug#38186
      Problem:
      Waiting for  tablespace to be dropped has no effect.
      
      Cause:
      We are checking the the existence of wrong file
      (tableSpaceName)
      
      Solution:
      Change first parameter in waitForPendingDrops
      to be filename, not tablespace name.
[7 Aug 2008 18:17] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/51132

2769 Vladislav Vaintroub	2008-08-07
      Bug#38186
      Problem:
      Waiting for  tablespace to be dropped has no effect.
            
      Cause:
      We are checking the the existence of wrong file
      (tableSpaceName)
            
      Solution:
      Change first parameter in waitForPendingDrops
      to be filename, not tablespace name.
[7 Aug 2008 18:18] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/51133

2769 Vladislav Vaintroub	2008-08-07
      Bug#38186
      Problem:
      Waiting for  tablespace to be dropped has no effect.
            
      Cause:
      We are checking the the existence of wrong file
      (tableSpaceName)
            
      Solution:
      Change first parameter in waitForPendingDrops
      to be filename, not tablespace name.
[8 Aug 2008 11:33] John Embretsen
Thanks, Vlad, the latest patch seems to "fix" the issue way better than the previous attempts!

I tried out the fix on a Linux host where this test has been known to fail almost consistently every second to third time it is run.

In the falcon-team branch, I saw no failures even when running falcon_bug_31295 100 times in a row, nor when running the falcon suite. Without this fix, this particular test failed in almost every run of the falcon suite.

On a side note, it is interesting how every third run of the test is much slower than the other runs, e.g.:

falcon.falcon_bug_31295        [ pass ]           1302
falcon.falcon_bug_31295        [ pass ]            305
falcon.falcon_bug_31295        [ pass ]            304
falcon.falcon_bug_31295        [ pass ]           1296
falcon.falcon_bug_31295        [ pass ]            298
falcon.falcon_bug_31295        [ pass ]            346
falcon.falcon_bug_31295        [ pass ]           1288
falcon.falcon_bug_31295        [ pass ]            286
falcon.falcon_bug_31295        [ pass ]            310
falcon.falcon_bug_31295        [ pass ]           1291

I also applied the patch to the mysql-6.0-falcon branch, where I've seen similar failures with a new test, falcon_tablespace_priv, on this particular host. 

There, I did not see any of these tests failing even when running them 100 times in a row. When running the falcon suite, falcon_bug_31295 test failed once (after the 10 second timeout), but this is not easily reproducible (it passes most of the time). The latter case is probably just a matter of adjusting the timeout value to accomodate really slow or busy machines.
[22 Aug 2008 21:31] Kevin Lewis
This fix is in version 6.0.7
[28 Aug 2008 20:17] Bugs System
Pushed into 6.0.7-alpha  (revid:cbell@mysql.com-20080822132131-uveo6wiuecy6m2b8) (version source revid:cbell@mysql.com-20080822132131-uveo6wiuecy6m2b8) (pib:3)
[14 Sep 2008 2:09] Bugs System
Pushed into 6.0.6-alpha  (revid:vvaintroub@mysql.com-20080717203845-q5bo9rdpbigm1nat) (version source revid:hakan@mysql.com-20080716160034-sdexuyp3qow7zlc6) (pib:3)
[9 Oct 2008 13:02] Jon Stephens
Documented in the 6.0.7 changelog as follows:

        CREATE TABLESPACE failed when invoked immediately following a
        DROP TABLESPACE statement that used the same tablespace name.
[21 Oct 2008 20:06] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/56749

2875 Vladislav Vaintroub	2008-10-21
      Bug#39789: Falcon recovery failure after several CREATE + 
      DROP TABLESPACE
      
      Solved in this patch:
      
      -Drop tablespace in REDO phase removed datafiles without 
      taking into account that some of them should stay after the 
      recovery. In this case, even if metadata for tablespace 
      would be present after recovery, the data is lost together 
      with the file.
      
      Typical scenario for such failed recovery would be something
      like
      
      create tablespace A add datafile 'f' engine falcon;
      drop tablespace A;
      create tablespace B add datafile 'f' engine falcon;
      create table T tablespace B;
      
      Recovery for "drop tablespace A" would delete file used
      by tablespace B and invalidate table T.
      
      Solution for this problem is to calculate whether datafiles 
      should stay after recovery or can be deleted. Calculation done
      in the pass1 of recovery, deletion in the REDO phase.
      
      While fixing this, other smaller problems were fixed
      - failure to recover, when the database crashed after datafile 
      had been created but before header was written.
      - redoCreateTableSpace did not attempt to create tablespaces, 
      but only opened existing ones.
      - fixed a long-standing problem that tablespace file was present
      even after "drop tablespace" successfully completed. This was
      previously worked around in Bug#38186
[30 Oct 2008 13:51] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/57456

2889 Vladislav Vaintroub	2008-10-30
       Fixed a long-standing problem that tablespace file was present even after "drop tablespace" successfully completed. This problem was  previously worked around in Bug#38186.
[30 Oct 2008 14:08] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/57464

2889 Vladislav Vaintroub	2008-10-30
       Fixed a long-standing problem that tablespace file was present even after
      "drop tablespace" successfully completed. This problem was  previously worked 
      around in Bug#38186.
[31 Oct 2008 0:29] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/57515

2892 Vladislav Vaintroub	2008-10-31
       Bug #38186: CREATE TABLESPACE failed when invoked immediately following a
       DROP TABLESPACE statement that used the same tablespace name.
      
       Fixed a long-standing problem that tablespace file was present even after "drop tablespace" successfully 
      completed.  The problem was worked around previously, this patch contains refined solution. 
      Previously, file was closed and deleted when the gopher thread processed dropTableSpace record.
       Now  the file is removed earlier, in the client thread . Gopher thread only closes the file descriptor later. 
      
      This technique ensures that 
      - datafile does not exist after "drop tablespace" is completed
      - file descriptor remains valid as long as it can be used by commited ,  write-incomplete transactions.
      
      On  Windows, removing an open file requires FILE_SHARE_DELETE flag for CreateFile. Also file has to be renamed to 
      a unique name before deletion. This functionality is implemented  in posix-alike winOpen() and winUnlink()  functions
      
      This patch also contains missing test case for Bug#38186.
[20 Nov 2008 12:56] Bugs System
Pushed into 6.0.8-alpha  (revid:vvaintroub@mysql.com-20081031002913-hg55ea1xjrimzx25) (version source revid:vvaintroub@mysql.com-20081031002913-hg55ea1xjrimzx25) (pib:5)