Bug #41158 DROP TABLE holds LOCK_open during unlink()
Submitted: 1 Dec 2008 20:01 Modified: 23 Sep 2010 22:20
Reporter: Domas Mituzas Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:5.0, 5.1, etc OS:Any
Assigned to: Konstantin Osipov
Triage: Triaged: D2 (Serious) / R3 (Medium) / E3 (Medium)

[1 Dec 2008 20:01] Domas Mituzas
Description:
DROP TABLE usually calls unlink() on one or more files while holding LOCK_open.

On really big files and very popular operating/file systems unlink() may be running for minutes - and for that time no queries can execute. 

How to repeat:
DROP TABLE veryverybigtable;

OR

add sleep() before unlink() :)

Suggested fix:
Avoid holding LOCK_open
[17 Dec 2008 18:00] Konstantin Osipov
http://forge.mysql.com/worklog/task.php?id=3983
[10 Nov 2009 3:46] Andrew Dalgleish
Bug #44757 set as duplicate of this.
[8 Dec 2009 2:25] Andrew Dalgleish
If you are using innodb_file_per_table on ext3, using DROP TABLE on very large tables can be very slow and blocks the mysql server.

A *partial* work-around is to create an additional hardlink to the table.ibd file. This makes the call to unlink() faster, although there is still some clean-up of the main data dictionary. You can then remove the additional hardlink later, without blocking the mysql server.
[5 Aug 2010 9:57] Sveta Smirnova
See also bug #55722
[5 Aug 2010 11:11] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/115086

3095 Konstantin Osipov	2010-08-05
      A preliminary patch that fixes Bug#41158 "DROP TABLE holds 
      LOCK_open during unlink()". This patch is a pre-requisite for 
      the fix for Bug#52044.
[5 Aug 2010 11:50] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/115087

3095 Konstantin Osipov	2010-08-05
      A preliminary patch that fixes Bug#41158 "DROP TABLE holds 
      LOCK_open during unlink()". This patch is a pre-requisite for 
      the fix for Bug#52044.
[5 Aug 2010 14:47] Mark Callaghan
Excellent.

Is this dependent on the MDL changes in 5.5?
[5 Aug 2010 15:22] Davi Arnaut
Yes.
[5 Aug 2010 18:07] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/115135

3095 Konstantin Osipov	2010-08-05
      A fix for Bug#41158 "DROP TABLE holds LOCK_open during unlink()". 
      
      Remove acquisition of LOCK_open around file system operations,
      since such operations are now protected by metadata locks.
     @ sql/datadict.cc
        No longer necessary to protect ha_create_table() with
        LOCK_open. Serial execution is now ensured by metadata
        locks.
[6 Aug 2010 18:27] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/115240

3096 Konstantin Osipov	2010-08-06
      A fix for Bug#41158 "DROP TABLE holds LOCK_open during unlink()". 
      
      Remove acquisition of LOCK_open around file system operations,
      since such operations are now protected by metadata locks.
      Rework table discovery algorithm to not require LOCK_open.
     @ mysql-test/t/disabled.def
        Disable NDB tests due to Bug#55799.
     @ sql/datadict.cc
        No longer necessary to protect ha_create_table() with
        LOCK_open. Serial execution is now ensured by metadata
        locks.
     @ sql/ha_ndbcluster.cc
        Do not manipulate with LOCK_open in cluster code.
     @ sql/ha_ndbcluster_binlog.cc
        Do not manipulate with LOCK_open in cluster code.
     @ sql/ha_ndbcluster_binlog.h
        Update function signature.
     @ sql/handler.cc
        Implement ha_check_if_table_exists().
        @todo: some engines provide ha_table_exists_in_engine()
        handlerton call, for those we perhaps shouldn't
        call ha_discover(), to be more efficient.
        Since currently it's only NDB, postpone till
        integration with NDB.
     @ sql/handler.h
        Declare ha_check_if_table_exists() function.
     @ sql/mdl.cc
        Remove an obsolete comment.
     @ sql/sql_base.cc
        Update to a new signature of close_cachded_tables():
        from now on we always call it without LOCK_open.
        Update comments.
        Remove get_table_share_with_create(), we should
        not attempt to create a table under LOCK_open.
        Introduce get_table_share_with_discover() instead,
        which would request a back off action if the table
        exists in engine.
        Do not attempt to discover views or MyISAM
        Merge engine children.
        Remove acquisition of LOCK_open for 
        data dictionary operations, such as check_if_table_exists().
        Do not use get_table_share_with_create/discover for views,
        where it's not needed.
        Make tdc_remove_table() optionally acquire LOCK_open
        to simplify usage of this function.
        Use the right mutex in the partitioning code when
        manipulating with thd->open_tables.
     @ sql/sql_base.h
        Update signatures of changes functions.
     @ sql/sql_insert.cc
        Do not wrap quick_rm_table() with LOCK_open acquisition, 
        this is unnecessary.
     @ sql/sql_parse.cc
        Update to the new calling convention of tdc_remove_table().
        Update to the new signature of close_cached_tables().
        Update comments.
     @ sql/sql_rename.cc
        Update to the new calling convention of tdc_remove_table().
        Remove acquisition of LOCK_open around filesystem
        operations.
     @ sql/sql_show.cc
        Remove get_trigger_table_impl().
        Do not acquire LOCK_open for a dirty read of the trigger
        file.
     @ sql/sql_table.cc
        Do not acquire LOCK_open for filesystem operations.
     @ sql/sql_trigger.cc
        Do not require LOCK_open for trigger file I/O.
     @ sql/sql_truncate.cc
        Update to the new signature of tdc_remove_table().
     @ sql/sql_view.cc
        Do not require LOCK_open for view I/O.
        Use tdc_remove_table() to expel view share.
        Update comments.
     @ sql/sys_vars.cc
        Update to the new signature of close_cached_tables().
[9 Aug 2010 18:34] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/115359

3097 Konstantin Osipov	2010-08-09
      A fix for Bug#41158 "DROP TABLE holds LOCK_open during unlink()". 
      
      Remove acquisition of LOCK_open around file system operations,
      since such operations are now protected by metadata locks.
      Rework table discovery algorithm to not require LOCK_open.
      
      No new tests added since all MDL locking operations are covered
      in lock.test and mdl_sync.test, and as long as these tests
      pass despite the increased concurrency, consistency must be
      unaffected.
     @ mysql-test/t/disabled.def
        Disable NDB tests due to Bug#55799.
     @ sql/datadict.cc
        No longer necessary to protect ha_create_table() with
        LOCK_open. Serial execution is now ensured by metadata
        locks.
     @ sql/ha_ndbcluster.cc
        Do not manipulate with LOCK_open in cluster code.
     @ sql/ha_ndbcluster_binlog.cc
        Do not manipulate with LOCK_open in cluster code.
     @ sql/ha_ndbcluster_binlog.h
        Update function signature.
     @ sql/handler.cc
        Implement ha_check_if_table_exists().
        @todo: some engines provide ha_table_exists_in_engine()
        handlerton call, for those we perhaps shouldn't
        call ha_discover(), to be more efficient.
        Since currently it's only NDB, postpone till
        integration with NDB.
     @ sql/handler.h
        Declare ha_check_if_table_exists() function.
     @ sql/mdl.cc
        Remove an obsolete comment.
     @ sql/sql_base.cc
        Update to a new signature of close_cached_tables():
        from now on we always call it without LOCK_open.
        Update comments.
        Remove get_table_share_with_create(), we should
        not attempt to create a table under LOCK_open.
        Introduce get_table_share_with_discover() instead,
        which would request a back off action if the table
        exists in engine.
        Remove acquisition of LOCK_open for 
        data dictionary operations, such as check_if_table_exists().
        Do not use get_table_share_with_create/discover for views,
        where it's not needed.
        Make tdc_remove_table() optionally acquire LOCK_open
        to simplify usage of this function.
        Use the right mutex in the partitioning code when
        manipulating with thd->open_tables.
     @ sql/sql_base.h
        Update signatures of changes functions.
     @ sql/sql_insert.cc
        Do not wrap quick_rm_table() with LOCK_open acquisition, 
        this is unnecessary.
     @ sql/sql_parse.cc
        Update to the new calling convention of tdc_remove_table().
        Update to the new signature of close_cached_tables().
        Update comments.
     @ sql/sql_rename.cc
        Update to the new calling convention of tdc_remove_table().
        Remove acquisition of LOCK_open around filesystem
        operations.
     @ sql/sql_show.cc
        Remove get_trigger_table_impl().
        Do not acquire LOCK_open for a dirty read of the trigger
        file.
     @ sql/sql_table.cc
        Do not acquire LOCK_open for filesystem operations.
     @ sql/sql_trigger.cc
        Do not require LOCK_open for trigger file I/O.
     @ sql/sql_truncate.cc
        Update to the new signature of tdc_remove_table().
     @ sql/sql_view.cc
        Do not require LOCK_open for view I/O.
        Use tdc_remove_table() to expel view share.
        Update comments.
     @ sql/sys_vars.cc
        Update to the new signature of close_cached_tables().
[9 Aug 2010 18:38] Konstantin Osipov
The patch was queued into mysql-5.5-runtime. It requires additional QA, random query generator and stress testing in particular.
[9 Aug 2010 21:39] Peiran Song
In Bug #55722, I reported two cases where the semaphores were different, hopefully the bug fix covered both.
 
=== Case 1 (5.1.42, zfs)
--Thread 9 has waited at buf/buf0flu.c line 607 for 253.00 seconds the semaphore:
S-lock on RW-latch at 9143aec10 created in file buf/buf0buf.c line 547
a writer (thread id 22746) has reserved it in mode  exclusive

thread 9: main thread doing flushing buffer pool
thread 22746: drop table
buf/buf0flu.c line 607 : rw_lock_s_lock_gen(&(block->lock), BUF_IO_WRITE);
buf/buf0buf.c line 547 : rw_lock_create(&block->lock, SYNC_LEVEL_VARYING);

=== Case 2 (percona 5.1.45, xfs)

This semaphore was the same as reported in bug #44757

--Thread 1258379600 has waited at row/row0purge.c line 541 for 241.00 seconds the
semaphore:
S-lock on RW-latch at 0xe03f60 '&dict_operation_lock'
a writer (thread id 1449539920) has reserved it in mode  exclusive

thread 1258379600 : unknown (maybe the purge thread separated from the main thread?)
thread 1449539920 : drop table
[10 Aug 2010 12:12] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/115410

3099 Magne Mahre	2010-08-10
      Post-commit fix for Bug#41158
      
      A label statement needs to be followed by at least
      one primary expression.  If built without
      WITH_PARTITION_STORAGE_ENGINE set, the block would 
      be empty.
      
      Added ';' as a dummy statement to fix it.
[25 Aug 2010 9:24] Bugs System
Pushed into mysql-5.5 5.5.6-m3 (revid:alik@ibmvm-20100825092002-2yvkb3iwu43ycpnm) (version source revid:alik@ibmvm-20100825092002-2yvkb3iwu43ycpnm) (merge vers: 5.5.6-m3) (pib:20)
[30 Aug 2010 8:32] Bugs System
Pushed into mysql-trunk 5.6.1-m4 (revid:alik@sun.com-20100830082732-n2eyijnv86exc5ci) (version source revid:alik@sun.com-20100830082732-n2eyijnv86exc5ci) (merge vers: 5.6.1-m4) (pib:21)
[30 Aug 2010 8:36] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100830082745-n6sh01wlwh3itasv) (version source revid:alik@sun.com-20100830082745-n6sh01wlwh3itasv) (pib:21)
[7 Sep 2010 17:56] Mark Callaghan
Please reopen this. I think this can be fixed in the innodb plugin for 5.1 by always doing background drop table. This can be a big issue for anyone using innodb_file_per_table in 5.1.
[7 Sep 2010 18:02] Mark Callaghan
From row_drop_table_for_mysql(). I want a my.cnf option to always do the drop in the background possibly also controlled by whether file-per-table is used and by the size of the table.

        if (table->n_mysql_handles_opened > 0) {
                ibool   added;

                added = row_add_table_to_background_drop_list(table->name);

                if (added) {
[7 Sep 2010 18:40] James Day
For MySQL 5.1 and earlier the workarounds for this are:

1. Defragment in the filesystem. When you extend many tables the chunks for each end up very fragmented, so the defragmenting can be of benefit for table scans as well as dropping speed. If you don't have a defragmenting program, copying the files somewhere and copying them back will do the job.

2. When dropping, create a filesystem link to the IBD file. Drop the table in MySQL and the OS won't drop the file in the filesystem because the link exists, so the drop will be fast within MySQL. Remove the link and the OS will finish the deleting work in the background.

Also see bug #31592, a request to make the size by which individual table tablespaces grow configurable or just bigger.
[7 Sep 2010 20:17] Mark Callaghan
How is defragmenting the filesystem a workaround? That requires me to take the server offline for too long which will cause even more stalls.

Using links is a very short-term workaround. Most of us are unwilling to make our scripts even more complex to do that, or to grant permissions to the file system to others.

The fix in InnoDB is easy.
[8 Sep 2010 14:27] James Day
See bug #56655 for a new bug covering the request for more work on this in 5.1.
[23 Sep 2010 22:20] Paul Dubois
Noted in 5.5.6, 5.6.1 changelogs.

DROP TABLE held a lock during unlink() file system operations,
causing performance problems if unlink() took a long time.
[18 Nov 2010 20:59] Dmitry Lenev
Bug #28614 was marked as duplicate of this bug.