Bug #61188 DROP TABLE extremely slow
Submitted: 16 May 2011 16:43
Modified: 2 Nov 2011 16:32
Reporter: Jonathan Stimmel
Status: Closed
Category: MySQL Server: InnoDB storage engine
Severity: S3 (Non-critical)
Version: 5.5.12-log
OS: Linux (CentOS release 5.3 (Final))
Assigned to: Inaam Rana
CPU Architecture: Any

[16 May 2011 16:43] Jonathan Stimmel
Description:
In "recent" minor releases of 5.5, DROP TABLE can be incredibly slow. As I file this, I've been waiting almost 45 minutes to drop a moderately sized, compressed/partitioned InnoDB table:

mysql> show processlist \G
*************************** 1. row ***************************
     Id: 2352
   User: dbuser
   Host: localhost
     db: db
Command: Query
   Time: 2573
  State: checking permissions
   Info: DROP TABLE `A`

The DROP is proceeding (some of the partitions have already been unlinked in the filesystem); it's just exceedingly slow.

MySQL is almost entirely idle, aside from a select against INFORMATION_SCHEMA.GLOBAL_STATUS and a cheap query against another table (both run once every 60s).

MySQL has a CPU-bound thread:

$ pgrep -x mysqld | xargs -r ps -L -o pid,lwp,pcpu,comm 
  PID   LWP %CPU COMMAND
 8740  8740  0.0 mysqld
 8740  8746  0.0 mysqld
 8740  8747  0.0 mysqld
 8740  8748  0.0 mysqld
 8740  8749  0.0 mysqld
 8740  8750  0.0 mysqld
 8740  8751  0.0 mysqld
 8740  8752  0.1 mysqld
 8740  8753  0.1 mysqld
 8740  8754  0.1 mysqld
 8740  8755  0.1 mysqld
 8740  8870  0.0 mysqld
 8740  8871  0.0 mysqld
 8740  8872  0.0 mysqld
 8740  8873  4.3 mysqld
 8740  8874  0.8 mysqld
 8740  8877  0.0 mysqld
 8740 25068 99.9 mysqld
 8740 25143  0.0 mysqld
 8740 25314  0.0 mysqld
 8740 26445  0.0 mysqld
 8740 26838  0.0 mysqld

while the local disks are virtually idle:

$ date
Mon May 16 12:03:19 EDT 2011
$ sar -b -s 11:30:00
Linux 2.6.32.15-asl.1.pf.el5.x86_64 (ita1avsdb1)        05/16/11

20110516 11:30:01          tps      rtps      wtps   bread/s   bwrtn/s
20110516 11:40:01         0.86      0.03      0.83      0.32     10.15
20110516 11:50:01         1.70      0.03      1.67      2.11     17.91
20110516 12:00:01         2.09      0.19      1.91      1.57     55.82
Average:               1.55      0.08      1.47      1.33     27.97

I have performance_schema enabled:

mysql> select * from threads where processlist_id=2352;
+-----------+----------------+---------------------------+
| THREAD_ID | PROCESSLIST_ID | NAME                      |
+-----------+----------------+---------------------------+
|      2368 |           2352 | thread/sql/one_connection |
+-----------+----------------+---------------------------+
1 row in set (0.00 sec)

mysql> truncate events_waits_summary_by_thread_by_event_name;
mysql> select sleep(60);
mysql> select * from events_waits_summary_by_thread_by_event_name
       where thread_id=2368 and COUNT_STAR>0 \G
*************************** 1. row ***************************
     THREAD_ID: 2368
    EVENT_NAME: wait/synch/mutex/innodb/buf_pool_zip_mutex
    COUNT_STAR: 48329
SUM_TIMER_WAIT: 1935086400
MIN_TIMER_WAIT: 16800
AVG_TIMER_WAIT: 40039
MAX_TIMER_WAIT: 3224400
1 row in set (0.00 sec)
mysql> select SOURCE, EVENT_NAME, sum(TIMER_WAIT), count(*)
       from events_waits_history_long where thread_id=2368
       group by SOURCE, EVENT_NAME \G
*************************** 1. row ***************************
         SOURCE: buf0buddy.c:376
     EVENT_NAME: wait/synch/mutex/innodb/buf_pool_zip_mutex
sum(TIMER_WAIT): 139258800
       count(*): 1765
*************************** 2. row ***************************
         SOURCE: buf0buddy.c:485
     EVENT_NAME: wait/synch/mutex/innodb/buf_pool_zip_mutex
sum(TIMER_WAIT): 25796400
       count(*): 1314
*************************** 3. row ***************************
         SOURCE: buf0lru.c:377
     EVENT_NAME: wait/synch/mutex/innodb/buf_pool_zip_mutex
sum(TIMER_WAIT): 75835200
       count(*): 3431
*************************** 4. row ***************************
         SOURCE: buf0lru.c:477
     EVENT_NAME: wait/synch/mutex/innodb/buf_pool_zip_mutex
sum(TIMER_WAIT): 159886800
       count(*): 3431
4 rows in set (0.04 sec)

Finally, here's the table definition:

CREATE TABLE A (
  B smallint(6) NOT NULL,
  C datetime NOT NULL,
  D smallint(6) NOT NULL,
  E smallint(6) NOT NULL,
  F decimal(5,0) NOT NULL,
  G date NOT NULL,
  H smallint(6) NOT NULL,
  I smallint(6) NOT NULL,
  PRIMARY KEY (B,C,D,E,F,G,H,I),
  KEY J (E,F,G,B,C,D),
  KEY K (E,G,B,C,D)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4
/*!50500 PARTITION BY RANGE  COLUMNS(G)
(PARTITION pNULL VALUES LESS THAN ('0000-00-00') ENGINE = InnoDB,
 PARTITION p201001 VALUES LESS THAN ('2010-02-01') ENGINE = InnoDB,
 PARTITION p201002 VALUES LESS THAN ('2010-03-01') ENGINE = InnoDB,
 PARTITION p201003 VALUES LESS THAN ('2010-04-01') ENGINE = InnoDB,
 PARTITION p201004 VALUES LESS THAN ('2010-05-01') ENGINE = InnoDB,
 PARTITION p201005 VALUES LESS THAN ('2010-06-01') ENGINE = InnoDB,
 PARTITION p201006 VALUES LESS THAN ('2010-07-01') ENGINE = InnoDB,
 PARTITION p201007 VALUES LESS THAN ('2010-08-01') ENGINE = InnoDB,
 PARTITION p201008 VALUES LESS THAN ('2010-09-01') ENGINE = InnoDB,
 PARTITION p201009 VALUES LESS THAN ('2010-10-01') ENGINE = InnoDB,
 PARTITION p201010 VALUES LESS THAN ('2010-11-01') ENGINE = InnoDB,
 PARTITION p201011 VALUES LESS THAN ('2010-12-01') ENGINE = InnoDB,
 PARTITION p201012 VALUES LESS THAN ('2011-01-01') ENGINE = InnoDB,
 PARTITION p201101 VALUES LESS THAN ('2011-02-01') ENGINE = InnoDB,
 PARTITION p201102 VALUES LESS THAN ('2011-03-01') ENGINE = InnoDB,
 PARTITION p201103 VALUES LESS THAN ('2011-04-01') ENGINE = InnoDB,
 PARTITION p201104 VALUES LESS THAN ('2011-05-01') ENGINE = InnoDB,
 PARTITION p201105 VALUES LESS THAN ('2011-06-01') ENGINE = InnoDB,
 PARTITION p201106 VALUES LESS THAN ('2011-07-01') ENGINE = InnoDB,
 PARTITION p201107 VALUES LESS THAN ('2011-08-01') ENGINE = InnoDB,
 PARTITION p201108 VALUES LESS THAN ('2011-09-01') ENGINE = InnoDB,
 PARTITION p201109 VALUES LESS THAN ('2011-10-01') ENGINE = InnoDB,
 PARTITION p201110 VALUES LESS THAN ('2011-11-01') ENGINE = InnoDB,
 PARTITION p201111 VALUES LESS THAN ('2011-12-01') ENGINE = InnoDB,
 PARTITION p201112 VALUES LESS THAN ('2012-01-01') ENGINE = InnoDB,
 PARTITION p201201 VALUES LESS THAN ('2012-02-01') ENGINE = InnoDB,
 PARTITION p201202 VALUES LESS THAN ('2012-03-01') ENGINE = InnoDB,
 PARTITION p201203 VALUES LESS THAN ('2012-04-01') ENGINE = InnoDB,
 PARTITION p201204 VALUES LESS THAN ('2012-05-01') ENGINE = InnoDB,
 PARTITION p201205 VALUES LESS THAN ('2012-06-01') ENGINE = InnoDB,
 PARTITION p201206 VALUES LESS THAN ('2012-07-01') ENGINE = InnoDB,
 PARTITION p201207 VALUES LESS THAN ('2012-08-01') ENGINE = InnoDB,
 PARTITION p201208 VALUES LESS THAN ('2012-09-01') ENGINE = InnoDB,
 PARTITION p201209 VALUES LESS THAN ('2012-10-01') ENGINE = InnoDB,
 PARTITION p201210 VALUES LESS THAN ('2012-11-01') ENGINE = InnoDB) */

The table had over 600 million rows, mostly concentrated in the first few partitions.

How to repeat:
Although I've witnessed this a few times, I don't yet have a repeatable test case. If I'm able to produce one, I will add it to the bug.

I suspect the problem is related to InnoDB partitioning and/or compression and the size of the table.
[16 May 2011 16:54] Valeriy Kravchuk
Please, send your my.cnf file content and the results of:

show engine innodb status\G

while this slow DROP TABLE is running.
[16 May 2011 17:24] Jonathan Stimmel
$ cat /etc/my.cnf 
# $Id$

[mysqld_safe]
pid-file = /var/lib/mysql/mysqld.pid

[mysqld]
user = mysql
socket = /var/lib/mysql/mysql.sock
datadir = /var/lib/mysql
tmpdir = /var/tmp

performance_schema

tmp_table_size = 256MB
max_heap_table_size = 256MB

log-output = TABLE
slow_query_log = ON

innodb_file_per_table
innodb_stats_on_metadata = OFF
innodb_strict_mode = ON
innodb_file_format = Barracuda
innodb_data_file_path = ibdata1:1G
innodb_buffer_pool_size = 32G
innodb_buffer_pool_instances = 8
innodb_log_buffer_size = 64M
innodb_log_files_in_group = 3
innodb_log_file_size = 1G
innodb_io_capacity = 20000
innodb_write_io_threads = 4
innodb_read_io_threads = 4
innodb_open_files = 750
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT
innodb_max_dirty_pages_pct = 1
innodb_purge_threads = 1
[16 May 2011 17:25] Jonathan Stimmel
Note that it took several minutes to get innodb status. The thread "Waiting for table metadata lock" is a mysql client I accidentally started without -A after the drop was in progress.
[16 May 2011 20:17] Jonathan Stimmel
The performance appears to degrade with the number of partitions. Dropping an empty table with 2 partitions takes around 5 seconds, while the same table with 50 partitions takes about a minute and a half. Not surprisingly, the problem disappears with MyISAM.

I've reproduced this bug on two hosts (5.5.10 and 5.5.12), but not a third (also 5.5.12). I'm still investigating what the key difference is in that third host.

Here's my test script:

#!/bin/bash

set -o errexit -o nounset

PARTITIONS=50
MYSQL="mysql -vvv --database=test -Ee"

$MYSQL "SELECT VERSION()"
$MYSQL "CREATE TABLE t (i INT) ENGINE innodb ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4"
$MYSQL "ALTER TABLE t PARTITION BY RANGE COLUMNS(i) (PARTITION pNULL VALUES LESS THAN (0))"
for i in `seq $PARTITIONS`; do
    $MYSQL "ALTER TABLE t ADD PARTITION (PARTITION p$i VALUES LESS THAN ($i))"
done
$MYSQL "DROP TABLE t"
[16 May 2011 22:45] Jonathan Stimmel
(Changing Category to InnoDB)

I misspoke in my previous comment - the script I posted reproduced the problem on one host (5.5.10), but not another (5.5.12).

After several hours, the original server eventually finished dropping its tables, so I ran my test script on that host, and was unable to reproduce the problem (<100ms to drop the test table). As that server has no user tables, I'm now suspecting that the performance hit is also related to the size of innodb data outside of the table being dropped.

In particular, I suspect innodb may be scanning all data pages in the buffer pool - perhaps to free those pages which belong to the table being dropped. To test this theory I took the empty host and did the following:

0. ran my script, which takes <100ms to drop the test table
1. recreated the tables
2. inserted 20-30M rows
3. let the database idle
4. ran my script, which now takes 2-3 seconds to drop
5. restarted mysql
6. ran the script again on the cold, idle database - drop is again <100ms

Note that at step 0 InnoDB has 32 GB of pages allocated, but very few are in use, since all tables have been dropped.

As a further test, I restarted the host which took over a minute to run the test script. After restart, dropping the test table takes <100ms.

At this point, I suspect that dropping tables causes InnoDB to scan all active pages in the buffer pool(s). This scan becomes much more expensive due to the need to decompress most pages. Finally, I suspect this process is repeated for each partition. With a 32 GB buffer pool and 200 large, compressed partitions, it's easy to see how this could degrade to a time scale of hours.
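If that hypothesis holds, the cost model is simple: one full buffer-pool scan per partition. A minimal sketch of the arithmetic (my own illustrative function and numbers, not InnoDB code):

```c
#include <assert.h>

/* Hypothetical cost model for the suspected behaviour: DROP walks every
   page in the buffer pool once per partition, so total page visits are
   partitions * pages_in_pool. Figures are illustrative only. */
static long long drop_page_visits(long long partitions, long long pages_in_pool)
{
    long long visits = 0;
    for (long long p = 0; p < partitions; p++)
        visits += pages_in_pool; /* one full buffer-pool scan per partition */
    return visits;
}
```

With a 32 GB pool (roughly 2 million 16 KB pages) and 200 partitions, that is on the order of 400 million page visits, each potentially requiring decompression.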
[17 May 2011 7:22] Valeriy Kravchuk
I think this is a duplicate of/related to known Bug #51325. Please, check.
[17 May 2011 7:23] Valeriy Kravchuk
See http://www.mysqlperformanceblog.com/2011/02/03/performance-problem-with-innodb-and-drop-ta... also.
[19 May 2011 19:23] Jonathan Stimmel
Agreed, this is most likely a duplicate of bug 51325.
[19 May 2011 20:16] Miguel Solorzano
Duplicate of http://bugs.mysql.com/bug.php?id=51325.
[4 Jun 2011 0:48] Mark Callaghan
I suspect the slow DROP TABLE problem for compressed InnoDB tables is not 51325. I just had a test server crash because of a long lock wait while doing "DROP DATABASE" on a db with 6 compressed InnoDB tables and a 60G buffer pool. I use 5.1.52 + the Facebook patch. I always allow for the possibility that I created the bug in the Facebook patch, so I am still investigating and will try to reproduce with unmodified 5.1.52.

I think http://bugs.mysql.com/bug.php?id=59727 might be related because of buf_LRU_block_remove_hashed_page

The call stack from MySQL 5.1.52 + Facebook patch is:
buf_buddy_free_low,buf_LRU_block_remove_hashed_page,buf_LRU_invalidate_tablespace,fil_delete_tablespace,row_drop_table_for_mysql,row_drop_tables_for_mysql_in_background,srv_master_thread,start_thread,clone

The first long lock wait message is:

InnoDB: Warning: a long semaphore wait:
--Thread 1201695040 has waited at buf/buf0buf.c line 1671 for 247.00 seconds the
semaphore:
Mutex at 0x37fcc00 created file buf/buf0buf.c line 1050, lock var 0
waiters flag 0
wait has ended

Thread 13 (Thread 1246906688 (LWP 31166)):
#0  buf_buddy_free_low (buf=0x2ab7c4382600, i=2) at buf/buf0buddy.c:575
#1  0x0000000000832594 in buf_LRU_block_remove_hashed_page (bpage=0x2ab7c4382700, zip=1)
at ./include/buf0buddy.ic:121
#2  0x000000000083467c in buf_LRU_invalidate_tablespace (id=96) at buf/buf0lru.c:469
#3  0x000000000085352f in fil_delete_tablespace (id=96) at fil/fil0fil.c:2401
#4  0x00000000007d300d in row_drop_table_for_mysql (name=0x2ab93809d1d8
"test/purchases_index6", trx=0x2ab97af3daf0, drop_db=<value optimized out>,
delayed_drop=0) at row/row0mysql.c:3409
#5  0x00000000007d3f24 in row_drop_tables_for_mysql_in_background (wait=0) at
row/row0mysql.c:2153
#6  0x00000000007e9a57 in srv_master_thread (arg=<value optimized out>) at
srv/srv0srv.c:3080
#7  0x00000031c6c062f7 in start_thread () from /lib64/libpthread.so.0
#8  0x00000031c64d1e3d in clone () from /lib64/libc.so.6
[4 Jun 2011 2:25] Mark Callaghan
All of the time is in buf_buddy_free_low. That is only called for compressed pages, so I am more certain that this problem is much worse for compressed tables. And there is a for loop in buf_buddy_free_low over buf_pool->zip_free[], which is probably the problem.

        for (bpage = UT_LIST_GET_FIRST(buf_pool->zip_free[i]); bpage; ) {
                UNIV_MEM_VALID(bpage, BUF_BUDDY_LOW << i);
                ut_ad(buf_page_get_state(bpage) == BUF_BLOCK_ZIP_FREE);

                if (bpage == buddy) {
buddy_free:
                        /* The buddy is free: recombine */
                        buf_buddy_remove_from_free(bpage, i);
buddy_free2:
                        ut_ad(buf_page_get_state(buddy) == BUF_BLOCK_ZIP_FREE);
                        ut_ad(!buf_pool_contains_zip(buddy));
                        i++;
                        buf = ut_align_down(buf, BUF_BUDDY_LOW << i);

                        goto recombine;
                }

                ut_a(bpage != buf);

                {
                        buf_page_t*     next = UT_LIST_GET_NEXT(list, bpage);
                        UNIV_MEM_ASSERT_AND_FREE(bpage, BUF_BUDDY_LOW << i);
                        bpage = next;
                }
        }
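A stripped-down model of why that loop hurts (my own sketch, not InnoDB's types): each free of a compressed block walks the zip_free[] list of its size class looking for the buddy, so the cost of a single free grows linearly with the free-list length.

```c
#include <assert.h>
#include <stddef.h>

/* Toy free-list node standing in for a block on zip_free[i]. */
struct node { struct node *next; };

/* Mimics the scan in buf_buddy_free_low(): walk the free list until the
   buddy is found (recombine) or the list ends. Returns nodes visited,
   i.e. the work done for this one free operation. */
static int buddy_scan(struct node *head, struct node *buddy)
{
    int visited = 0;
    for (struct node *p = head; p != NULL; p = p->next) {
        visited++;
        if (p == buddy)
            break; /* buddy found: the real code would recombine here */
    }
    return visited; /* O(list length) whenever the buddy is absent */
}
```

With long zip_free[] lists and one such scan per freed block, freeing a whole tablespace of compressed pages becomes quadratic in the worst case.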
[4 Jun 2011 3:08] Mark Callaghan
On unmodified mysql 5.1.52:
* drop of 6 uncompressed tables --> 21 seconds
* drop of 6 compressed tables --> 53 minutes

buffer pool is 55GB and warm as the tables were just loaded. Tables should be fully cached in the buffer pool. I used a modified version of the insert benchmark and will attach my files.

Ignore duplicate settings in this, but my.cnf was:

[mysqld]

plugin-load=innodb=ha_innodb_plugin.so;innodb_trx=ha_innodb_plugin.so;innodb_locks=ha_innodb_plugin.so;innodb_lock_waits=ha_innodb_plugin.so;innodb_cmp=ha_innodb_plugin.so;innodb_cmp_reset=ha_innodb_plugin.so;innodb_cmpmem=ha_innodb_plugin.so;innodb_cmpmem_reset=ha_innodb_plugin.so

innodb_buffer_pool_size=16G
innodb_log_file_size=1900M
innodb_flush_log_at_trx_commit=1
innodb_doublewrite=1
innodb_flush_method=O_DIRECT
innodb_thread_concurrency=0
innodb_max_dirty_pages_pct=80

innodb_file_format=barracuda
innodb_file_per_table

max_connections=2000
table_cache=2000

key_buffer_size=200M
innodb_io_capacity=1000
innodb_thread_concurrency=128

log_bin
sync_binlog=1

innodb_read_io_threads=8
innodb_io_capacity=2000
skip_log_bin
innodb_buffer_pool_size=55G
[4 Jun 2011 3:15] Mark Callaghan
modified insert benchmark test

Attachment: iibench.py (text/x-python-script), 15.60 KiB.

[4 Jun 2011 3:15] Mark Callaghan
shell script to run iibench

Attachment: rp.sh (application/octet-stream, text), 993 bytes.

[4 Jun 2011 3:17] Mark Callaghan
I added a test case. To run it for compressed tables:
bash rp.sh 20000000 yes yes yes 6
time mysql -A -e 'drop database test; create database test'

And for uncompressed tables:
bash rp.sh 20000000 yes yes no 6
time mysql -A -e 'drop database test; create database test'
[5 Jun 2011 15:06] Mark Callaghan
The DROP command holds critical InnoDB mutexes while running, which blocks most activity until it is done. SHOW INNODB STATUS hangs. In some cases there are long lock-wait timeouts on internal InnoDB mutexes, and InnoDB crashes when that occurs. I wonder if LOCK_open is also held.

I can't use compressed InnoDB tables until this is fixed.
[6 Jun 2011 19:14] Inaam Rana
I agree that this is not a duplicate of bug 51325. That bug is about the general code path of drop table, where the buffer pool mutex may become a contention point. This is compression specific.
Agree with Mark that buf_buddy_free_low() looks heavy duty. It depends on how long the zip_free[] lists can be. If we are coalescing aggressively, as we seem to do, then these lists should not be very long.
I am assigning it to our buf_buddy author Marko and resetting the state from duplicate to analyzing.

Marko, if you have too much on your plate and you want me to look into it, please let me know and I'll give it a stab.
[7 Jun 2011 13:16] Marko Mäkelä
It looks like the buddy allocator is combining blocks too aggressively. The recombination is not needed for correctness; it is only needed for avoiding fragmentation in the buddy allocator, that is, to minimize INFORMATION_SCHEMA.INNODB_CMPMEM.pages_free.

To make matters worse, sizeof(buf_page_t) is much smaller than the typical compressed page size. We should consider allocating the buf_page_t from a separate pool, which should be easier because the block size is always fixed.
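A minimal sketch of that separate-pool idea (names and sizes are mine, not InnoDB's): because every buf_page_t is the same size, a dedicated pool can serve both alloc and free as O(1) stack pushes and pops, with no recombination walk at all.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SLOTS 1024
#define SLOT_SIZE  128   /* illustrative fixed block size, e.g. a descriptor */

static unsigned char slab[POOL_SLOTS][SLOT_SIZE];
static void *free_list[POOL_SLOTS];
static int   free_top;

/* Push every slot onto a free stack once at startup. */
static void pool_init(void)
{
    free_top = 0;
    for (int i = 0; i < POOL_SLOTS; i++)
        free_list[free_top++] = slab[i];
}

static void *pool_alloc(void)            /* O(1): pop from the stack */
{
    return free_top > 0 ? free_list[--free_top] : NULL;
}

static void pool_free(void *p)           /* O(1): push, no list walk */
{
    free_list[free_top++] = p;
}
```

This trades a little internal fragmentation (every block is SLOT_SIZE bytes) for constant-time free, which is exactly the operation the buddy allocator makes expensive.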
[7 Jun 2011 13:19] Inaam Rana
grab it
[8 Jun 2011 13:07] Marko Mäkelä
Grabbing back. I have a patch that reduces the recombination efforts in buf_buddy_free_low() and makes bpage->state trustable for blocks shorter than PAGE_ZIP_MIN_SIZE.
[8 Jun 2011 15:08] Inaam Rana
Mark,

I tried reproducing the test you reported. I had to work on a smaller box with 16G of RAM, so I reduced the table size to 2 million rows instead of 20 million, still with 6 tables. The DROP DATABASE took 3 seconds, maybe because the tables are too small. I'll see if I can get my hands on a box with more than 16G of RAM. Meanwhile, if you get a chance, can you capture "select * from information_schema.innodb_cmpmem" before and after the DROP DATABASE, and also the LRU and unzip_LRU lengths from innodb status? In my case, it looked like:

before DROP DATABASE (just after load of tables)
mysql> select * from information_schema.innodb_cmpmem;
+-----------+------------+------------+----------------+-----------------+
| page_size | pages_used | pages_free | relocation_ops | relocation_time |
+-----------+------------+------------+----------------+-----------------+
|       128 |          0 |          0 |              0 |               0 |
|       256 |          0 |          0 |              0 |               0 |
|       512 |          0 |          0 |              0 |               0 |
|      1024 |          0 |          0 |              0 |               0 |
|      2048 |          0 |          0 |              0 |               0 |
|      4096 |          0 |          0 |              0 |               0 |
|      8192 |     319966 |          0 |             18 |               0 |
|     16384 |          0 |          0 |              0 |               0 |
+-----------+------------+------------+----------------+-----------------+
8 rows in set (0.00 sec)

LRU len: 319989, unzip_LRU len: 319966

after DROP DATABASE
mysql> select * from information_schema.innodb_cmpmem;
+-----------+------------+------------+----------------+-----------------+
| page_size | pages_used | pages_free | relocation_ops | relocation_time |
+-----------+------------+------------+----------------+-----------------+
|       128 |          0 |          0 |              0 |               0 |
|       256 |          0 |          0 |              0 |               0 |
|       512 |          0 |          0 |              0 |               0 |
|      1024 |          0 |          0 |              0 |               0 |
|      2048 |          0 |          0 |              0 |               0 |
|      4096 |          0 |          0 |              0 |               0 |
|      8192 |          0 |          0 |         155729 |               0 |
|     16384 |          0 |          0 |              0 |               0 |
+-----------+------------+------------+----------------+-----------------+
8 rows in set (0.00 sec)

As you mentioned, all your data is cached. I am assuming you are doing the test on a freshly started server, i.e. with no other data in the buffer pool. If this is the case, then the hypothesis that Marko and I were working on may not resolve this issue. The above tells us that buf_page_t structs are not allocated from buf_buddy, because we don't have any compressed-only pages in the buffer pool. And we both tried, in our own ways, to reduce the heavy work when coalescing freed buf_page_t structs.
[8 Jun 2011 15:14] Mark Callaghan
Before drop:

'select * from innodb_cmpmem'
+-----------+------------+------------+----------------+-----------------+
| page_size | pages_used | pages_free | relocation_ops | relocation_time |
+-----------+------------+------------+----------------+-----------------+
|       128 |      30463 |          1 |              7 |               0 |
|       256 |          0 |          0 |              0 |               0 |
|       512 |          0 |          0 |              0 |               0 |
|      1024 |          0 |          0 |              0 |               0 |
|      2048 |          0 |          0 |              0 |               0 |
|      4096 |          0 |          0 |              0 |               0 |
|      8192 |    2422533 |          1 |              0 |               0 |
|     16384 |          0 |          0 |              0 |               0 |
+-----------+------------+------------+----------------+-----------------+

select * from innodb_cmp'
+-----------+--------------+-----------------+---------------+------------------+----------------+-----------------+
| page_size | compress_ops | compress_ops_ok | compress_time | compress_ok_time | uncompress_ops | uncompress_time |
+-----------+--------------+-----------------+---------------+------------------+----------------+-----------------+
|      1024 |            0 |               0 |             0 |                0 |              0 |               0 |
|      2048 |            0 |               0 |             0 |                0 |              0 |               0 |
|      4096 |            0 |               0 |             0 |                0 |              0 |               0 |
|      8192 |      9471628 |         9468137 |          1460 |             1460 |           1369 |               0 |
|     16384 |            0 |               0 |             0 |                0 |              0 |               0 |

----------------------
BUFFER POOL AND MEMORY
----------------------
Total memory allocated 60411084800; in additional pool allocated 0
Dictionary memory allocated 91454
Buffer pool size   3604480
Free buffers       0
Database pages     2422546
Old database pages 435281
Modified db pages  0
Read ahead: 0
Evicted after read ahead without access: 0
Percent pages dirty: 0.00
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Total writes: 0 LRU, 50754271 flush list, 0 single page
Write sources: fg free margin 0, bg free margin 0, bg dirty 34014, preflush 49576988, adaptive 77157, other 0, bg checkpoint 2708971, fg checkpoint 46868017
Neighbor pages flushed: 49815512 from list, 0 from LRU
Pages made young 809367, not young 7
0.00 youngs/s, 0.00 non-youngs/s
Pages read 19, created 2431200, written 50754271
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s
LRU len: 2422546, unzip_LRU len: 2392070
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
[8 Jun 2011 15:14] Mark Callaghan
Innodb variables before drop

ignore_builtin_innodb   OFF
innodb_adaptive_flushing        ON
innodb_adaptive_hash_index      ON
innodb_adaptive_hash_latch_cache        ON
innodb_additional_mem_pool_size 20971520
innodb_aio_old_usecs    2000000
innodb_auto_lru_dump    0
innodb_autoextend_increment     8
innodb_autoinc_lock_mode        1
innodb_background_checkpoint    ON
innodb_background_drop_table    ON
innodb_background_thread_interval_usecs 1000000
innodb_buffer_pool_size 59055800320
innodb_change_buffering inserts
innodb_checksums        ON
innodb_commit_concurrency       0
innodb_compression_level        6
innodb_compression_method       quicklz
innodb_concurrency_tickets      500
innodb_data_file_path   ibdata1:512M:autoextend
innodb_data_home_dir    /data/mysql
innodb_deadlock_detect  OFF
innodb_doublewrite      ON
innodb_expand_import    OFF
innodb_fast_checksums   ON
innodb_fast_free_list   OFF
innodb_fast_shutdown    1
innodb_file_format      Barracuda
innodb_file_format_check        Barracuda
innodb_file_per_table   ON
innodb_flush_log_at_trx_commit  1
innodb_flush_method     O_DIRECT
innodb_flush_neighbors_for_lru  ON
innodb_flush_neighbors_on_checkpoint    ON
innodb_force_recovery   0
innodb_ibuf_max_iops_when_below_limit   10
innodb_ibuf_max_pct_of_buffer_pool      20
innodb_ibuf_max_pct_of_io_capacity      50
innodb_io_capacity      2000
innodb_lock_wait_timeout        2
innodb_locks_unsafe_for_binlog  OFF
innodb_log_buffer_size  4194304
innodb_log_compressed_pages     OFF
innodb_log_file_size    536870912
innodb_log_files_in_group       2
innodb_log_group_home_dir       /data/mysql
innodb_lru_dump_old_pages       OFF
innodb_lru_load_max_entries     524288
innodb_max_dirty_pages_pct      75
innodb_max_purge_lag    0
innodb_merge_sort_block_size    1048576
innodb_mirrored_log_groups      1
innodb_old_blocks_pct   18
innodb_old_blocks_time  1000
innodb_open_files       10240
innodb_prepare_commit_mutex     ON
innodb_read_ahead_linear        OFF
innodb_read_ahead_threshold     56
innodb_read_io_threads  8
innodb_release_locks_early      OFF
innodb_replication_delay        0
innodb_retry_io_on_error        OFF
innodb_rollback_on_timeout      OFF
innodb_spin_wait_delay  6
innodb_stats_on_metadata        ON
innodb_stats_sample_pages       8
innodb_strict_mode      OFF
innodb_support_xa       ON
innodb_sync_checkpoint_limit    0
innodb_sync_spin_loops  30
innodb_table_locks      ON
innodb_thread_concurrency       128
innodb_thread_lifo      OFF
innodb_thread_sleep_delay       10000
innodb_use_purge_thread 0
innodb_use_sys_malloc   ON
innodb_version  1.0.13
innodb_write_io_threads 4
storage_engine  InnoDB
table_type      InnoDB
[8 Jun 2011 15:16] Mark Callaghan
Only the tables to be dropped are in the buffer pool
[8 Jun 2011 15:18] Mark Callaghan
After the drop. This drop seemed to be fast, but I did not time it.

'select * from innodb_cmpmem'
+-----------+------------+------------+----------------+-----------------+
| page_size | pages_used | pages_free | relocation_ops | relocation_time |
+-----------+------------+------------+----------------+-----------------+
|       128 |          0 |          0 |          13601 |               0 |
|       256 |          0 |          0 |              0 |               0 |
|       512 |          0 |          0 |              0 |               0 |
|      1024 |          0 |          0 |              0 |               0 |
|      2048 |          0 |          0 |              0 |               0 |
|      4096 |          0 |          0 |              0 |               0 |
|      8192 |          0 |          0 |        1196168 |               3 |
|     16384 |          0 |          0 |              0 |               0 |

select * from innodb_cmp'
+-----------+--------------+-----------------+---------------+------------------+----------------+-----------------+
| page_size | compress_ops | compress_ops_ok | compress_time | compress_ok_time | uncompress_ops | uncompress_time |
+-----------+--------------+-----------------+---------------+------------------+----------------+-----------------+
|      1024 |            0 |               0 |             0 |                0 |              0 |               0 |
|      2048 |            0 |               0 |             0 |                0 |              0 |               0 |
|      4096 |            0 |               0 |             0 |                0 |              0 |               0 |
|      8192 |      9471628 |         9468137 |          1460 |             1460 |           1369 |               0 |
|     16384 |            0 |               0 |             0 |                0 |              0 |               0 |
[8 Jun 2011 15:26] Mark Callaghan
I am rerunning the repro cases now on two servers -- one that loads 6 x 15M rows and one that loads 6 x 20M rows. It does the IS queries and drop table immediately after the loads are done, so there will be dirty pages in the buffer pool. I don't know if the presence of dirty pages has an impact.
[8 Jun 2011 16:39] Inaam Rana
The following line is of interest:

+-----------+------------+------------+----------------+-----------------+
| page_size | pages_used | pages_free | relocation_ops | relocation_time |
+-----------+------------+------------+----------------+-----------------+
|       128 |          0 |          0 |          13601 |               0 |

What this tells us is that although your data probably fits in the buffer pool, LRU eviction has started and we have some compressed-only pages in the buffer pool. The 128-byte allocations are for buf_page_t structs. When dealing with compression, things get a little tricky in terms of what we mean by 'in-memory'. Say your compressed data is 20G in size. Assuming an 8K compressed page size, we can think of 'in-memory' data in two ways:
* If the buffer pool is bigger than 60G, then you'll be able to cache both the compressed pages and the uncompressed versions of the pages in the buffer pool. You'll know this because you'll probably not see any 128-byte allocations for the buf_page_t structs, which are used for compressed-only pages.
* If the buffer pool is roughly bigger than 24GB (10% slack for IO-bound workloads) but less than 60GB, then you'll still be able to cache all your on-disk data in the buffer pool and won't see any significant IO activity, but within the buffer pool most of your pages will be in compressed-only form, meaning lots of decompression and buf_page_t allocations. From a user perspective this will still be an in-memory workload, but the buffer pool logic will follow the extra code path of allocating/deallocating buf_page_t structs, which can become a bottleneck.
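The sizing arithmetic behind those thresholds can be sketched as follows (illustrative figures from the example above, assuming 8 KB compressed pages that uncompress to 16 KB):

```c
#include <assert.h>

/* Caching both the compressed and the uncompressed copy of every page
   needs the compressed footprint plus its uncompressed expansion.
   With 20 GB of 8 KB compressed pages uncompressing to 16 KB each,
   that is 20 + 40 = 60 GB. Figures are illustrative only. */
static long gb_to_cache_both(long compressed_gb, long zip_kb, long page_kb)
{
    long uncompressed_gb = compressed_gb * page_kb / zip_kb;
    return compressed_gb + uncompressed_gb;
}
```

Below that 60 GB threshold the workload still looks in-memory to the user, but pages cycle through the compressed-only state and the buf_page_t allocation path.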
[8 Jun 2011 19:38] Inaam Rana
Mark,

I believe we have nailed it down. The problem is indeed with buf_page_t allocation/deallocation. We were able to reduce the DROP DATABASE time to 8 seconds, compared with the unpatched version, where it took upwards of 20 minutes before choking on semaphore waits.

We have two different approaches to resolve this issue. We are going to decide shortly which one to pick.
[8 Jun 2011 23:54] Mark Callaghan
Let me know when there is a patch I can try
[24 Jun 2011 5:23] James Day
For a similar issue not involving compressed tables see Bug #51325.
[12 Jul 2011 19:30] James Day
The expected solution for this is under test at the moment. We expect to release it more generally once the testers have confirmed that the problem is solved or at least greatly improved.
[23 Jul 2011 8:02] Simon Mudd
Does anyone have a _simple_ reproducible test case I can run for this that generates a noticeable delay in DROP TABLE (10+ minutes)? [ideally shell-script generated] I've put workarounds in place, so on the production systems I'm using I no longer experience the issue. Trying some simple test cases does not seem to trigger the issue.
[15 Aug 2011 22:14] James Day
The fix for this is in MySQL 5.5.15 and is expected to be in 5.1.59. At the time of writing it hasn't yet been added to the release notes, but that should happen soon.
[7 Oct 2011 0:46] John Russell
Added to changelog:

The DROP TABLE command for an InnoDB table could be very slow, in a configuration with a combination of table compression, partitioning, and a large buffer pool.
[2 Nov 2011 0:42] James Day
If still suffering from slower than desirable drops, see MySQL bug #51325 (Oracle bug #11759044) for work on improving another part of dropping tables.
[2 Nov 2011 16:32] Jonathan Stimmel
I recently had a chance to recreate the conditions from my original report. I upgraded to 5.5.15, then let it run normally for a few days before dropping all tables. This operation ran many times faster than in my original case, and I assume any remaining slowness is due to other issues.
[2 Nov 2011 22:14] James Day
Thanks. Good to have confirmation that the performance improvement works!