Bug #68022 DROP TABLE slow when it decompresses compressed-only pages
Submitted: 3 Jan 2013 18:00
Modified: 7 Jul 2013 11:28
Reporter: Mark Callaghan
Status: Closed
Category: MySQL Server: InnoDB Plugin storage engine
Severity: S1 (Critical)
Version: 5.1, 5.5
OS: Any
CPU Architecture: Any

[3 Jan 2013 18:00] Mark Callaghan
Description:
My standard disclaimer applies here: this problem occurs on a heavily patched version of MySQL 5.1, but from reading the code for official 5.1 and 5.5 I think the problem exists there as well.

DROP TABLE is slow for compressed InnoDB tables and sometimes locks up servers for 30 seconds. The diagnosis from Harrison is:

So this looks like a three-way blockage. The DROP TABLE thread holds the dictionary lock and performs some operation that takes a long time. Another thread then comes in, holds LOCK_open, and wants the dictionary lock. All other threads then hang on the LOCK_open mutex.

The problem is that during the DROP, compressed-only pages from that table that are in the buffer pool will be decompressed.
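
To make the stall chain concrete, here is a minimal, self-contained pthreads sketch of the three-way blockage described above. The mutex names and thread bodies are stand-ins for illustration only, not the actual server objects:

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t dict_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t LOCK_open  = PTHREAD_MUTEX_INITIALIZER;

/* Stands in for the DROP TABLE thread: holds the dictionary mutex
while doing slow work (decompressing every cached page). */
static void *drop_table_thread(void *arg)
{
	(void) arg;
	pthread_mutex_lock(&dict_mutex);
	puts("DROP TABLE: holding dict mutex, decompressing pages...");
	sleep(5);	/* stands in for the ~30 second decompression stall */
	pthread_mutex_unlock(&dict_mutex);
	return(NULL);
}

/* Stands in for a thread that holds LOCK_open and then wants the
dictionary mutex: it blocks behind the DROP TABLE thread. */
static void *open_table_thread(void *arg)
{
	(void) arg;
	pthread_mutex_lock(&LOCK_open);
	pthread_mutex_lock(&dict_mutex);
	pthread_mutex_unlock(&dict_mutex);
	pthread_mutex_unlock(&LOCK_open);
	return(NULL);
}

/* Stands in for every other session: blocks on LOCK_open. */
static void *other_session_thread(void *arg)
{
	(void) arg;
	pthread_mutex_lock(&LOCK_open);
	pthread_mutex_unlock(&LOCK_open);
	return(NULL);
}

int main(void)
{
	pthread_t t1, t2, t3;
	pthread_create(&t1, NULL, drop_table_thread, NULL);
	sleep(1);	/* let DROP TABLE acquire dict_mutex first */
	pthread_create(&t2, NULL, open_table_thread, NULL);
	sleep(1);	/* let open_table_thread acquire LOCK_open */
	pthread_create(&t3, NULL, other_session_thread, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	pthread_join(t3, NULL);
	puts("server responsive again");
	return(0);
}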

How to repeat:
Do DROP TABLE for a compressed InnoDB table when many pages from that table are compressed-only in the buffer pool (i.e., the buffer pool has compressed copies of pages for that table but not uncompressed copies). Observe that the pages are decompressed. The interesting part of the stack trace is listed below.

#0  0x0000000000914dae in inflate (strm=<optimized out>, flush=2) at inflate.c:1058
#1  0x00000000007c0c0a in page_zip_decompress_clust (heap=0x2b2606c7f000, offsets=0x2b2606c861d0, trx_id_col=1, index=0x2b24c70ddc90, n_dense=<optimized out>, recs=0x2b2606c7f090, d_stream=0x2b272370b0a0, page_zip=0x2b140701a670) at page/page0zip.c:2855
#2  page_zip_decompress (page_zip=<optimized out>, page=0x3d0 <Address 0x3d0 out of bounds>, all=<optimized out>, space_id=2270) at page/page0zip.c:3265
#3  0x00000000008424d4 in buf_zip_decompress (block=0x2b140701a660, check=<optimized out>) at buf/buf0buf.c:1558
#4  0x0000000000842f2e in buf_page_get_gen (space=2270, zip_size=8192, offset=23938, rw_latch=2, guess=<optimized out>, mode=12, file=0x9d4edf "btr/btr0sea.c", line=1239, mtr=0x2b272370b280) at buf/buf0buf.c:1981
#5  0x000000000083d4b2 in btr_search_drop_page_hash_when_freed (space=<optimized out>, zip_size=<optimized out>, page_no=<optimized out>) at btr/btr0sea.c:1238
#6  0x000000000087d0f5 in fseg_free_extent (seg_inode=0x2b1f63a440f2 "", space=2270, zip_size=8192, page=23936, mtr=0x2b272370b840) at fsp/fsp0fsp.c:3535
#7  0x000000000088022c in fseg_free_step (header=0x2b1a13df804a "", mtr=0x2b272370b840) at fsp/fsp0fsp.c:3630
#8  0x0000000000821eeb in btr_free_but_not_root (space=2270, zip_size=8192, root_page_no=3) at btr/btr0btr.c:1491
#9  0x000000000085331f in dict_drop_index_tree (rec=0x2b1be4208156 "", mtr=0x2b272370c130) at dict/dict0crea.c:724

Suggested fix:
Change buf_page_get_gen to not decompress pages when BUF_PEEK_IF_IN_POOL is set.
[3 Jan 2013 18:05] Mark Callaghan
This is part of the problem stack trace. Why are pages getting decompressed given that btr_search_drop_page_hash_when_freed calls buf_page_get_gen with BUF_PEEK_IF_IN_POOL?

inflate
page_zip_decompress_clust
page_zip_decompress
buf_zip_decompress
buf_page_get_gen
btr_search_drop_page_hash_when_freed
fseg_free_extent
fseg_free_step
btr_free_but_not_root

buf_page_get_gen returns early for BUF_PEEK_IF_IN_POOL:
1) when the page is not in the buffer pool
2) when a disk read is pending for that page

For the case when the page is compressed-only in the buffer pool, however, it decompresses the page, and that can make DROP TABLE very slow. I don't know why this is done. From reading the official code for 5.1 and 5.5 I think they do the same, so I don't think this problem is limited to the Facebook patch, but I have not tried to reproduce the problem there.
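
To illustrate, here is a simplified, compilable model of that control flow. This is a sketch, not the actual buf_page_get_gen() source; the state names loosely follow InnoDB's page states, and everything else is a stand-in:

#include <stdio.h>

enum page_state { NOT_IN_POOL, READ_PENDING, ZIP_ONLY, UNCOMPRESSED };

typedef struct { enum page_state state; } buf_block_t;

static buf_block_t frame = { UNCOMPRESSED };

/* Stands in for buf_zip_decompress() -> page_zip_decompress() ->
inflate(): the expensive call shown in the trace above. */
static buf_block_t *decompress_block(void)
{
	puts("inflate()... (expensive)");
	return(&frame);
}

static buf_block_t *peek_page(enum page_state state, int with_fix)
{
	switch (state) {
	case NOT_IN_POOL:	/* early return 1) above */
		return(NULL);
	case READ_PENDING:	/* early return 2) above */
		return(NULL);
	case ZIP_ONLY:
		if (with_fix) {
			/* suggested fix: a peek must not pay for inflate() */
			return(NULL);
		}
		return(decompress_block());	/* the bug: peek decompresses */
	case UNCOMPRESSED:
		return(&frame);
	}
	return(NULL);
}

int main(void)
{
	peek_page(ZIP_ONLY, 0);	/* current behaviour: prints the inflate line */
	peek_page(ZIP_ONLY, 1);	/* suggested behaviour: returns NULL at once */
	return(0);
}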
[4 Jan 2013 12:30] Nizameddin Ordulu
InnoDB goes through the in-memory pages of the to-be-dropped compressed table
in order to remove the table's records from the adaptive hash index (AHI).
While doing so, it decompresses those pages whose uncompressed copy is not
in memory.

This is wasteful and unnecessary: if the uncompressed copy of a page
is not in memory, then its records cannot be in the AHI. The uncompressed copy
of the page is removed by buf_LRU_free_block(), which already calls
btr_search_drop_page_hash_index() to remove all of the page's records from
the AHI.

If a page is compressed-only, then buf_page_get_gen() cannot return a buf_block_t*
without doing too much work, so I decided to just return NULL. From the caller's
perspective this should not make any difference, because if the page has no
records in the AHI, the caller does nothing with the page.

patch:

diff --git a/storage/innodb_plugin/buf/buf0buf.c b/storage/innodb_plugin/buf/buf0buf.c
index d384f3f..4c127b5 100644
--- a/storage/innodb_plugin/buf/buf0buf.c
+++ b/storage/innodb_plugin/buf/buf0buf.c
@@ -1864,6 +1864,16 @@ loop:
 
        case BUF_BLOCK_ZIP_PAGE:
        case BUF_BLOCK_ZIP_DIRTY:
+               if (mode == BUF_PEEK_IF_IN_POOL) {
+                       /* Do not decompress a page if the caller is only peeking to
+                        * see if the page is in the buffer pool.
+                        * In this case, the block object is invalid because
+                        * buf_LRU_free_block() does not allocate a buf_block_t object
+                        * for the half-freed page. We therefore return NULL.
+                        */
+                       buf_pool_mutex_exit();
+                       return(NULL);
+               }
                bpage = &block->page;
                /* Protect bpage->buf_fix_count. */
                mutex_enter(&buf_pool_zip_mutex);
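
For completeness, here is a paraphrased caller-side sketch showing why the NULL return is harmless. This is a stand-in model, not the actual btr_search_drop_page_hash_when_freed() source; the names and the page number are illustrative only:

#include <stdio.h>

typedef struct { int page_no; } buf_block_t;

/* Stands in for the patched buf_page_get_gen() in peek mode: returns
NULL when the page is absent, being read in, or compressed-only. */
static buf_block_t *buf_page_peek(int page_no)
{
	(void) page_no;
	return(NULL);
}

static void drop_page_hash_when_freed(int page_no)
{
	buf_block_t*	block = buf_page_peek(page_no);

	if (block == NULL) {
		/* No uncompressed frame in the pool, hence no AHI
		entries for this page: buf_LRU_free_block() already
		dropped them when the frame was evicted. Skip it. */
		return;
	}
	/* ... otherwise remove the page's records from the AHI ... */
	printf("dropping AHI entries for page %d\n", block->page_no);
}

int main(void)
{
	drop_page_hash_when_freed(23938);	/* page number from the trace above */
	return(0);
}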
[20 Apr 2013 13:43] Laurynas Biveinis
Is the Verified status correct?

(5.1 is not pushed at the moment)

5.5$ bzr log -r 4170 -n0
------------------------------------------------------------
revno: 4170 [merge]
committer: Marko Mäkelä <marko.makela@oracle.com>
branch nick: mysql-5.5
timestamp: Mon 2013-01-21 15:19:18 +0200
message:
  Merge mysql-5.1 to mysql-5.5.
    ------------------------------------------------------------
    revno: 2661.830.82
    committer: Marko Mäkelä <marko.makela@oracle.com>
    branch nick: mysql-5.1
    timestamp: Mon 2013-01-21 14:59:49 +0200
    message:
      Bug#16067973 DROP TABLE SLOW WHEN IT DECOMPRESS COMPRESSED-ONLY PAGES
      
      buf_page_get_gen(): Do not attempt to decompress a compressed-only
      page when mode == BUF_PEEK_IF_IN_POOL. This mode is only being used by
      btr_search_drop_page_hash_when_freed(). There cannot be any adaptive
      hash index pointing to a page that does not exist in uncompressed
      format in the buffer pool.
      
      innodb_buffer_pool_evict_update(): New function for debug builds, to handle
      SET GLOBAL innodb_buffer_pool_evict='uncompressed'
      by evicting all uncompressed page frames of compressed tablespaces
      from the buffer pool.
      
      rb#1873 approved by Jimmy Yang
[7 Jul 2013 11:28] James Day
Updated status to Closed; here's the missing status update from 29 Jan 2013:

-----
Added to changelog for 5.1.69, 5.5.31, 5.6.11:

The DROP TABLE statement for a table using compression could be
slower than necessary, causing a stall for several seconds. MySQL was
unnecessarily decompressing pages in the buffer pool related to the
table as part of the DROP operation.
-----