MySQL Bugs: #61341: buf_LRU_insert_zip_clean can be O(N) on LRU length

Bug #61341	buf_LRU_insert_zip_clean can be O(N) on LRU length
Submitted:	28 May 2011 15:37	Modified:	5 Jan 2012 20:15
Reporter:	Mark Callaghan	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: InnoDB Plugin storage engine	Severity:	S5 (Performance)
Version:	5.1.52	OS:	Any
Assigned to:	Marko Mäkelä	CPU Architecture:	Any
Tags:	compressed, innodb

Description:
I have been running benchmarks using a read-only workload on compressed InnoDB tables. The database is not cached in the InnoDB buffer pool. So the test initially stores a compressed and uncompressed copy of every page (frame?) and then begins evicting the uncompressed copies when there are no more free pages. I noticed a performance dip during the test.

In this case I am running with two user connections and at test start I get ~300 requests/second which is expected given the disks I use. However, throughput drops to ~60 requests/second after a few minutes and that should not occur.

The common thread stack is:
buf_LRU_insert_zip_clean,buf_LRU_free_block,buf_LRU_search_and_free_block,buf_LRU_get_free_block,buf_buddy_alloc_low,buf_page_init_for_read,buf_read_page_low,buf_read_page,buf_page_get_gen,btr_cur_search_to_nth_level,row_search_for_mysql,ha_innobase::index_read,handler::index_read_idx_map,join_read_const,join_read_const_table,make_join_statistics,JOIN::optimize,mysql_select,handle_select,execute_sqlcom_select,mysql_execute_command,mysql_parse,dispatch_command

So I looked at buf_LRU_insert_zip_clean and it has a do...while loop that can be O(N) on the size of the LRU. That will not scale. To confirm this is a problem I added count to count the number of loop iterations per call. Right now there are ~137,000 pages on the LRU and my extra code is reporting many instances of 50,000 or more iterations of the loop per call.

/********************************************************************//**
Insert a compressed block into buf_pool->zip_clean in the LRU order. */
UNIV_INTERN
void
buf_LRU_insert_zip_clean(
/*=====================*/
        buf_page_t*     bpage)  /*!< in: pointer to the block in question */
{
        buf_page_t*     b;

        ut_ad(buf_pool_mutex_own());
        ut_ad(buf_page_get_state(bpage) == BUF_BLOCK_ZIP_PAGE);

        /* Find the first successor of bpage in the LRU list
        that is in the zip_clean list. */
        b = bpage;
        do {
                b = UT_LIST_GET_NEXT(LRU, b);
        } while (b && buf_page_get_state(b) != BUF_BLOCK_ZIP_PAGE);

        /* Insert bpage before b, i.e., after the predecessor of b. */
        if (b) {
                b = UT_LIST_GET_PREV(list, b);
        }

        if (b) {
                UT_LIST_INSERT_AFTER(list, buf_pool->zip_clean, b, bpage);
        } else {
                UT_LIST_ADD_FIRST(list, buf_pool->zip_clean, bpage);
        }
}

How to repeat:
Run a read-only workload on a compressed table. Watch throughput over time starting with a cold buffer pool. Initially length(LRU) ~= length(unzip_LRU) but once uncompressed pages must be evicted from the unzip_LRU the problem seems to begin.

I repeated some of the tests from https://www.facebook.com/notes/mysql-at-facebook/innodb-compression-for-read-only-workload... and noticed that for the compressed table tests I was warming the buffer pool by mistake prior to starting the test. After changing the test to use a cold buffer pool I noticed the problem.

It takes between 15 and 30 minutes to get out of this state which might be the amount of time it takes for the buffer pool to reach a steady state with length(unzip_LRU) about 10% of length(LRU)

Good catch, Mark. The funny thing is that it turns out that the buf_pool->zip_clean list is not really needed for anything. It was going to be used for resizing the buffer pool, a feature that has still not been implemented.

I tried removing zip_clean. The only dependence is in buf_get_latched_pages_number(), which would no longer include any clean (non-dirty) compressed-only blocks in the count. Luckily, this is only exposed in debug builds:

#ifdef UNIV_DEBUG
	export_vars.innodb_buffer_pool_pages_latched
		= buf_get_latched_pages_number();
#endif /* UNIV_DEBUG */

Is this fixed in 5.1 or 5.5? We are using compression on a few servers and this causes lots of stalls that are hard to debug. Would be nice for others to get an official fix.

Sorry Mark, it looks like that we forgot to update the bug status.

This was fixed already in June, by putting zip_clean behind #ifdef UNIV_DEBUG or UNIV_BUF_DEBUG. It was merged in this changeset:

revno: 3562
revision-id: kent.boortz@oracle.com-20110703154737-d27i4ypu2a0ran21
parent: kent.boortz@oracle.com-20110630153713-9mk02181m1d70g6o
parent: georgi.kodinov@oracle.com-20110607124316-rfzy0xt3uopdbkvm
committer: Kent Boortz <kent.boortz@oracle.com>
branch nick: mysql-5.1
timestamp: Sun 2011-07-03 17:47:37 +0200

Judging from the revno, this narrowly missed the 5.1.59 release.

Added to the 5.1.60 changelog:

The InnoDB buffer pool management code was optimized for handling
pages from compressed tables. This fixes a slowdown that could occur
particularly during the warmup period for the buffer pool.