Bug #61341 buf_LRU_insert_zip_clean can be O(N) on LRU length
Submitted: 28 May 2011 15:37 Modified: 5 Jan 2012 20:15
Reporter: Mark Callaghan
Status: Closed
Category: MySQL Server: InnoDB Plugin storage engine Severity: S5 (Performance)
Version: 5.1.52 OS: Any
Assigned to: Marko Mäkelä CPU Architecture:Any
Tags: compressed, innodb

[28 May 2011 15:37] Mark Callaghan
I have been running benchmarks using a read-only workload on compressed InnoDB tables. The database is not cached in the InnoDB buffer pool. So the test initially stores a compressed and uncompressed copy of every page (frame?) and then begins evicting the uncompressed copies when there are no more free pages. I noticed a performance dip during the test.

In this case I am running with two user connections and at test start I get ~300 requests/second which is expected given the disks I use. However, throughput drops to ~60 requests/second after a few minutes and that should not occur.

The common thread stack is:

So I looked at buf_LRU_insert_zip_clean and it has a do...while loop that can be O(N) on the length of the LRU. That will not scale. To confirm this is a problem I added a counter for the number of loop iterations per call. Right now there are ~137,000 pages on the LRU and my extra code is reporting many instances of 50,000 or more iterations of the loop per call.

/** Insert a compressed block into buf_pool->zip_clean in the LRU order. */
void
buf_LRU_insert_zip_clean(
        buf_page_t*     bpage)  /*!< in: pointer to the block in question */
{
        buf_page_t*     b;

        ut_ad(buf_page_get_state(bpage) == BUF_BLOCK_ZIP_PAGE);

        /* Find the first successor of bpage in the LRU list
        that is in the zip_clean list. */
        b = bpage;
        do {
                b = UT_LIST_GET_NEXT(LRU, b);
        } while (b && buf_page_get_state(b) != BUF_BLOCK_ZIP_PAGE);

        /* Insert bpage before b, i.e., after the predecessor of b. */
        if (b) {
                b = UT_LIST_GET_PREV(list, b);
        }

        if (b) {
                UT_LIST_INSERT_AFTER(list, buf_pool->zip_clean, b, bpage);
        } else {
                UT_LIST_ADD_FIRST(list, buf_pool->zip_clean, bpage);
        }
}
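
To illustrate why the scan above is linear in the LRU length, here is a minimal standalone sketch of the same loop with an iteration counter, similar in spirit to the instrumentation described in this report. The types and names (struct page, enum page_state, scan_for_zip_successor, build_lru) are hypothetical stand-ins, not InnoDB code, and the list is a plain singly linked list rather than UT_LIST.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for InnoDB's page states and buf_page_t. */
enum page_state { ZIP_PAGE, FILE_PAGE };

struct page {
    enum page_state state;
    struct page    *lru_next;   /* successor in the LRU list */
};

/* Walk forward from bpage until a compressed-only page is found,
   mirroring the do...while loop in buf_LRU_insert_zip_clean, and
   return the number of loop iterations. *found is set to the page
   the scan stopped on, or NULL if the end of the list was reached. */
static size_t scan_for_zip_successor(const struct page *bpage,
                                     const struct page **found)
{
    size_t             iterations = 0;
    const struct page *b = bpage;

    assert(bpage->state == ZIP_PAGE);

    do {
        b = b->lru_next;
        iterations++;
    } while (b != NULL && b->state != ZIP_PAGE);

    *found = b;
    return iterations;
}

/* Link pages[0..n-1] into one list. pages[0] is compressed-only and
   every later page still has an uncompressed frame, which is the
   worst case for the scan. */
static void build_lru(struct page *pages, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        pages[i].state = (i == 0) ? ZIP_PAGE : FILE_PAGE;
        pages[i].lru_next = (i + 1 < n) ? &pages[i + 1] : NULL;
    }
}
```

With n pages built this way, scan_for_zip_successor(&pages[0], &found) walks the whole list before giving up, so the cost of a single insert grows with the number of pages behind it that still have an uncompressed copy. That matches the situation described here, where length(LRU) ~= length(unzip_LRU) right after a cold start.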

How to repeat:
Run a read-only workload on a compressed table. Watch throughput over time starting with a cold buffer pool. Initially length(LRU) ~= length(unzip_LRU) but once uncompressed pages must be evicted from the unzip_LRU the problem seems to begin.
[28 May 2011 15:40] Mark Callaghan
I repeated some of the tests from https://www.facebook.com/notes/mysql-at-facebook/innodb-compression-for-read-only-workload... and noticed that for the compressed table tests I was warming the buffer pool by mistake prior to starting the test. After changing the test to use a cold buffer pool I noticed the problem.
[28 May 2011 16:04] Mark Callaghan
It takes between 15 and 30 minutes to get out of this state, which might be the amount of time it takes for the buffer pool to reach a steady state with length(unzip_LRU) at about 10% of length(LRU).
[30 May 2011 12:29] Marko Mäkelä
Good catch, Mark. The funny thing is that it turns out that the buf_pool->zip_clean list is not really needed for anything. It was going to be used for resizing the buffer pool, a feature that has still not been implemented.

I tried removing zip_clean. The only dependence is in buf_get_latched_pages_number(), which would no longer include any clean (non-dirty) compressed-only blocks in the count. Luckily, this is only exposed in debug builds:

		= buf_get_latched_pages_number();
#endif /* UNIV_DEBUG */
[11 Nov 2011 6:08] Mark Callaghan
Is this fixed in 5.1 or 5.5? We are using compression on a few servers and this causes lots of stalls that are hard to debug. Would be nice for others to get an official fix.
[15 Nov 2011 12:26] Marko Mäkelä
Sorry Mark, it looks like we forgot to update the bug status.

This was fixed already in June, by putting zip_clean behind #ifdef UNIV_DEBUG or UNIV_BUF_DEBUG. It was merged in this changeset:

revno: 3562
revision-id: kent.boortz@oracle.com-20110703154737-d27i4ypu2a0ran21
parent: kent.boortz@oracle.com-20110630153713-9mk02181m1d70g6o
parent: georgi.kodinov@oracle.com-20110607124316-rfzy0xt3uopdbkvm
committer: Kent Boortz <kent.boortz@oracle.com>
branch nick: mysql-5.1
timestamp: Sun 2011-07-03 17:47:37 +0200

Judging from the revno, this narrowly missed the 5.1.59 release.
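
As a rough illustration of the approach described above (keeping zip_clean only under UNIV_DEBUG or UNIV_BUF_DEBUG), the following sketch shows list bookkeeping that compiles away in non-debug builds. It is not the actual changeset: struct pool, page_became_zip_only and zip_clean_len are hypothetical names, and a simple counter stands in for the real list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified pool; not the real buf_pool. */
struct pool {
    size_t lru_len;
#if defined(UNIV_DEBUG) || defined(UNIV_BUF_DEBUG)
    size_t zip_clean_len;   /* bookkeeping kept only in debug builds */
#endif
};

/* Called when a page becomes compressed-only. In debug builds this
   is where the zip_clean insert would happen; in release builds the
   function does no list maintenance at all. */
static void page_became_zip_only(struct pool *pool)
{
    assert(pool != NULL);
#if defined(UNIV_DEBUG) || defined(UNIV_BUF_DEBUG)
    pool->zip_clean_len++;  /* stands in for buf_LRU_insert_zip_clean() */
#endif
}

/* Length of the debug-only list; always 0 when it is compiled out. */
static size_t zip_clean_len(const struct pool *pool)
{
    assert(pool != NULL);
#if defined(UNIV_DEBUG) || defined(UNIV_BUF_DEBUG)
    return pool->zip_clean_len;
#else
    return 0;
#endif
}
```

Built without either macro, page_became_zip_only does nothing beyond the argument check, so the O(N) scan is off the hot path, while debug builds still maintain the list for checks such as buf_get_latched_pages_number().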
[5 Jan 2012 20:15] John Russell
Added to the 5.1.60 changelog:

The InnoDB buffer pool management code was optimized for handling
pages from compressed tables. This fixes a slowdown that could occur
particularly during the warmup period for the buffer pool.