Bug #54346 excessive LRU flushing with multiple buffer pool instances
Submitted: 8 Jun 2010 18:28 Modified: 14 Dec 2010 20:16
Reporter: Mikhail Izioumtchenko Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB Plugin storage engine Severity:S2 (Serious)
Version:1.1 OS:Any
Assigned to: Inaam Rana

[8 Jun 2010 18:28] Mikhail Izioumtchenko
Description:
LRU flushing is done with buf_flush_free_margin() in 1.0. It is done in 1.1 
with multiple buffer pool instances with either of:

buf_flush_free_margin(buf_pool)
buf_flush_free_margins() which invokes buf_flush_free_margin() for each
buffer pool.

Either way buf_flush_LRU_recommendation() is used in buf_flush_free_margin()
to calculate the desired number of pages to flush. 
The recommendation is unchanged in 1.1 and doesn't seem to depend on
the number of buffer pool instances:

return(BUF_FLUSH_FREE_BLOCK_MARGIN(buf_pool)
               + BUF_FLUSH_EXTRA_MARGIN(buf_pool)
               - n_replaceable);

where the positive part amounts to 5+64 + (5+64)/4+100 for most buffer pool sizes. So we used to add 100, now we often add 100 per each buffer pool instance.

How to repeat:
See the code. Performance testing (dbt2) shows increased IO and decrease
in dirty page counts in 5.5 when innodb-buffer-pool-instances>1, 
often with some decrease in performance.

Suggested fix:
/= buffer_pool_instances applied to either the positive part, or the total
recommendation. Or something that takes into the account that the readahead 
area is still 64. For example, add the entire readahead size in buf_flush_LRU_recommendation, but with the probability of 1/buffer_pool_instances. The value of adaptive_flushing may affect the optimal formula, too.
[10 Jun 2010 15:11] Mikhail Izioumtchenko
I don't think attempting to flush 6400 pages where we used to flush 100
is such a good idea. And I start to think that flushing more to help
readahead does more harm than good.
Here is the result for a couple of simple patches, this shows at least that more testing
is needed:

 Cores: 8/8 dscczz01 Thu Jun 10 15:03:48 2010 dscczz01:/spare2/mizioumt/ctc/dbt2runs.11

  1: c552:iobl,bpi16
  2: c553:iobl,bpi16
  3: c55:iobl,bpi16
    conn BUYS         1         2         3
 ===========================================
      16 8.3% 1:5511.19 2:5245.74 3:5089.72

16 buffer pools, 1G total (that's still 16, right? It becomes 1 buffer pool
when it's <1G, not <=1G iirc).
2x1G redo logs, io-capacity 2000, adaptive flushing and native aio are on.

c55 is the unchanged mysql-trunk-innodb r3098.

c552 is the patch that flushes less.

=== modified file 'storage/innobase/include/buf0flu.h'
--- storage/innobase/include/buf0flu.h  2010-04-19 15:44:15 +0000
+++ storage/innobase/include/buf0flu.h  2010-06-09 19:51:36 +0000
@@ -226,10 +226,10 @@
 available to replacement in the free list and at the end of the LRU list (to
 make sure that a read-ahead batch can be read efficiently in a single
 sweep). */
-#define BUF_FLUSH_FREE_BLOCK_MARGIN(b) (5 + BUF_READ_AHEAD_AREA(b))
+#define BUF_FLUSH_FREE_BLOCK_MARGIN(b) ((5 + BUF_READ_AHEAD_AREA(b))/srv_buf_pool_instances)
 /** Extra margin to apply above BUF_FLUSH_FREE_BLOCK_MARGIN */
-#define BUF_FLUSH_EXTRA_MARGIN(b)      (BUF_FLUSH_FREE_BLOCK_MARGIN(b) / 4 \
-                                       + 100)
+#define BUF_FLUSH_EXTRA_MARGIN(b)      ((BUF_FLUSH_FREE_BLOCK_MARGIN(b) / 4 \
+                                       + 100)/srv_buf_pool_instances)
 #endif /* !UNIV_HOTBACKUP */

 #ifndef UNIV_NONINL

c553 includes the above but also looks at fewer blocks, works a bit worse 
than c552.

=== modified file 'storage/innobase/include/buf0flu.h'
--- storage/innobase/include/buf0flu.h  2010-04-19 15:44:15 +0000
+++ storage/innobase/include/buf0flu.h  2010-06-09 19:55:19 +0000
@@ -226,10 +226,10 @@
 available to replacement in the free list and at the end of the LRU list (to
 make sure that a read-ahead batch can be read efficiently in a single
 sweep). */
-#define BUF_FLUSH_FREE_BLOCK_MARGIN(b) (5 + BUF_READ_AHEAD_AREA(b))
+#define BUF_FLUSH_FREE_BLOCK_MARGIN(b) ((5 + BUF_READ_AHEAD_AREA(b))/srv_buf_pool_instances)
 /** Extra margin to apply above BUF_FLUSH_FREE_BLOCK_MARGIN */
-#define BUF_FLUSH_EXTRA_MARGIN(b)      (BUF_FLUSH_FREE_BLOCK_MARGIN(b) / 4 \
-                                       + 100)
+#define BUF_FLUSH_EXTRA_MARGIN(b)      ((BUF_FLUSH_FREE_BLOCK_MARGIN(b) / 4 \
+                                       + 100)/srv_buf_pool_instances)
 #endif /* !UNIV_HOTBACKUP */

 #ifndef UNIV_NONINL

=== modified file 'storage/innobase/include/buf0lru.h'
--- storage/innobase/include/buf0lru.h  2010-05-20 13:16:32 +0000
+++ storage/innobase/include/buf0lru.h  2010-06-09 19:58:28 +0000
@@ -73,7 +73,7 @@
 #define BUF_LRU_OLD_MIN_LEN    512     /* 8 megabytes of 16k pages */

 /** Maximum LRU list search length in buf_flush_LRU_recommendation() */
-#define BUF_LRU_FREE_SEARCH_LEN(b)     (5 + 2 * BUF_READ_AHEAD_AREA(b))
+#define BUF_LRU_FREE_SEARCH_LEN(b)     ((5 + 2 * BUF_READ_AHEAD_AREA(b))/srv_buf_pool_instances)

 /******************************************************************//**
 Invalidates all pages belonging to a given tablespace when we are deleting
[10 Jun 2010 15:19] Mikhail Izioumtchenko
the above was for 16 connections and 50 wh dbt2 database which is 4.7G.
I'm leaving my comments public since the entire bug is private,
not sure if I'm right here.
[6 Oct 2010 19:14] John Russell
Adding to change log:

With multiple buffer pools enabled, InnoDB could flush more data from
the buffer pool than necessary, causing extra I/O overhead.
[9 Nov 2010 19:49] Bugs System
Pushed into mysql-5.5 5.5.7-rc (revid:sunanda.menon@sun.com-20101109182959-otkxq8vo2dcd13la) (version source revid:marko.makela@oracle.com-20100824081003-v4ecy0tga99cpxw2) (merge vers: 5.1.50) (pib:21)
[13 Nov 2010 16:15] Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:marko.makela@oracle.com-20100824081003-v4ecy0tga99cpxw2) (merge vers: 5.1.50) (pib:21)
[13 Nov 2010 16:41] Bugs System
Pushed into mysql-next-mr (revid:alexander.nozdrin@oracle.com-20101113160336-atmtmfb3mzm4pz4i) (version source revid:marko.makela@oracle.com-20100824081003-v4ecy0tga99cpxw2) (pib:21)