Bug #45015 | InnoDB buffer pool can be severely affected by table scans | ||
---|---|---|---|
Submitted: | 21 May 2009 18:10 | Modified: | 20 Jun 2010 17:18 |
Reporter: | Harrison Fisk | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S4 (Feature request) |
Version: | 5.0/5.1 | OS: | Any |
Assigned to: | Marko Mäkelä | CPU Architecture: | Any |
Tags: | buffer pool, innodb, LRU, table scan |
[21 May 2009 18:10]
Harrison Fisk
[21 May 2009 18:21]
Harrison Fisk
Some URLs which are relevant: Midpoint insertion strategy used by the MyISAM key cache: http://dev.mysql.com/doc/refman/5.0/en/midpoint-insertion.html ARC systems: http://en.wikipedia.org/wiki/Adaptive_Replacement_Cache Q&A with Heikki saying there is no table scan resistance in InnoDB: http://www.mysqlperformanceblog.com/2007/10/26/heikki-tuuri-innodb-answers-part-i/
[16 Jun 2009 8:35]
Marko Mäkelä
InnoDB already implements an LRU cache that is divided into two parts, like MyISAM. Some more pointers: http://en.wikipedia.org/wiki/Page_replacement_algorithm http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-05-11.pdf The latter introduces LIRS (Low Inter-reference Recency Set), an improvement over LRU. There would be a few challenges in an InnoDB implementation: 1. How to store the non-resident set of blocks (and how much of it to store)? Many advanced algorithms (LIRS, LRU-K, ARC, …) keep track of some non-resident blocks, to have access to more history. 2. How to efficiently compute the IRR (Inter-Reference Recency, number of other *distinct* blocks accessed between two consecutive references to the block)? 3. How to modify table and index scans so that they only invoke buf_page_get_gen() once per B-tree page, as required by the IRR computation? Without this change, table scans would get IRR=0 instead of the desired IRR=∞. Currently, InnoDB will store the persisent cursor position, commit the mini-transaction (releasing the page) and restore the cursor (re-latching the page) each time it returns a record to the user. The prebuilt->fetch_cache[] alleviates this a little, but it is not tightly coupled with B-tree pages.
[27 Aug 2009 7:22]
Marko Mäkelä
InnoDB Plugin 1.0.5 will address this by introducing two settable global variables, innodb_old_blocks_pct and innodb_old_blocks_time. The parameter innodb_old_blocks_pct (5..95) controls the desired amount of "old" blocks in the LRU list. The default is 37, corresponding to the old fixed ratio of 3/8. Each time a block is accessed, it will be moved to the "new" blocks if its first access was at least innodb_old_blocks_time milliseconds ago (default 0, meaning every block). The idea is that in index scans, blocks will be accessed a few times within innodb_old_blocks_time, and they will remain in the "old" section of the LRU list. Thus, when innodb_old_blocks_time is nonzero, blocks retrieved for one-time index scans will be more likely candidates for eviction than blocks that are accessed in random patterns.
[8 Sep 2009 11:20]
James Day
Marko, am I right in thinking that innodb_old_blocks_time = 1 would often be reasonably close in meaning to "if table scans are uncommon, only move non-leaf nodes to the old block list"? If so, 1 seems like a better default than 0 because it would reduce the surprise effect that unexpected table scans produce for end users, by greatly decreasing the number of pages that become old during a scan. Maybe better to suggest it in documentation than change the default behavior at this time.
[8 Sep 2009 12:15]
Marko Mäkelä
James, you are probably right. When the adaptive hash index works efficiently, read-heavy workloads should access mostly the leaf pages. However, we are somewhat reluctant to change the default behaviour in the built-in InnoDB of MySQL 5.1. A change of the default setting could be considered for the InnoDB Plugin.
[14 Oct 2009 14:39]
Bugs System
Pushed into 5.1.41 (revid:joro@sun.com-20091014143611-cphb0enjlx6lpat1) (version source revid:satya.bn@sun.com-20091008130559-4b6sduldhka30zuz) (merge vers: 5.1.40) (pib:13)
[22 Oct 2009 6:35]
Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20091022063126-l0qzirh9xyhp0bpc) (version source revid:alik@sun.com-20091019135554-s1pvptt6i750lfhv) (merge vers: 6.0.14-alpha) (pib:13)
[22 Oct 2009 7:08]
Bugs System
Pushed into 5.5.0-beta (revid:alik@sun.com-20091022060553-znkmxm0g0gm6ckvw) (version source revid:alik@sun.com-20091019131022-2o2ymjfjjoraq833) (merge vers: 5.5.0-beta) (pib:13)
[3 Nov 2009 21:31]
Paul DuBois
Noted in 5.1.41, 5.5.0 changelogs. (Feature is not in 6.0.14) In the default operation of the InnoDB buffer pool, a block is loaded at the midpoint for the first access and then moved immediately to the head of the list as soon as another access occurs. In the case of a table scan (such as performed for a mysqldump operation), each block read by the scan ends up moving to the head of the list because multiple rows are accessed from each block. This occurs even for a one-time scan, where the blocks are not otherwise used by other queries. Blocks may also be loaded by the read-ahead background thread and then moved to the head of the new sublist by a single access. These effects are disadvantageous because they push blocks that are in heavy use by other queries out of the new sublist to the old sublist where they become subject to eviction. InnoDB Plugin now provides two system variables that enable LRU algorithm tuning: innodb_old_blocks_pct: Specifies the approximate percentage of the buffer pool used for the old block sublist. The range of values is 5 to 95. The default value is 37 (that is, 3/8 of the pool). innodb_old_blocks_time: Specifies how long in milliseconds (ms) a block inserted into the old sublist must stay there before it can be moved to the new sublist. The default value is 0, so after a block is inserted into the old sublist, it moves immediately to the new sublist the next time it is accessed, no matter how soon after insertion the next access occurs. If the value is greater than 0, blocks remain in the old sublist until an access occurs at least that many ms after initial insertion. For example, a value of 1000 causes blocks to stay in the old sublist for 1 second before moving to the new sublist.
[4 Nov 2009 15:56]
Paul DuBois
For additional information about the new system variables, see: http://dev.mysql.com/doc/refman/5.1/en/innodb-buffer-pool.html
[18 Dec 2009 10:35]
Bugs System
Pushed into 5.1.41-ndb-7.1.0 (revid:jonas@mysql.com-20091218102229-64tk47xonu3dv6r6) (version source revid:jonas@mysql.com-20091218095730-26gwjidfsdw45dto) (merge vers: 5.1.41-ndb-7.1.0) (pib:15)
[18 Dec 2009 10:51]
Bugs System
Pushed into 5.1.41-ndb-6.2.19 (revid:jonas@mysql.com-20091218100224-vtzr0fahhsuhjsmt) (version source revid:jonas@mysql.com-20091217101452-qwzyaig50w74xmye) (merge vers: 5.1.41-ndb-6.2.19) (pib:15)
[18 Dec 2009 11:06]
Bugs System
Pushed into 5.1.41-ndb-6.3.31 (revid:jonas@mysql.com-20091218100616-75d9tek96o6ob6k0) (version source revid:jonas@mysql.com-20091217154335-290no45qdins5bwo) (merge vers: 5.1.41-ndb-6.3.31) (pib:15)
[18 Dec 2009 11:20]
Bugs System
Pushed into 5.1.41-ndb-7.0.11 (revid:jonas@mysql.com-20091218101303-ga32mrnr15jsa606) (version source revid:jonas@mysql.com-20091218064304-ezreonykd9f4kelk) (merge vers: 5.1.41-ndb-7.0.11) (pib:15)
[5 May 2010 15:17]
Bugs System
Pushed into 5.1.47 (revid:joro@sun.com-20100505145753-ivlt4hclbrjy8eye) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[6 May 2010 17:50]
Paul DuBois
Push resulted from incorporation of InnoDB tree. No changes pertinent to this bug. Re-closing.
[28 May 2010 5:49]
Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100524190136-egaq7e8zgkwb9aqi) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (pib:16)
[28 May 2010 6:19]
Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100524190941-nuudpx60if25wsvx) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[28 May 2010 6:46]
Bugs System
Pushed into 5.5.5-m3 (revid:alik@sun.com-20100524185725-c8k5q7v60i5nix3t) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[29 May 2010 23:24]
Paul DuBois
Push resulted from incorporation of InnoDB tree. No changes pertinent to this bug. Re-closing.
[15 Jun 2010 8:16]
Bugs System
Pushed into 5.5.5-m3 (revid:alik@sun.com-20100615080459-smuswd9ooeywcxuc) (version source revid:mmakela@bk-internal.mysql.com-20100415070122-1nxji8ym4mao13ao) (merge vers: 5.1.47) (pib:16)
[15 Jun 2010 8:32]
Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100615080558-cw01bzdqr1bdmmec) (version source revid:mmakela@bk-internal.mysql.com-20100415070122-1nxji8ym4mao13ao) (pib:16)
[17 Jun 2010 11:50]
Bugs System
Pushed into 5.1.47-ndb-7.0.16 (revid:martin.skold@mysql.com-20100617114014-bva0dy24yyd67697) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[17 Jun 2010 12:28]
Bugs System
Pushed into 5.1.47-ndb-6.2.19 (revid:martin.skold@mysql.com-20100617115448-idrbic6gbki37h1c) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[17 Jun 2010 13:15]
Bugs System
Pushed into 5.1.47-ndb-6.3.35 (revid:martin.skold@mysql.com-20100617114611-61aqbb52j752y116) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[2 Mar 2012 8:52]
MySQL Verification Team
Take a look at this: http://www.mysqlperformanceblog.com/2011/11/13/side-load-may-massively-impact-your-mysql-p...
[6 Mar 2012 23:53]
James Day
Yes, setting innodb_old_blocks_time to 1000 to 5000 is a good thing and likely to be helpful for quite a lot of workloads.