Bug #75534 Solve buffer pool mutex contention by splitting it
Submitted: 16 Jan 2015 19:28
Modified: 27 Feb 2017 13:36
Reporter: Laurynas Biveinis (OCA)
Status: Closed
Category: MySQL Server: InnoDB storage engine
Severity: S5 (Performance)
OS: Any
CPU Architecture: Any
Tags: buffer pool, contention, innodb, mutex, scalability

[16 Jan 2015 19:28] Laurynas Biveinis
Description:
Buffer pool mutex protects several data structures at once. It may become hot in some workloads.

Increasing the number of buffer pool instances does not always help, because some instances (the ones that hot pages hash to) are naturally hotter than the others. Also, from an algorithmic point of view, a large number of buffer pool instances poses challenges, at least for the flushing algorithms.

How to repeat:
.

Suggested fix:
Split the mutex. Uploading a patch in a minute.
[16 Jan 2015 19:33] Laurynas Biveinis
Bug 75534 patch for 5.7.5

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: bug75534.patch (application/octet-stream, text), 199.66 KiB.

[16 Jan 2015 19:36] Laurynas Biveinis
This is the XtraDB buffer pool mutex split patch, included in versions 5.0 to 5.6. This version for 5.7 has been further improved. The patch was originally developed by Yasufumi Kinoshita and later maintained by me.

- Removes the buffer pool mutex. Introduces several new mutexes
  protecting individual lists and hashes, and makes several variables
  accessible without any mutex, using atomic variables or os_rmb/os_wmb
  where deemed appropriate. volatile is not used.
  The new mutexes are
  - LRU_list_mutex for the LRU_list;
  - zip_free mutex for the zip_free arrays;
  - zip_hash mutex for the zip_hash hash and in_zip_hash flag;
  - free_list_mutex for the free_list and withdraw list. If desired, a
    separate withdraw_list_mutex could easily be split out in the future.
  Protection of buf_pool->watch[], and all bpage protection formerly
  provided by the buffer pool mutex, has been moved to page_hash.
  The following variables were switched from buffer pool mutex
  protection to atomic operations and/or os_rmb/os_wmb. The uses of the
  latter in particular, while I tried to make them correct, may be
  highly debatable:
  - srv_buf_pool_old_size, srv_buf_pool_size, srv_buf_pool_curr_size,
    srv_buf_pool_base_size
  - buf_pool->buddy_stat[i].used
  - buf_pool->curr_size, n_chunks_new
- Shortens the critical section for buf_block_buf_fix_inc/dec calls,
  or removes it completely.
- Removes redundant locking by exploiting the fact that a freed page
  must not be referenced from the buffer pool or from any thread other
  than the one freeing it. The same applies to freshly allocated pages
  before any pointers to them are published. However, this necessitates
  removing some of the debug checks that scan buffer pool chunks
  directly (buf_block_align), as they have no way to freeze such
  blocks.
- Related to the above, adds more consistency asserts to
  buf_page_set_state, as well as some scalability asserts (!mutex_own).
- buf_buddy_alloc has been rewritten so that it no longer requires the
  buffer pool mutex on entry. Previously that mutex might be released
  inside the function, and this fact had to be propagated to the caller
  so that it could decide what to re-check. It is now called with
  mutexes unlocked, and the algorithm of its caller,
  buf_page_init_for_read, has been simplified. All allocations in
  buf_page_init_for_read now happen with mutexes unlocked.
- buf_flush_LRU_list_batch uses mutex_enter_nowait to skip over any
  currently-locked blocks.
- Removed some outdated buf0buf.cc comments.

Bugs fixed fully or partially, besides the current one:
- http://bugs.mysql.com/bug.php?id=64344: fixed buf_page_init_for_read
  holding mutexes while allocating memory. It should also be easier to
  fix buf_LRU_free_page now.
- http://bugs.mysql.com/bug.php?id=75503
- http://bugs.mysql.com/bug.php?id=75504
[19 Jan 2015 16:23] MySQL Verification Team
Fully verified.
[22 Jan 2015 5:32] Laurynas Biveinis
The patch was produced for the 5.7.5 tree with some other small InnoDB fixes applied, most notably the one for bug 71411. Thus it might apply with some fuzz on a clean 5.7.5 tree, but it is orthogonal to those other fixes.
[4 Feb 2015 8:07] Laurynas Biveinis
Bug 75534 patch for 5.7.5, v2

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: bug75534-2.patch (application/octet-stream, text), 205.82 KiB.

[4 Feb 2015 8:08] Laurynas Biveinis
Updated patch for 5.7.5. Passes MTR: regular, ASAN, and Valgrind.

    Changes from the previous submission:
    - removed a spurious debugging fprintf(stderr);
    - fixed a debug build assertion reporting a lock order violation
      in buffer pool resize in the case of multiple instances, added a
      testcase innodb_buffer_pool_resize_multiple_pools. Details at
      https://bugs.launchpad.net/percona-server/+bug/1414257.
    - removed the "fix" for 75503, which is not a bug.
    - fixed a typo and a missing dirty page check condition in
      innodb_buffer_pool_evict_uncompressed, added a testcase
      innodb_buffer_pool_debug.
    - Added an old XtraDB regression testcase
      (https://bugs.launchpad.net/percona-xtradb/+bug/317074) as
      innodb_zip/innodb-buffer-pool. It may be of limited value now,
      but it is included here for consideration.
    - Fixed a Valgrind annotation race condition in
      buf_LRU_block_free_non_file_page, where a frame would be marked
      as unallocated after the block had been put back on the free list
      and its mutex released. Another thread might have allocated the
      same block in the meantime and then had its frame declared
      unallocated, resulting in spurious Valgrind errors. While at it,
      no longer mark the frame as undefined right before marking it as
      unallocated.
[1 Apr 2016 11:02] Daniel Price
Posted by developer:
 
Fixed as of the upcoming 5.8.0 release, and here's the changelog entry:

To address contention that could occur under some workloads, the buffer
pool mutex was removed and replaced by several list and hash protecting
mutexes. Also, several buffer pool related variables no longer require
buffer pool mutex protection. 

Thanks to Yasufumi Kinoshita and Laurynas Biveinis for the patch.
[27 Feb 2017 13:36] Laurynas Biveinis
See bug 85205, bug 85208.