Bug #70406 buf_pool_get_dirty_pages_count() needs not to lock the buffer pool mutex
Submitted: 24 Sep 2013 9:31 Modified: 25 Sep 2013 7:28
Reporter: Laurynas Biveinis Email Updates:
Status: Verified Impact on me:
Category:MySQL Server: InnoDB storage engine Severity:S3 (Non-critical)
Version:5.6 OS:Any
Assigned to: CPU Architecture:Any

[24 Sep 2013 9:31] Laurynas Biveinis
buf_pool_get_dirty_pages_count() locks both buffer pool and its flush list mutexes.  But it only needs the latter.  The former was added by [1] to avoid seeing BUF_BLOCK_REMOVE_HASH pages in the flush list in the middle of relocation.  But there is nothing wrong in seeing them and in fact they can be considered to be BUF_BLOCK_FILE_PAGE here, i.e. check if dirty and bump count.  This would correct I believe and not unlike how buf_flush_validate_low() handles BUF_BLOCK_REMOVE_HASH pages.

I realize that redundant locking for one code path in debug builds is not a very critical bug, but am reporting it since happened to find it.


5.6$ bzr log -r 3690.34.32
revno: 3690.34.32
committer: Sunny Bains <Sunny.Bains@Oracle.Com>
branch nick: trunk
timestamp: Thu 2012-05-31 19:09:30 +1000
  Needs the buffer pool mutex to be held for the duration of the fetch count
  of pages matching some space id. When we relocate a page, first we change the
  state to remove from hash and then remove the page from the flush list.
  We release the page mutex after changing the state, therefore holding it
  is not sufficient. We must hold the buffer pool mutex while scanning the
  flush list. Note: This is debug code.
  Approved by Marko over IM

How to repeat:
Code analysis

Suggested fix:
In buf_pool_get_dirty_pages_count(), remove buf_pool_mutex_enter()/buf_pool_mutex_exit() calls, replace ut_ad(buf_page_in_file(...)) with ut_ad(... || buf_page_get_state(bpage) == BUF_BLOCK_REMOVE_HASH)
[24 Sep 2013 11:30] MySQL Verification Team
Hello Laurynas,

Thank you for the report.

[25 Sep 2013 7:28] Laurynas Biveinis
For the suggested fix, buf_pool_get_dirty_pages_count() also needs to replace buf_page_get_space() with direct bpage->space.

Which starts to look like a lot of trouble for a very minor bug :)