MySQL Bugs: #69258: does buf_LRU_buf_pool_running_out need to lock buffer pool mutexes

Bug #69258	does buf_LRU_buf_pool_running_out need to lock buffer pool mutexes
Submitted:	16 May 2013 14:57	Modified:	5 Sep 2013 18:21
Reporter:	Mark Callaghan	Email Updates:
Status:	Not a Bug	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S5 (Performance)
Version:	5.6.11	OS:	Any
Assigned to:	Vasil Dimov	CPU Architecture:	Any

Description:
Yoshinori, my colleague, is looking at performance regressions for DROP & ALTER TABLE in MySQL 5.6 compared to 5.1 with the FB patch and more buffer pool mutex contention delaying the work done during the DDL might be part of the problem. Note that we do online schema changes so there are many row modifications in progress during the DDL.

He asked about buf_LRU_buf_pool_running_out. This is called on many of the pessimistic insert/update paths so that can be frequent. It locks all buffer pool mutexes. Given that it doesn't have to be exact can you change this so it doesn't lock buffer pool mutexes?

/******************************************************************//**
Returns TRUE if less than 25 % of the buffer pool in any instance is
available. This can be used in heuristics to prevent huge transactions
eating up the whole buffer pool for their locks.
@return TRUE if less than 25 % of buffer pool left */
UNIV_INTERN
ibool
buf_LRU_buf_pool_running_out(void)
/*==============================*/
{
        ulint   i;
        ibool   ret = FALSE;

        for (i = 0; i < srv_buf_pool_instances && !ret; i++) {
                buf_pool_t*     buf_pool;

                buf_pool = buf_pool_from_array(i);

                buf_pool_mutex_enter(buf_pool);

                if (!recv_recovery_on
                    && UT_LIST_GET_LEN(buf_pool->free)
                       + UT_LIST_GET_LEN(buf_pool->LRU)
                       < buf_pool->curr_size / 4) {

                        ret = TRUE;
                }

                buf_pool_mutex_exit(buf_pool);
        }

        return(ret);
}

How to repeat:
read the source

Suggested fix:
don't lock the mutexes

Thank you for this performance bug. The InnoDB team is looking at it.

Obviously this code can be optimized. It could return true earlier for example.

Hi Mark,

Yes, the mutex acquisition can be removed since this is heuristic and we currently do not do any dynamic buffer pool destroying at runtime (but we may do in the future!).

How much boost do you get if you remove the buf_pool_mutex_enter()/buf_pool_mutex_exit() calls from buf_LRU_buf_pool_running_out()?

Will find out soon after Yoshinori repeats tests

From Yoshinori
--- 
I tested below patch and verified buf_pool mutex waits were mostly eliminated at DROP/ALTER.
 
diff --git a/storage/innobase/buf/buf0lru.cc b/storage/innobase/buf/buf0lru.cc
index 8ad738d..19fbe46 100644
--- a/storage/innobase/buf/buf0lru.cc
+++ b/storage/innobase/buf/buf0lru.cc
@@ -1090,7 +1090,7 @@ buf_LRU_buf_pool_running_out(void)
{
        ulint   i;
        ibool   ret = FALSE;
-
+        return FALSE;
        for (i = 0; i < srv_buf_pool_instances && !ret; i++) {
                buf_pool_t*     buf_pool;
 
 
Now the hottest mutex that I have seen in timelinedb 5.6 is dict_sys->mutex.

> I tested below patch and verified buf_pool mutex waits were mostly eliminated at DROP/ALTER.

Ok, but how much boost does that amount to? Can you provide some numbers e.g. a given ALTER or DROP command executes for X seconds before the patch and Y seconds after the patch?

Your patch is too aggressive, it is not realistic to test with it because it cannot be pushed into the tree. Can you try with the following patch instead:

--- cut ---
--- storage/innobase/buf/buf0lru.cc	revid:vasil.dimov@oracle.com-20130521153102-zu6qrv0b984oc25s
+++ storage/innobase/buf/buf0lru.cc	2013-05-23 06:49:46 +0000
@@ -1075,6 +1075,4 @@
 		buf_pool = buf_pool_from_array(i);
 
-		buf_pool_mutex_enter(buf_pool);
-
 		if (!recv_recovery_on
 		    && UT_LIST_GET_LEN(buf_pool->free)
@@ -1084,6 +1082,4 @@
 			ret = TRUE;
 		}
-
-		buf_pool_mutex_exit(buf_pool);
 	}
 
--- cut ---

Thank you!

I asked Yoshinori for more details but my summary is that the current code makes it impossible for us do use 5.6 & multiple buffer pool instances and do online schema change using our PHP framework.

Hi,

I am asking for perf numbers before and after the patch I pasted above because they would give us an important overview on the effects of this change. Removing the mutex protection from buf_LRU_buf_pool_running_out() may create problems to us later if we decide to dynamically destroy buffer pool instances.

Can this also be implemented by updating a global variable whenever memory is allocated from the buffer pool for things other than database pages? Those allocations should be less frequent and I assume are already getting these mutexes.

Posted by developer:
 
Hello,

I am closing this bug entry as "not a bug".

My reasoning is that the mutex acquisition can be removed, but this shouldn't be done without a justification, on a gut feeling that it will be a beneficial change.

I understand that the problem reported in this bug has been confirmed by Yoshinori to be fixed via "Bug#69316 Drop/Alter table takes much longer time in 5.6 than 5.5".

Feel free to reopen this bug if you see performance issues with buf_LRU_buf_pool_running_out(). In that case we will do some performance testing to see how much gain does the removal of the mutex acquisition bring before proceeding further.

Thank you!

With respect to "gut feeling" we provided more than that. We told you it was faster, just not by how much. Besides, doesn't my gut feeling count extra? We have done a tremendous amount of work to make ALTER/DROP not kill InnoDB performance, and then we have reported a few regressions in 5.6 so we might be a bit frustrated on this topic.

Mark,

You are one of the most respected InnoDB experts outside of the InnoDB team! Of course your opinion matters to us and we take very seriously what you say.

For changes like this, even if the suggestion or the patch comes from an InnoDB developer, we want to see the performance numbers before pushing it. Thus I asked for that, assuming that you already had the numbers. Also, I considered that it may be difficult for us to reproduce your environment where you have seen the performance improvement and thus I did not jump into running perf tests at our end.

We have been confirmed by Yoshinori that the problem reported in this bug has been fixed by Bug#69316 and that there are currently no performance problems with buf_LRU_buf_pool_running_out(). Let us know if this is not the case or if, for some other reason, you think this bug should be reopened.

Sorry if this has caused frustration.

Thank you!