Bug #62294 buf_buddy_relocate calls gettimeofday while holder buffer pool mutex
Submitted: 30 Aug 2011 0:34 Modified: 7 Oct 2011 0:56
Reporter: Mark Callaghan Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB Plugin storage engine Severity:S3 (Non-critical)
Version:5.1,5.5 OS:Any
Assigned to: Marko Mäkelä CPU Architecture:Any
Tags: innodb, performance

[30 Aug 2011 0:34] Mark Callaghan
Description:
As far as I can tell this call path can occur while the caller holds the buffer pool mutex:
buf_page_init_for_read -> buf_buddy_alloc/buf_buddy_free -> buf_buddy_free_low -> buf_buddy_relocate

And buf_buddy_relocate calls ut_time_us which wraps gettimeofday. It isn't a good idea to call that when holding a critical mutex like the buffer_pool mutex

How to repeat:
read the code

Suggested fix:
don't call gettimeofday there. if perf timers are needed use a cycle timer
[30 Aug 2011 4:02] Valeriy Kravchuk
Yes, buf_buddy_relocate() starts with:

        ut_ad(buf_pool_mutex_own(buf_pool));

and then has the following code:

        if (buf_page_can_relocate(bpage)) {
                /* Relocate the compressed page. */
...
                        buddy_stat->relocated++;
                        buddy_stat->relocated_usec
                                += ut_time_us(NULL) - usec;
...
        }

So I see no reason to disagree with your analysis.
[30 Aug 2011 6:35] Marko Mäkelä
Mark, the buddy relocation operation is expensive, and I thought that an additional system call would not hurt so much. You are probably aware that the fix of Bug#61188 reduces the number of relocation operations.

Which alternative to gettimeofday() would you suggest? Would clock_gettime(CLOCK_MONOTONIC, &tp) be any better? There are a few calls of ut_time_us() that seem to require microseconds, for INFORMATION_SCHEMA statistics.

Side note: The call in page_cur_open_on_rnd_user_rec() could use any entropy source (and it should use page_rec_get_nth() instead of traversing the next-rec chain). I guess that this function is not so time-critical.
[30 Aug 2011 6:40] Marko Mäkelä
Side note: we could probably replace gettimeofday() in ut_time_ms() as well, even though a quick check shows that we not holding the buffer pool mutex while calling it.
[30 Aug 2011 12:15] Marko Mäkelä
For what it is worth, gettimeofday can be implemented as a "cheap" system call without a context switch, depending on the operating system version:

http://davisdoesdownunder.blogspot.com/2011/02/linux-syscall-vsyscall-and-vdso-oh-my.html

One thing that could be done easily is to move the first ut_time_us() call after the buf_page_can_relocate() check. In that way, we would never be invoking a system call unless the relocation really takes place. This would reduce the reported relocated_usec a little.
[7 Oct 2011 0:56] John Russell
Added to changelog:

This fix improves the performance of instrumentation code for InnoDB
buffer pool operations.
[8 Feb 2012 0:45] John Russell
Confirmed that the change applies to 5.1.60, 5.5.17, 5.6.4.