Bug #43942 Make SyncObject::backoff really backoff
Submitted: 29 Mar 2009 20:53 Modified: 15 May 2009 16:05
Reporter: Mark Callaghan Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Falcon storage engine Severity:S3 (Non-critical)
Version:6.0.10 OS:Any
Assigned to: Kelly Long CPU Architecture:Any
Tags: F_PERFORMANCE
Triage: Triaged: D3 (Medium)

[29 Mar 2009 20:53] Mark Callaghan
Description:
I think that SyncObject::backoff() is supposed to implement a busy-wait loop, but I can only guess because there are no comments in the *.h and *.cpp files. The loop as written will be removed by most optimizing compilers including gcc and the lack of a busy-wait loop kills performance. 

mutex_delay in mysys/thr_mutex.c does not have this problem.

The source is this:

void SyncObject::backoff(Thread* thread)
{
        //thread->sleep(1);
        int a = 0;

        for (int n = 0; n < thread->backoff; ++n)
                ++a;

        thread->backoff += a;
}

Code generated by gcc 4.3.1 with -O2

(gdb) disas SyncObject::backoff
Dump of assembler code for function _ZN10SyncObject7backoffEP6Thread:
0x00000000006afec0 <_ZN10SyncObject7backoffEP6Thread+0>:	mov    0xe8(%rsi),%edx
0x00000000006afec6 <_ZN10SyncObject7backoffEP6Thread+6>:	xor    %eax,%eax
0x00000000006afec8 <_ZN10SyncObject7backoffEP6Thread+8>:	push   %rbp
0x00000000006afec9 <_ZN10SyncObject7backoffEP6Thread+9>:	mov    %rsp,%rbp
0x00000000006afecc <_ZN10SyncObject7backoffEP6Thread+12>:	test   %edx,%edx
0x00000000006afece <_ZN10SyncObject7backoffEP6Thread+14>:	cmovns %edx,%eax
0x00000000006afed1 <_ZN10SyncObject7backoffEP6Thread+17>:	add    %edx,%eax
0x00000000006afed3 <_ZN10SyncObject7backoffEP6Thread+19>:	mov    %eax,0xe8(%rsi)
0x00000000006afed9 <_ZN10SyncObject7backoffEP6Thread+25>:	leaveq 
0x00000000006afeda <_ZN10SyncObject7backoffEP6Thread+26>:	retq   

Performance results are at http://mysqlha.blogspot.com/2009/03/make-falcon-faster.html

How to repeat:
Run sysbench or other test with a CPU bound workload and many concurrent users

Suggested fix:
Copy what is done in mutex_delay. Prevent an optimizing compiler from removing the loop.

Also, time a delay at storage engine init time and print this delay in SHOW STATUS output so this problem is easier to spot.
[31 Mar 2009 2:29] Kevin Lewis
Kelly confirms that the current Falcon backoff code is affectively a no-op because the ++a; inside a fixed size loop is optimized into a single addition.  Kelly agreed to provide a solution that will provide performance gains under higher concurrency without costing very much under low concurrency.
[31 Mar 2009 17:34] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/70938

3088 Kelly Long	2009-03-31
      Fix for Bug #43942 - keep compiler optimization from removing a 'noop' loop
      	that we use for a delay
      Initial  tests with SyncObjectTest show this is a gain in most situations.
      I'll run some DBT2 tests to see what the impact will be on benchmarks.
[2 Apr 2009 17:38] Bugs System
Pushed into 6.0.11-alpha (revid:hky@sun.com-20090402144811-yc5kp8g0rjnhz7vy) (version source revid:k.long@sun.com-20090331173027-orwhbrmd7ima50og) (merge vers: 6.0.11-alpha) (pib:6)
[15 May 2009 16:05] MC Brown
Internal/test fix. No changelog entry required.