Bug #72805 | mutex_delay() creating excess memory traffic, GCC mem barrier needed | ||
---|---|---|---|
Submitted: | 30 May 2014 6:17 | Modified: | 29 Apr 2015 0:14 |
Reporter: | Stewart Smith | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Compiling | Severity: | S3 (Non-critical) |
Version: | 5.7.7 | OS: | Any |
Assigned to: | CPU Architecture: | Any | |
Tags: | MySQL, performance, PowerPC |
[30 May 2014 6:17]
Stewart Smith
[30 May 2014 6:20]
Stewart Smith
use GCC memory barrier in mutex_delay instead of generating memory traffic. (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: mutex_delay_gcc_barrier.patch (text/x-patch), 471 bytes.
[30 May 2014 8:04]
MySQL Verification Team
Hello Stewart, Thank you for the report and contribution. Thanks, Umesh
[13 Nov 2014 6:17]
Stewart Smith
This is still present in 5.7.5 - and is also an issue on x86. When the server is built with fast mutexes (cmake -DWITH_FAST_MUTEXES=1), we get mutex_delay in mysys/thr_mutex.cc. As it stands, it looks like this: 0000000000000000 <.mutex_delay>: 0: fb e1 ff f8 std r31,-8(r1) 4: 1c 63 00 32 mulli r3,r3,50 8: f8 21 ff b1 stdu r1,-80(r1) c: 39 20 00 00 li r9,0 10: 7c 69 03 a6 mtctr r3 14: 2f a3 00 00 cmpdi cr7,r3,0 18: 7c 3f 0b 78 mr r31,r1 1c: f9 3f 00 30 std r9,48(r31) 20: 41 9e 00 34 beq cr7,54 <.mutex_delay+0x54> 24: 48 00 00 1c b 40 <.mutex_delay+0x40> 28: 60 00 00 00 nop 2c: 60 00 00 00 nop 30: 60 00 00 00 nop 34: 60 00 00 00 nop 38: 60 00 00 00 nop 3c: 60 00 00 00 nop 40: e9 5f 00 30 ld r10,48(r31) 44: 7d 49 52 14 add r10,r9,r10 48: 39 29 00 01 addi r9,r9,1 4c: f9 5f 00 30 std r10,48(r31) 50: 42 00 ff f0 bdnz 40 <.mutex_delay+0x40> 54: 38 3f 00 50 addi r1,r31,80 58: e8 7f 00 30 ld r3,48(r31) 5c: eb e1 ff f8 ld r31,-8(r1) 60: 4e 80 00 20 blr Which (just trust me on this): isn't what you want. As mentioned in this bug report (and in http://bugs.mysql.com/bug.php?id=72806 ) it misses relaxing the CPU and is just using a work around rather than an actual solution to avoiding the compiler from optimizing away the loop. My updated patch (attached) works well for POWER. Adding the pause instruction for x86 is left as an exercise for the reader. with my patch applied: 0000000000000000 <.mutex_delay>: 0: fb e1 ff f8 std r31,-8(r1) 4: f8 21 ff c1 stdu r1,-64(r1) 8: 7c 3f 0b 78 mr r31,r1 c: 7c 21 0b 78 mr r1,r1 10: 1c 63 00 32 mulli r3,r3,50 14: 7c 69 03 a6 mtctr r3 18: 2f a3 00 00 cmpdi cr7,r3,0 1c: 41 9e 00 08 beq cr7,24 <.mutex_delay+0x24> 20: 42 00 00 00 bdnz 20 <.mutex_delay+0x20> 24: 7c 42 13 78 mr r2,r2 28: 38 3f 00 40 addi r1,r31,64 2c: eb e1 ff f8 ld r31,-8(r1) 30: 4e 80 00 20 blr Which is pretty much *exactly* what we want - it sets low priority on the thread, spins for a while, restores priority and exits.
[13 Nov 2014 6:18]
Stewart Smith
Updated patch: sets thread priority on POWER, uses proper compiler barrier (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: mutex_delay_gcc_barrier.patch (text/x-patch), 969 bytes.
[10 Apr 2015 7:54]
Stewart Smith
I still see this in 5.7.7.
[29 Apr 2015 0:14]
Paul DuBois
Noted in 5.7.8 changelog. The so-called "fast mutex" code has been removed from the server sources. It provides no measurable benefit, complicates the code, and is problematic for certain architectures such as POWER8. The (undocumented) WITH_FAST_MUTEXES CMake option has also been removed.