Bug #87933 | Scalibility issue on Arm platform with the current UT_RELAX_CPU () code. | ||
---|---|---|---|
Submitted: | 2 Oct 2017 4:06 | Modified: | 12 Dec 2018 20:59 |
Reporter: | Sandeep Sethia (OCA) | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S2 (Serious) |
Version: | 5.7.19 | OS: | CentOS |
Assigned to: | CPU Architecture: | ARM | |
Tags: | aarch64, arm, barrier, innodb |
[2 Oct 2017 4:06]
Sandeep Sethia
[2 Oct 2017 4:19]
Sandeep Sethia
Added tags
[5 Oct 2017 8:26]
Sandeep Sethia
Any update on the issue?
[7 Nov 2017 3:42]
Daniel Black
UT_RELAX_CPU just needs to be a CPU instruction(s) that takes a bit of time to exectute, it doesn't needs to be a hardware barrier. Obviously a compile barrier takes no amount of time and ends up being and empty loop. On Power the following is suggested as this is a moderately slow instruction that has no impacts on other CPU core elements. (note - please include if changing - OCA submitted by me) #elif defined(__powerpc__) && defined __GLIBC__ # include <sys/platform/ppc.h> # define UT_RELAX_CPU() __ppc_get_timebase() For ARM I suggest looking for a similar slow instruction. The reverting to os_atomic operations have effects on other threads and cores. Alexey Kopytov was evaluating however I didn't see an end result (https://github.com/MariaDB/server/pull/168#r58172391 ). A compile barrier is needed always needed (Windows - _ReadWriteBarrier() ) as without it the loop could be removed in ut_delay if/when the compiler realises that the UT_RELAX_CPU implementation isn't required to be executed delay*50 times and the return value can a static number. In fact the entire j += i and return j (never used) can be removed when a compile barrier is there (and a non 0 UT_RELAX_CPU implementation) as this is exceptionally quick compared to UT_RELAX_CPU (and it ties up ALU CPU aspects if shared with concurrent threads). On replacing ut_delay(ut_rnd_interval(0, max_delay)); with just max_delay, this is probably because a larger than default value of innodb_spin_wait_delay suites this test better. Or alternately (perhaps less likely) ut_rnd_interval and the functions it calls are for some reason is exceptionally slow - check with a perf record -g/report -g --no-children. This could be the case if there is any synchronisation around ut_rnd_ulint_counter (because its not needed). When you mention x86 are you talking 32 or 64 bit? I think what's really needed is a measure the distribution of the number of CPU cycles the mutexes actually need to be delayed. Because "The argument gives the desired delay in microseconds on 100 MHz Pentium + Visual C++." is clearly not relevant any more.
[12 Dec 2018 20:59]
MySQL Verification Team
I'm having issues reproducing problem on my hardware but reviewing the code I see that 8.0.13 is still with for (i = 0; i < delay * 50; i++) { j += i; UT_RELAX_CPU(); } UT_RESUME_PRIORITY_CPU(); and while bug #74832 states that 5.7.11, 5.8.0 have this solved by "A compiler barrier was added to ut_relax_cpu()" I don't see it in 8.0.13. Bogdan Kecman