| Bug #80354 | Cache line contention on ut_rnd_ulint_counter | ||
|---|---|---|---|
| Submitted: | 12 Feb 2016 14:26 | Modified: | 10 Jul 2019 18:15 |
| Reporter: | Alexey Kopytov | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: InnoDB storage engine | Severity: | S5 (Performance) |
| Version: | 5.6, 5.7 | OS: | Any |
| Assigned to: | CPU Architecture: | Any | |
[15 Feb 2016 15:48]
MySQL Verification Team
Kaamos, my dear friend, I do remember that code very well, as it is with us for quite a while. I also, totally, agree with your conclusion on how can that global variable be a bottleneck, as well as on the solution. Verified.
[3 Jun 2016 16:47]
Daniel Price
Posted by developer: Fixed as of the upcoming 5.7.14 release, and here's the changelog entry: The global counter, ut_rnd_ulint_counter, was changed to a thread-local counter to make it scalable on NUMA core systems. Thank you for the bug report.
[11 Jul 2016 11:28]
Daniel Price
The patch was reverted.
[8 Aug 2016 12:34]
Alexey Kopytov
What was the reason for reverting the patch? I don't see any explanations in the commit: --- Author: Aakanksha Verma <aakanksha.verma@oracle.com> AuthorDate: Mon Jul 11 12:24:54 2016 +0530 Commit: Hery Ramilison <hery.ramilison@oracle.com> CommitDate: Mon Jul 11 10:32:33 2016 +0200 Revert "Bug #22733635 CACHE LINE CONTENTION ON UT_RND_ULINT_COUNTER" This reverts commit 1994f43daafbb1b6b2cd44894ba6a7944bc22260. (cherry picked from commit be7b121f2cb49de03fa993599d203f616072853b) ---
[11 Aug 2017 15:35]
Alexey Kopytov
The patch has been released in 8.0.0, but is not mentioned in the 8.0.0 release notes.
[10 Jul 2019 18:15]
Daniel Price
Posted by developer: Fixed as of the upcoming 5.7.15, 8.0.0 releases, and here's the changelog entry: A global counter (ut_rnd_ulint_counter) was changed to a thread-local counter to make it scalable on multi-core systems.

Description: InnoDB implements the random backoff for spinlocks. There are multiple relevant sites in the code, but all of them look like this: for (n_spins = 0; n_spins < max_spins; ++n_spins) { if (!is_locked()) { lock_word_t lock = trylock(); if (lock == MUTEX_STATE_UNLOCKED) { /* Lock successful */ return(lock); } } ut_delay(ut_rnd_interval(0, max_delay)); } The problem here is that ut_rnd_interval() is not scalable on systems with many cores, especially NUMA ones, because it maintains a global state in ut_rnd_ulint_counter, which is updated on each call resulting in cache coherency and interconnect traffic. I'm not sure that using an RNG in this specific case is the best idea. But for an experiment I converted ut_rnd_ulint_counter to a thread-local variable and got up to 40% speedup in a workload with heavy mutex contention. How to repeat: Inspect the code. Profile cache misses on a workload with heavy mutex contention.