Bug #80354 Cache line contention on ut_rnd_ulint_counter
Submitted: 12 Feb 2016 14:26 Modified: 10 Jul 2019 18:15
Reporter: Alexey Kopytov Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S5 (Performance)
Version:5.6, 5.7 OS:Any
Assigned to: CPU Architecture:Any

[12 Feb 2016 14:26] Alexey Kopytov
Description:
InnoDB implements the random backoff for spinlocks. There are multiple
relevant sites in the code, but all of them look like this:

		for (n_spins = 0; n_spins < max_spins; ++n_spins) {

			if (!is_locked()) {

				lock_word_t	lock = trylock();

				if (lock == MUTEX_STATE_UNLOCKED) {
					/* Lock successful */
					return(lock);
				}
			}

			ut_delay(ut_rnd_interval(0, max_delay));
		}

The problem here is that ut_rnd_interval() is not scalable on systems
with many cores, especially NUMA ones, because it maintains a global
state in ut_rnd_ulint_counter, which is updated on each call resulting
in cache coherency and interconnect traffic.

I'm not sure that using an RNG in this specific case is the best
idea. But for an experiment I converted ut_rnd_ulint_counter to a
thread-local variable and got up to 40% speedup in a workload with heavy
mutex contention.

How to repeat:
Inspect the code. Profile cache misses on a workload with heavy mutex contention.
[15 Feb 2016 15:48] MySQL Verification Team
Kaamos, my dear friend,

I do remember that code very well, as it is with us for quite a while. I also, totally, agree with your conclusion on how can that global variable be a bottleneck, as well as on the solution.

Verified.
[3 Jun 2016 16:47] Daniel Price
Posted by developer:
 
Fixed as of the upcoming 5.7.14 release, and here's the changelog entry:

The global counter, ut_rnd_ulint_counter, was changed to a thread-local
counter to make it scalable on NUMA core systems. 

Thank you for the bug report.
[11 Jul 2016 11:28] Daniel Price
The patch was reverted.
[8 Aug 2016 12:34] Alexey Kopytov
What was the reason for reverting the patch? I don't see any
explanations in the commit:

---
Author:     Aakanksha Verma <aakanksha.verma@oracle.com>
AuthorDate: Mon Jul 11 12:24:54 2016 +0530
Commit:     Hery Ramilison <hery.ramilison@oracle.com>
CommitDate: Mon Jul 11 10:32:33 2016 +0200

Revert "Bug #22733635   CACHE LINE CONTENTION ON UT_RND_ULINT_COUNTER"

This reverts commit 1994f43daafbb1b6b2cd44894ba6a7944bc22260.

(cherry picked from commit be7b121f2cb49de03fa993599d203f616072853b)
---
[11 Aug 2017 15:35] Alexey Kopytov
The patch has been released in 8.0.0, but is not mentioned in the 8.0.0 release notes.
[10 Jul 2019 18:15] Daniel Price
Posted by developer:
 
Fixed as of the upcoming 5.7.15, 8.0.0  releases, and here's the changelog entry:

A global counter (ut_rnd_ulint_counter) was changed to a thread-local
counter to make it scalable on multi-core systems.