Bug #83568 InnoDB memory barriers bs
Submitted: 27 Oct 2016 9:54 Modified: 28 Nov 2016 5:31
Reporter: Sergey Vojtovich Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S5 (Performance)
Version:8.0 OS:Any
Assigned to: CPU Architecture:ARM

[27 Oct 2016 9:54] Sergey Vojtovich
Description:
InnoDB started adding memory barriers (os_rmb/os_wmb) in an attempt to fix race conditions on weakly ordered hardware. E.g. see: https://github.com/mysql/mysql-server/commit/09eefae20693bbfd254c3dfa4f513bbd47165383

Exactly for the same reason they were removed from MariaDB, which actually supports weakly ordered hardware and is being actively tested on it: https://github.com/MariaDB/server/commit/5608a737ea7b5630452957b82deff4c76406041e

The reasons for removal were:
1. Performance. That's something MySQL actually observed and disabled memory barriers on x86: https://github.com/mysql/mysql-server/commit/0c6e277e00fddd30688227da7cbcbfa912927c24
2. None of added memory barriers is correct
3. They may reduce probability of race conditions but not eliminate them, which makes hunting for race conditions really hard

os_rmb/os_wmb issue memory barriers without associated atomic operation. First they can be slower than memory barrier associated with atomic operation on some platforms, because they impose stronger synchronization constraints. Second, it's only useful for some limited and very advanced set of use cases, which InnoDB doesn't have. For detailed explanation and valid use case example see http://en.cppreference.com/w/cpp/atomic/atomic_thread_fence

I can only guess why those memory barriers were added:
- to make things more atomic (that's not something memory barriers guarantee!)
- to force loads from memory (that's not something memory barriers guarantee!)
- to flush stores to memory (that's not something memory barriers guarantee!)

With decent gcc versions os_rmb/os_wmb follow release-acquire semantics. Below is an example of what it actually does:

thread1:
a= 1;
os_wmb;
b= 1;
c= 1;

thread2:
if (b == 1)
{
  os_rmb;
  assert(a == 1); // never hits
}

thread3:
if (b == 1)
  assert(a == 1); // may or may not hit

thread4:
if (c == 1)
{
  os_rmb;
  assert(b == 1); // may or may not hit
}

In other words release-acquire only enforce order and nothing else.

How to repeat:
Code analysis.

Suggested fix:
Review memory barriers. Most if not all can be removed without extra effort. However some may hide race conditions as a side effect of memory barriers, those needs to be fixed separately.

Use memory barriers associated with atomic operations.
Memory barriers should be issued independently of platform: compiler should decide if it actually needs to emit low level instruction or not.
[28 Oct 2016 13:19] Sergey Vojtovich
To support my statement that "they may reduce probability of race conditions but not eliminate them"... There was precedent in fact.

MariaDB also added these os_rmb/os_wmb in an attempt to fix some race conditions: https://github.com/MariaDB/server/commit/5569132ffebba3fd2e37964543f658ed24d8caaf

Which managed to hide at least one race condition that was reported again later (and it was damn hard to reproduce): https://github.com/MariaDB/server/commit/ed313e8a92a836422b9ae7b9ecf11c44ed4d5d66

The above doesn't seem to be fixed in MySQL, but is subject for another bug.