Bug #97230 rwlock: refine lock->waiters with C++11 atomics
Submitted: 15 Oct 2019 10:44 Modified: 15 Oct 2019 12:46
Reporter: Cai Yibo (OCA)
Status: Verified
Category: MySQL Server: InnoDB storage engine  Severity: S5 (Performance)
Version: 8.0  OS: Any
Assigned to:  CPU Architecture: ARM
Tags: Contribution

[15 Oct 2019 10:44] Cai Yibo
Description:
This is a follow-up patch to Bug #96504 (Refine atomics and barriers for weak memory order platform). It depends on the patches for Bug #97150 and Bug #97228.

lock->waiters is an integer (actually used as a bool) that tells the
lock holder that some threads are sleeping and waiting for the lock to
be released. The lock holder must check it and signal a condition
variable to wake them up. Memory ordering is important to make sure
sleeping threads are woken up in time.

The problem can be simplified to the code below:

  // Initial state: lockword = 0; waiters = 0;

  // Thread1                       // Thread2
  waiters = 1;                     lockword = 1; // free the lock
  mb();                            mb();
  if (lockword == 0) {             if (waiters == 1) {
      sleep();                         waiters = 0;
  }                                    mb();
                                       wakeup(); // wakeup thread1
                                   }

  To make sure thread2 calls wakeup() whenever thread1 calls sleep(), a
  full memory barrier is required in both threads, so that the ordering
  of the "waiters" and "lockword" accesses is preserved.

  In addition, a memory barrier is required after thread2 resets
  "waiters" to make sure the reset happens before wakeup() is called.
  Another instance of thread1 may set "waiters" at the same time thread2
  calls wakeup(); without the barrier, that "waiters" flag may be
  cleared accidentally.
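
As an illustration only, the simplified protocol above could be written with C++11 atomics roughly as follows. This is a minimal sketch, not the submitted patch: ToyLock, wait_for_release(), release_lock() and run_once() are hypothetical names, mb() is assumed to map to std::atomic_thread_fence(std::memory_order_seq_cst), and sleep()/wakeup() are modelled with a mutex, a condition variable and a signal counter.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical toy lock; not InnoDB's rw_lock_t.
struct ToyLock {
    std::atomic<int> lockword{0};    // 0 = held, 1 = free (as in the pseudocode)
    std::atomic<int> waiters{0};     // 1 = some thread may be sleeping
    std::mutex m;
    std::condition_variable cv;
    unsigned long signal_count = 0;  // guarded by m, bumped on every wakeup
};

// Thread1: announce the waiter, then re-check the lock before sleeping.
void wait_for_release(ToyLock &l) {
    unsigned long seen;
    {
        std::lock_guard<std::mutex> g(l.m);
        seen = l.signal_count;  // remember the wakeup generation first
    }
    l.waiters.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);  // mb()
    if (l.lockword.load(std::memory_order_relaxed) == 0) {
        // sleep(): block until a wakeup newer than `seen` arrives
        std::unique_lock<std::mutex> g(l.m);
        l.cv.wait(g, [&l, seen] { return l.signal_count != seen; });
    }
}

// Thread2: free the lock, then wake a waiter if one announced itself.
void release_lock(ToyLock &l) {
    l.lockword.store(1, std::memory_order_relaxed);       // free the lock
    std::atomic_thread_fence(std::memory_order_seq_cst);  // mb()
    if (l.waiters.load(std::memory_order_relaxed) == 1) {
        l.waiters.store(0, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);  // mb()
        {
            std::lock_guard<std::mutex> g(l.m);
            ++l.signal_count;
        }
        l.cv.notify_all();  // wakeup()
    }
}

// Runs one waiter against one releaser; returns only if nobody is left asleep.
bool run_once() {
    ToyLock l;
    std::thread t1([&l] { wait_for_release(l); });
    std::thread t2([&l] { release_lock(l); });
    t1.join();
    t2.join();
    assert(l.signal_count <= 1);  // at most one wakeup was issued
    return l.lockword.load() == 1;
}
```

With the two seq_cst fences in place, at least one thread observes the other's store: either the waiter sees lockword == 1 and skips the sleep, or the releaser sees waiters == 1 and issues the wakeup. Weakening either of the first fences to acquire or release would allow both loads to read the old values on a weakly ordered CPU, which is the lost-wakeup case described above.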

The actual code is more complex. Instead of direct load/store instructions,
TTAS (test and test-and-set) is used for atomicity and to prevent cache line
store contention. x86 ignores the acquire/release options because its
lock/xchg instructions imply a memory barrier. Arm respects the
acquire/release options and generates instructions with the necessary
memory ordering.

  // Thread1
  waiters.cmpxchg(0, 1, acquire);
  if (lockword.cmpxchg(1, 0, acquire) == 0) {   // try lock
      sleep();
  }

  // Thread2
  lockword.cmpxchg(0, 1, release);      // unlock
  fence(acquire);
  if (waiters.load(relaxed) == 1) {
      waiters.cmpxchg(1, 0, acquire);
      wakeup();
  }
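
For readers unfamiliar with TTAS, the idea can be sketched in C++11 as below. This is a hedged illustration under assumptions: ttas_cmpxchg() is a hypothetical helper, not a function from the MySQL source or the patch, and it returns a success flag rather than the value that the cmpxchg() pseudocode above appears to return.

```cpp
#include <atomic>

// Hypothetical helper: change `var` from `expected` to `desired`.
// The plain load ("test") only reads the cache line, so a thread that
// cannot succeed never requests exclusive ownership of the line; the
// compare-exchange ("test-and-set") is attempted only when it may succeed.
inline bool ttas_cmpxchg(std::atomic<int> &var, int expected, int desired,
                         std::memory_order success_order) {
    if (var.load(std::memory_order_relaxed) != expected) {
        return false;  // cheap early exit, no store traffic
    }
    // On failure `expected` is overwritten with the current value; it is
    // a by-value copy here, so the caller's argument is unaffected.
    return var.compare_exchange_strong(expected, desired, success_order,
                                       std::memory_order_relaxed);
}
```

A call such as ttas_cmpxchg(waiters, 0, 1, std::memory_order_acquire) would then correspond to the first line of Thread1 above. The memory order passed in only matters on architectures that honour it, such as Arm.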

How to repeat:
NA
[15 Oct 2019 10:45] Cai Yibo
patch

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: 0003-rwlock-refine-lock-waiters-with-C-11-atomics.patch (text/x-patch), 6.08 KiB.

[15 Oct 2019 12:46] MySQL Verification Team
Hi Mr. Yibo,

Thank you for your bug report and contribution for improving our performance on the ARM CPU family.

Verified as reported.