Description:
This is a follow-up patch to Bug #96504 (Refine atomics and barriers for weak memory order platform). It depends on the patches for Bug #97150 and Bug #97228.
lock->waiters is an integer (actually used as a boolean) that tells the
lock holder that some threads are sleeping while they wait for the lock
to be released. The lock holder must check it and signal a condition
variable to wake them up. Memory ordering is important to make sure
sleeping threads are woken up in time.
The problem can be simplified to the code below:
// Initial state: lockword = 0; waiters = 0;
// Thread1                    // Thread2
waiters = 1;                  lockword = 1;  // free the lock
mb();                         mb();
if (lockword == 0) {          if (waiters == 1) {
  sleep();                      waiters = 0;
}                               mb();
                                wakeup();    // wakeup thread1
                              }
To guarantee that thread2 calls wakeup() whenever thread1 calls sleep(),
a full memory barrier is required in both threads, so that the ordering
of the accesses to "waiters" and "lockword" is preserved.
In addition, a memory barrier is required after thread2 resets "waiters"
to make sure the reset happens before wakeup() is called. Another instance
of thread1 may set "waiters" at the same time as thread2 calls wakeup();
without the barrier, that "waiters" flag may be cleared accidentally.
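The simplified model can be sketched in C++ with std::atomic. This is only an illustration of the barrier placement described above, not the InnoDB code: the names Event, ToyLock, waiter, releaser and run_handshake are hypothetical, mb() is assumed to map to a sequentially consistent fence, and sleep()/wakeup() are modelled by a sticky event so that a wakeup issued before the sleep is not lost.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical sticky event standing in for sleep()/wakeup().
class Event {
 public:
  void wait() {  // sleep(): returns at once if set() was already called
    std::unique_lock<std::mutex> lk(m_);
    cv_.wait(lk, [this] { return set_; });
  }
  void set() {  // wakeup()
    {
      std::lock_guard<std::mutex> lk(m_);
      set_ = true;
    }
    cv_.notify_all();
  }

 private:
  std::mutex m_;
  std::condition_variable cv_;
  bool set_ = false;
};

struct ToyLock {
  std::atomic<int> lockword{0};  // 0 = held, 1 = free, as in the model
  std::atomic<int> waiters{0};
  Event event;
};

// Thread1: announce the waiter, then sleep only if the lock still looks held.
bool waiter(ToyLock &l) {
  l.waiters.store(1, std::memory_order_relaxed);
  std::atomic_thread_fence(std::memory_order_seq_cst);  // mb()
  if (l.lockword.load(std::memory_order_relaxed) == 0) {
    l.event.wait();
    return true;  // slept
  }
  return false;
}

// Thread2: free the lock, then wake the waiter if one has announced itself.
bool releaser(ToyLock &l) {
  l.lockword.store(1, std::memory_order_relaxed);
  std::atomic_thread_fence(std::memory_order_seq_cst);  // mb()
  if (l.waiters.load(std::memory_order_relaxed) == 1) {
    l.waiters.store(0, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);  // mb() before wakeup
    l.event.set();
    return true;  // woke the waiter
  }
  return false;
}

// Runs one handshake; returns true when a sleeping waiter was also woken.
bool run_handshake() {
  ToyLock l;
  bool slept = false, woke = false;
  std::thread t1([&] { slept = waiter(l); });
  std::thread t2([&] { woke = releaser(l); });
  t1.join();
  t2.join();
  assert(!slept || woke);
  return !slept || woke;
}
```

With both full fences in place, at least one thread must observe the other's store, so the case "thread1 sleeps and thread2 skips wakeup()" cannot occur. Calling run_handshake() in a loop exercises both interleavings.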
The actual code is more complex. Instead of plain load/store instructions,
TTAS (test-and-test-and-set) is used for atomicity and to reduce cache
line store contention. x86 ignores the acquire/release options because
its lock-prefixed and xchg instructions already imply a full memory
barrier. Arm respects the acquire/release options and generates
instructions with exactly the requested memory ordering.
// Thread1
waiters.cmpxchg(0, 1, acquire);
if (lockword.cmpxchg(1, 0, acquire) == 0) {  // try lock
  sleep();
}

// Thread2
lockword.cmpxchg(0, 1, release);  // unlock
fence(acquire);
if (waiters.load(relaxed) == 1) {
  waiters.cmpxchg(1, 0, acquire);
  wakeup();
}
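For readers unfamiliar with the notation, the cmpxchg(expected, desired, order) calls in the listing can be read roughly as the C++ sketch below. The helper names cmpxchg and ttas_clear are hypothetical and are not the server's API. The memory orders are copied from the listing for illustration only and are not a recommendation. The relaxed failure ordering is also an assumption.

```cpp
#include <atomic>

// Hypothetical helper mirroring the listing's cmpxchg(expected, desired, order).
// Returns the value observed before the operation, so a result equal to
// `expected` means the swap happened.
inline int cmpxchg(std::atomic<int> &v, int expected, int desired,
                   std::memory_order order) {
  // On failure, compare_exchange_strong stores the observed value into
  // `expected`; on success `expected` already equals the old value.
  v.compare_exchange_strong(expected, desired, order,
                            std::memory_order_relaxed);
  return expected;
}

// Test-and-test-and-set: read the flag first and attempt the read-modify-write
// only when the read says it can succeed, which avoids needless stores to the
// cache line.
inline bool ttas_clear(std::atomic<int> &flag) {
  if (flag.load(std::memory_order_relaxed) != 1) {
    return false;  // nothing to clear, no store issued
  }
  return cmpxchg(flag, 1, 0, std::memory_order_acquire) == 1;
}
```

In this reading, Thread2's "load, then cmpxchg" sequence on waiters corresponds to ttas_clear(waiters).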
How to repeat:
NA