| Bug #88399 | Using CAS for trylock in place of TAS for EventMutex (arm64) | ||
|---|---|---|---|
| Submitted: | 8 Nov 2017 8:39 | Modified: | 20 Dec 2017 3:30 |
| Reporter: | Debayan Ghosh (OCA) | | |
| Status: | Verified | | |
| Category: | MySQL Server: InnoDB storage engine | Severity: | S5 (Performance) |
| Version: | 5.7, 8.0 | OS: | Linux |
| Assigned to: | | CPU Architecture: | ARM |
| Tags: | Contribution, mysql-5.7*, mysql-8.0 | ||
[8 Nov 2017 8:53]
Debayan Ghosh
patch (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: 0001-InnoDB-Use-CAS-for-Eventmutex-trylock.patch (application/octet-stream, text), 2.40 KiB.
[8 Nov 2017 9:10]
MySQL Verification Team
Hello Debayan,

Thank you for the report and contribution.

Thanks,
Umesh
[21 Nov 2017 15:37]
Debayan Ghosh
Any comments on this one? Has anyone seen the impact of this on other platforms, including PPC64?
[8 Dec 2017 15:43]
Eric Anger
I have tested this patch on several different Arm platforms and have seen it improve performance for large core counts under high lock contention.
[15 Dec 2017 17:25]
Daniel Frazier
I tested on a 128-CPU ppc64le system, and I also see improvements there. My command line:

`$ sysbench --max-requests=0 --test=oltp --num-threads=128 --max-time=60 --mysql-user=testuser --mysql-password=testpassword run`

| Metric | sysbench.orig.1 | sysbench.cas.1 |
|---|---|---|
| transactions | 60718 (1005.43 per sec.) | 70365 (1170.24 per sec.) |
| deadlocks | 8173 (135.34 per sec.) | 9346 (155.43 per sec.) |
| read/write requests | 1284365 (21267.85 per sec.) | 1486625 (24724.10 per sec.) |
| other operations | 129609 (2146.20 per sec.) | 150076 (2495.92 per sec.) |

Description:

Hi,

Currently, the MySQL InnoDB event mutex code uses test-and-set (TAS) semantics when trying to acquire the lock. In `storage/innobase/include/ib0mutex.h`:

```cpp
bool tas_lock() UNIV_NOTHROW
{
	// Always writes LOCKED; succeeds if the previous value was UNLOCKED.
	return(TAS(&m_lock_word, MUTEX_STATE_LOCKED)
	       == MUTEX_STATE_UNLOCKED);
}
```

When contention is high, with many threads attempting the atomic exchange, I find that compare-and-swap (`__atomic_compare_exchange`) performs noticeably better than test-and-set on some arm64 platforms:

```cpp
bool cas_lock() UNIV_NOTHROW
{
	// Swaps in LOCKED only if the lock word is still UNLOCKED.
	return(CAS(&m_lock_word, MUTEX_STATE_UNLOCKED, MUTEX_STATE_LOCKED)
	       == MUTEX_STATE_UNLOCKED);
}
```

I also see that the Futexlock implementation already uses CAS for its try-lock.

I used the latest sysbench 1.1 OLTP update/write-only benchmarks to measure the impact. The improvement was seen with 32 or more threads.

In addition, weakening the memory orders of the compare-exchange from `__ATOMIC_SEQ_CST`/`__ATOMIC_SEQ_CST` (success/failure) to `__ATOMIC_ACQUIRE`/`__ATOMIC_RELAXED` gives some further improvement, but this needs to be verified on all other platforms and scenarios.

How to repeat:

Run the sysbench 1.1 OLTP write/update workloads on arm64 platforms with 32 or more threads.