Bug #94699 Mysql deadlock and bugcheck on aarch64 under stress test
Submitted: 19 Mar 2019 11:01 Modified: 28 Mar 2019 16:23
Reporter: Cai Yibo (OCA)
Status: Closed
Category: MySQL Server: InnoDB storage engine
Severity: S1 (Critical)
Version: 5.7
OS: Ubuntu
Assigned to:
CPU Architecture: ARM
Tags: Bugcheck, Contribution, deadlock, rwlock

[19 Mar 2019 11:01] Cai Yibo
Description:
We encountered a MySQL deadlock and a bugcheck (assertion failure) during stress testing on an aarch64 server.
The deadlock takes MySQL permanently out of service; it never recovers.
The bugcheck triggers a MySQL restart and kills all current connections.
The issue reproduces reliably on our test bed.

Errorlog#1: bugcheck
---------------------
2019-01-31 16:59:35 0xffe617a951c0  InnoDB: Assertion failure in thread 281363704533440 in file sync0rw.cc line 560
InnoDB: Failing assertion: !lock->recursive
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.

Errorlog#2: deadlock
--------------------
2019-01-30T06:23:22.614979Z 0 [Warning] InnoDB: A long semaphore wait:
--Thread 281365295645120 has waited at btr0sea.ic line 90 for 241.00 seconds the semaphore:
X-lock on RW-latch at 0xaaabc8bb56a8 created in file btr0sea.cc line 195
a writer (thread id 281365295104448) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file btr0sea.ic line 128
Last time write locked in file /home/linux/mysql/0-source/mysql-server/storage/innobase/include/btr0sea.ic line 90

How to repeat:
On the aarch64 server:
- Build and install MySQL Server 5.7 from source.
- Start the MySQL service.
- The server should have at least 128 GB of RAM, and database storage should be on SSD or NVMe.

On the x86 client:
- Install sysbench 0.5.
- Run sysbench for stress testing: "sysbench --test=sysbench/parallel_prepare.lua --oltp-tables-count=64 --oltp-table-size=10000000 --mysql-host=<MYSQL-SERVER-IP>  --mysql-db=testdbx --mysql-user=root --mysql-password=root --num-threads=64 --max-requests=64 run"
- The network connection should be at least 1 Gbps.

Suggested fix:
We found a memory-ordering issue in the MySQL rw-lock implementation that occurs on weakly ordered platforms such as aarch64.
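
To illustrate the class of problem, here is a minimal sketch, assuming a much simplified exclusive lock. The names ToyXLock, x_lock, x_unlock and run_stress are hypothetical; this is not InnoDB's rw_lock_t and not the submitted patch. On a weakly ordered CPU, if the store that frees the lock word were not a release, or the compare-and-swap that takes it were not an acquire, another thread could see the lock as free while still reading a stale value of a field the lock protects. That is one way a check such as !lock->recursive could fail.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical, simplified exclusive lock (illustration only).
struct ToyXLock {
    std::atomic<int> lock_word{1};  // 1 = free, 0 = held exclusively
    bool recursive = false;         // plain field protected by the lock

    void x_lock() {
        int expected = 1;
        // Acquire on success: reads after this point observe every write
        // the previous owner made before its release store.
        while (!lock_word.compare_exchange_weak(expected, 0,
                                                std::memory_order_acquire,
                                                std::memory_order_relaxed)) {
            expected = 1;  // a failed CAS overwrites 'expected'; reset it
            std::this_thread::yield();
        }
        // Holds only because of the acquire/release pairing; with relaxed
        // ordering a stale 'true' could be visible on a weakly ordered CPU.
        assert(!recursive);
        recursive = true;
    }

    void x_unlock() {
        recursive = false;
        // Release: the write to 'recursive' above becomes visible before
        // any other thread can observe the lock word as free.
        lock_word.store(1, std::memory_order_release);
    }
};

// Hypothetical stress helper: n_threads threads each take the lock
// 'iters' times and increment a counter that the lock protects.
long run_stress(int n_threads, int iters) {
    ToyXLock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t) {
        workers.emplace_back([&lock, &counter, iters] {
            for (int i = 0; i < iters; ++i) {
                lock.x_lock();
                ++counter;
                lock.x_unlock();
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

With the orderings shown, run_stress(4, 10000) should return 40000 on any platform. Weakening both orderings to std::memory_order_relaxed is the kind of change that may appear to work on x86, which is strongly ordered, yet fail intermittently on aarch64.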

We have prepared a patch and validated it thoroughly, and we will upstream it soon.
[19 Mar 2019 13:13] Cai Yibo
patch to fix this bug

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: 0001-Bug-94699-Mysql-deadlock-and-bugcheck-on-aarch64.patch (application/octet-stream, text), 4.76 KiB.

[19 Mar 2019 14:27] MySQL Verification Team
Hi, 

Thank you for your bug report and thank you for your patch.

Verified as reported.
[28 Mar 2019 16:23] Daniel Price
Posted by developer:
 
commit 29572c9d10b3545ef57674402104c42c0961c779
Author: Jakub Łopuszański <jakub.lopuszanski@oracle.com>
Date:   Wed Mar 27 10:02:53 2019 +0100

Bug #29508001 MYSQL DEADLOCK AND BUGCHECK ON AARCH64 UNDER STRESS TEST
    
Insufficient memory barriers in rw-lock implementation caused deadlocks 
on ARM architecture. This bugfix is a contribution by Yibo Cai from ARM Inc.
[28 Mar 2019 16:40] Daniel Price
Posted by developer:
 
Fixed as of the upcoming 8.0.17 release, and here's the changelog entry:

Insufficient memory barriers in the rw-lock implementation caused
deadlocks on ARM. 

Thanks to Yibo Cai from Arm Technology
for the contribution.
[9 May 2019 13:03] Erlend Dahl
Bug #94742 "Mysql bugcheck on aarch64 under stress test" was marked as a duplicate.