Bug #76135 | data corruption on arm64 | ||
---|---|---|---|
Submitted: | 3 Mar 2015 17:11 | Modified: | 12 Aug 2015 15:21 |
Reporter: | Daniel Frazier | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S1 (Critical) |
Version: | 5.6.23 | OS: | Linux (Ubuntu 15.04) |
Assigned to: | CPU Architecture: | Any |
[3 Mar 2015 17:11]
Daniel Frazier
[4 Mar 2015 0:42]
Daniel Frazier
Note that the proposed fix from MariaDB apparently introduced a regression on x86: https://mariadb.atlassian.net/browse/MDEV-7026 But the proposed fix for that regression brings back the arm64 problem: https://mariadb.atlassian.net/browse/MDEV-7658
[31 Mar 2015 19:29]
Daniel Frazier
Patch 1/4
Attachment: MDEV-6450-MariaDB-crash-on-Power8-when-built-with-advance-tool-chain.patch (text/x-patch), 2.31 KiB.
[31 Mar 2015 19:29]
Daniel Frazier
Patch 2/4
Attachment: MDEV-6483-Deadlock_around_rw_lock_debug_mutex_on_PPC64.patch (text/x-patch), 3.88 KiB.
[31 Mar 2015 19:30]
Daniel Frazier
Patch 3/4
Attachment: MDEV-7026-Race-in-InnoDB-XtraDB-mutex-implementation.patch (text/x-patch), 12.33 KiB.
[31 Mar 2015 19:30]
Daniel Frazier
Patch 4/4
Attachment: MDEV-7658-fix-AArch64-MDEV-6615-regression.patch (text/x-patch), 2.14 KiB.
[31 Mar 2015 19:33]
Daniel Frazier
I've attached my port of the fixes from MariaDB, and modified them to apply to ARM64 in addition to PowerPC. For more details, see: https://bugs.launchpad.net/ubuntu/+source/mysql-5.6/+bug/1427406/comments/5
[17 Jul 2015 5:02]
Stewart Smith
The __sync_synchronize() is wrong, you don't require a heavyweight sync. The correct thing to use would be __atomic_test_and_set() and __atomic_clear.
[17 Jul 2015 6:08]
Sunny Bains
After discussing with Stewart, the change looks like this on Linux. If the new primitives are available then we use them on Intel x86 too. Otherwise we fall back to the old code on platforms with strong memory ordering. # if defined(HAVE_IB_GCC_ATOMIC_TEST_AND_SET) /**********************************************************//** Do an atomic test-and-set. @param[in,out] ptr Memory location to set to non-zero @return the previous value */ inline lock_word_t os_atomic_test_and_set(volatile lock_word_t* ptr) { return(__atomic_test_and_set(ptr, __ATOMIC_ACQUIRE)); } /**********************************************************//** Do an atomic clear. @param[in,out] ptr Memory location to set to zero */ inline void os_atomic_clear(volatile lock_word_t* ptr) { __atomic_clear(ptr, __ATOMIC_RELEASE); } # elif defined(IB_STRONG_MEMORY_MODEL) /**********************************************************//** Do an atomic test and set. @param[in,out] ptr Memory location to set to non-zero @return the previous value */ inline lock_word_t os_atomic_test_and_set(volatile lock_word_t* ptr) { return(__sync_lock_test_and_set(ptr, 1)); } /**********************************************************//** Do an atomic release. In theory __sync_lock_release should be used to release the lock. Unfortunately, it does not work properly alone. The workaround is that more conservative __sync_lock_test_and_set is used instead. Performance regression was observed at some conditions for Intel architecture. Disable release barrier on Intel architecture for now. @param[in,out] ptr Memory location to write to @return the previous value */ inline lock_word_t os_atomic_clear(volatile lock_word_t* ptr) { return(__sync_lock_test_and_set(ptr, 0)); } # else # error "Unsupported platform" # endif /* HAVE_IB_GCC_ATOMIC_TEST_AND_SET */
[10 Aug 2015 8:53]
Shaohua Wang
Posted by developer: The root cause is that x86 has a stronger memory model than the ARM processors. And the GCC builtins didn't issue the correct fences when setting/unsetting the lock word. In particular during the mutex release. The solution is rewriting atomic TAS operations: replace '__sync_' by '__atomic_' if possible.
[12 Aug 2015 15:21]
Daniel Price
Fixed as of the upcoming 5.5.46, 5.6.27, 5.7.9, 5.8.0 releases, and here's the changelog entry: A data corruption occurred on ARM64. GCC builtins did not issue the correct fences when setting or unsetting the lock word. Thank you for the bug report.
[3 Dec 2015 8:17]
Laurynas Biveinis
The fix for this has caused bug 79185.
[3 Dec 2015 8:44]
Sunny Bains
Laurynas, Can you please post your HW/OS/Compiler versions here.
[3 Dec 2015 9:38]
Laurynas Biveinis
Several combinations: 1) 5.5 / IBM X3650M4, 2x Intel Xeon E5-2620, 64G RAM, RAID 10 (4x 240G SSD) / Ubuntu Trusty / GCC 4.8.4 2) 5.6 / N/A / CentOS 6.6 / GCC 4.8.2 3) 5.6 / don't have details handy right now, will post later if I get them
[3 Dec 2015 10:20]
Sunny Bains
Laurynas, Can you also please post the asm from the mutex enter and release code that is generated.
[3 Dec 2015 10:44]
Sunny Bains
gcc -O3 -S -fverbose-asm main.c cat main.s .file "main.c" # GNU C (Debian 4.8.1-8) version 4.8.1 (x86_64-linux-gnu) # compiled by GNU C version 4.8.1, GMP version 5.1.2, MPFR version 3.1.1-p2, MPC version 1.0.1 # warning: GMP header version 5.1.2 differs from library version 6.0.0. # warning: MPFR header version 3.1.1-p2 differs from library version 3.1.2-p3. # GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072 # options passed: -imultiarch x86_64-linux-gnu main.c -mtune=generic # -march=x86-64 -O3 -fverbose-asm # options enabled: -faggressive-loop-optimizations # -fasynchronous-unwind-tables -fauto-inc-dec -fbranch-count-reg # -fcaller-saves -fcombine-stack-adjustments -fcommon -fcompare-elim # -fcprop-registers -fcrossjumping -fcse-follow-jumps -fdefer-pop # -fdelete-null-pointer-checks -fdevirtualize -fdwarf2-cfi-asm # -fearly-inlining -feliminate-unused-debug-types -fexpensive-optimizations # -fforward-propagate -ffunction-cse -fgcse -fgcse-after-reload -fgcse-lm # -fgnu-runtime -fguess-branch-probability -fhoist-adjacent-loads -fident # -fif-conversion -fif-conversion2 -findirect-inlining -finline # -finline-atomics -finline-functions -finline-functions-called-once # -finline-small-functions -fipa-cp -fipa-cp-clone -fipa-profile # -fipa-pure-const -fipa-reference -fipa-sra -fira-hoist-pressure # -fira-share-save-slots -fira-share-spill-slots -fivopts # -fkeep-static-consts -fleading-underscore -fmath-errno -fmerge-constants # -fmerge-debug-strings -fmove-loop-invariants -fomit-frame-pointer # -foptimize-register-move -foptimize-sibling-calls -foptimize-strlen # -fpartial-inlining -fpeephole -fpeephole2 -fpredictive-commoning # -fprefetch-loop-arrays -free -freg-struct-return -fregmove # -freorder-blocks -freorder-functions -frerun-cse-after-loop # -fsched-critical-path-heuristic -fsched-dep-count-heuristic # -fsched-group-heuristic -fsched-interblock -fsched-last-insn-heuristic # -fsched-rank-heuristic -fsched-spec -fsched-spec-insn-heuristic # -fsched-stalled-insns-dep -fschedule-insns2 -fshow-column -fshrink-wrap # -fsigned-zeros -fsplit-ivs-in-unroller -fsplit-wide-types # -fstrict-aliasing -fstrict-overflow -fstrict-volatile-bitfields # -fsync-libcalls -fthread-jumps -ftoplevel-reorder -ftrapping-math # -ftree-bit-ccp -ftree-builtin-call-dce -ftree-ccp -ftree-ch # -ftree-coalesce-vars -ftree-copy-prop -ftree-copyrename -ftree-cselim # -ftree-dce -ftree-dominator-opts -ftree-dse -ftree-forwprop -ftree-fre # -ftree-loop-distribute-patterns -ftree-loop-if-convert -ftree-loop-im # -ftree-loop-ivcanon -ftree-loop-optimize -ftree-parallelize-loops= # -ftree-partial-pre -ftree-phiprop -ftree-pre -ftree-pta -ftree-reassoc # -ftree-scev-cprop -ftree-sink -ftree-slp-vectorize -ftree-slsr -ftree-sra # -ftree-switch-conversion -ftree-tail-merge -ftree-ter # -ftree-vect-loop-version -ftree-vectorize -ftree-vrp -funit-at-a-time # -funswitch-loops -funwind-tables -fvect-cost-model -fverbose-asm # -fzero-initialized-in-bss -m128bit-long-double -m64 -m80387 # -maccumulate-outgoing-args -malign-stringops -mfancy-math-387 # -mfp-ret-in-387 -mfxsr -mglibc -mieee-fp -mlong-double-80 -mmmx -mno-sse4 # -mpush-args -mred-zone -msse -msse2 -mtls-direct-seg-refs .text .p2align 4,,15 .globl old .type old, @function old: .LFB11: .cfi_startproc xorl %eax, %eax # tmp61 xchgl (%rdi), %eax #,* t, tmp60 ret .cfi_endproc .LFE11: .size old, .-old .p2align 4,,15 .globl tas .type tas, @function tas: .LFB12: .cfi_startproc movl $1, %eax #, tmp60 movq %rdi, -8(%rsp) # t, t xchgb -8(%rsp), %al #,, tmp59 ret .cfi_endproc .LFE12: .size tas, .-tas .p2align 4,,15 .globl clear .type clear, @function clear: .LFB13: .cfi_startproc movb $0, -8(%rsp) #,, ret .cfi_endproc .LFE13: .size clear, .-clear .section .text.startup,"ax",@progbits .p2align 4,,15 .globl main .type main, @function main: .LFB14: .cfi_startproc xorl %eax, %eax # tmp61 movl $65535, -40(%rsp) #, t xchgl -40(%rsp), %eax #,, tmp60 leaq -40(%rsp), %rax #, tmp62 movq %rax, -24(%rsp) # tmp62, t movl $1, %eax #, tmp64 xchgb -24(%rsp), %al #,, tmp63 movb $0, -16(%rsp) #,, ret .cfi_endproc .LFE14: .size main, .-main .ident "GCC: (Debian 4.8.1-8) 4.8.1" .section .note.GNU-stack,"",@progbits cat main.c #include <stdio.h> void old(int* t) { __sync_lock_test_and_set(t, 0); } void tas(int* t) { __atomic_test_and_set(&t, __ATOMIC_ACQUIRE); } void clear(int* t) { __atomic_clear(&t, __ATOMIC_RELEASE); } int main() { int t = 0xffff; old(&t); tas(&t); clear(&t); } The clear() doesn't seem to have any barrier, looks like it defaults to RELAXED on x86_64.
[3 Dec 2015 11:15]
Laurynas Biveinis
Ubuntu Trusty, GCC 4.8.4. $ /usr/bin/cc -DHAVE_CONFIG_H -DHAVE_IB_ATOMIC_PTHREAD_T_GCC=1 -DHAVE_IB_GCC_ATOMIC_BUILTINS=1 -DHAVE_IB_GCC_ATOMIC_TEST_AND_SET=1 -DLINUX_NATIVE_AIO=1 -DSIZEOF_PTHREAD_T=8 -fPIC -Wall -Wextra -Wformat-security -Wvla -Wwrite-strings -Wdeclaration-after-statement -O3 -g -static-libgcc -fno-omit-frame-pointer -fno-strict-aliasing -DDBUG_OFF -DMY_PTHREAD_FASTMUTEX=1 -I/home/laurynas/percona/obj-5.5-release/include -I/home/laurynas/percona/mysql-server/storage/innobase/include -I/home/laurynas/percona/mysql-server/storage/innobase/handler -I/home/laurynas/percona/mysql-server/include -I/home/laurynas/percona/mysql-server/sql -I/home/laurynas/percona/mysql-server/regex -I/home/laurynas/percona/mysql-server/extra/yassl/include -I/home/laurynas/percona/mysql-server/extra/yassl/taocrypt/include -I/home/laurynas/percona/mysql-server/zlib -DUNIV_LINUX -D_GNU_SOURCE=1 -o CMakeFiles/innobase.dir/sync/sync0sync.c.I -S -fverbose-asm /home/laurynas/percona/mysql-server/storage/innobase/sync/sync0sync.c -fno-inline Current MySQL 5.5.46 os_atomic_clear is MOV: .p2align 4,,15 .type os_atomic_clear, @function os_atomic_clear: .LFB71: .file 3 "/home/laurynas/percona/mysql-server/storage/innobase/include/os0sync.h" .loc 3 338 0 .cfi_startproc .LVL12: pushq %rbp # .cfi_def_cfa_offset 16 .cfi_offset 6, -16 .loc 3 339 0 movb $0, (%rdi) #,,* ptr .loc 3 338 0 movq %rsp, %rbp #, .cfi_def_cfa_register 6 .loc 3 340 0 popq %rbp # .cfi_def_cfa 7, 8 ret .cfi_endproc .LFE71: .size os_atomic_clear, .-os_atomic_clear With a quick and dirty fix to prefer IB_STRONG_MEMORY_MODEL it's XCHG: .p2align 4,,15 .type os_atomic_clear, @function os_atomic_clear: .LFB71: .file 3 "/home/laurynas/percona/mysql-server/storage/innobase/include/os0sync.h" .loc 3 368 0 .cfi_startproc .LVL12: pushq %rbp # .cfi_def_cfa_offset 16 .cfi_offset 6, -16 .loc 3 369 0 xorl %eax, %eax # tmp62 .loc 3 368 0 movq %rsp, %rbp #, .cfi_def_cfa_register 6 .loc 3 369 0 xchgb (%rdi), %al #,* ptr, D.14153 .loc 3 370 0 popq %rbp # .cfi_def_cfa 7, 8 ret .cfi_endproc .LFE71: .size os_atomic_clear, .-os_atomic_clear
[3 Dec 2015 23:43]
Sunny Bains
diff --git a/storage/innobase/include/os0sync.h b/storage/innobase/include/os0sync.h index 0762840..8768d43 100644 --- a/storage/innobase/include/os0sync.h +++ b/storage/innobase/include/os0sync.h @@ -336,7 +336,7 @@ static inline void os_atomic_clear(volatile lock_word_t* ptr) { - __atomic_clear(ptr, __ATOMIC_RELEASE); + __atomic_clear(ptr, __ATOMIC_SEQ_CST); } # elif defined(IB_STRONG_MEMORY_MODEL) The only thing that needs to be investigated is if SEQ_CST is too strong on non Intel HW e.g., ARM64.
[22 Mar 2018 20:57]
MySQL Verification Team
https://bugs.mysql.com/bug.php?id=77795 marked as duplicate of this one.