Bug #63451 atomic/x86-gcc.h:make_atomic_cas_body64 potential miscompilation bug
Submitted: 28 Nov 2011 3:37 Modified: 3 Feb 2014 10:36
Reporter: Laurynas Biveinis (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Locking Severity:S3 (Non-critical)
Version:5.5 OS:Any
Assigned to: CPU Architecture:Any

[28 Nov 2011 3:37] Laurynas Biveinis
Description:
The make_atomic_cas_body64 implementation on IA32 with GCC but without GCC builtins might get miscompiled due to a wrong constraint. The code reads

#define make_atomic_cas_body64                                    \
  asm volatile ("push %%ebx;"                                     \
                "movl (%%ecx), %%ebx;"                            \
                "movl 4(%%ecx), %%ecx;"                           \
                LOCK_prefix "; cmpxchg8b %0;"                     \
                "setz %2; pop %%ebx"                              \
                : "=m" (*a), "+A" (*cmp), "=c" (ret)              \
                : "c" (&set), "m" (*a)                            \
                : "memory", "esp")
#endif

Note that *a is specified twice in the constraint list: as the 1st output arg ("=m" (*a)) and as the 5th input arg ("m" (*a)). The code obviously expects these two to be a single memory location (i.e. single arg of CMPXCHG8B that is both input and output), but such constraints do not guarantee this. GCC is free to use two separate memory locations - one for the input value of *a and one for the output one. Which then results to CMPXCHG8B argument pointing to uninitialized memory and runtime crash.

This is pointed out by the GCC manual:
http://gcc.gnu.org/onlinedocs/gcc-4.6.1/gcc/Extended-Asm.html#Extended-Asm

<quote>
The ordinary output operands must be write-only; GCC will assume that the values in these operands before the instruction are dead and need not be generated.
</quote>

I.e. there is no need for GCC to store the input value of *a to the 1st arg.

And

<quote>
Only a number in the constraint can guarantee that one operand will be in the same place as another. The mere fact that foo is the value of both operands is not enough to guarantee that they will be in the same place in the generated assembler code. The following would not work reliably:

     asm ("combine %2,%0" : "=r" (foo) : "r" (foo), "g" (bar));

Various optimizations or reloading could cause operands 0 and 1 to be in different registers; GCC knows no reason not to do so. For example, the compiler might find a copy of the value of foo in one register and use it for operand 1, but generate the output operand 0 in a different register (copying it afterward to foo's own address). Of course, since the register for operand 1 is not even mentioned in the assembler code, the result will not work, but GCC can't tell that. 
</quote>

How to repeat:
The current MySQL code does not trigger a miscompilation. We have observed it in Percona Server 5.1 where we have backported the MySQL 5.5 atomic access library.

Still the fix is very simple I suggest to fix this proactively without waiting for miscompilation reports.

Suggested fix:
Instead of two separate input and output args use a single input/output arg: "+m" (*a)

Or require at least -march=i686 for compiling MySQL and drop the code altogether.
[2 Jan 2012 20:10] Davi Arnaut
Simply marking the argument as a input/output operand is wrong too because of the semantics of CMPXCHG8b. The major problem is that the atomic API is too complex for a reasonable implementation in assembly.
[4 Jan 2012 10:00] Laurynas Biveinis
Davi -

More out of curiosity than anything else, could you elaborate why marking CMPXCHG8B arg as input/output is wrong?

In any case I think we are spending way too much time with this. -march=i686 seems to be the best solution.
[4 Jan 2012 15:35] Davi Arnaut
There is an explanation in the commit that changed it.
[30 Jul 2013 17:28] MySQL Verification Team
Both arguments are correct. 

This bug is verified and will be processed.
[5 Nov 2013 9:55] Marc ALFF
For context, the previous change set comment that modified the code is:

Bug#22320: my_atomic-t unit test fails
Bug#52261: 64 bit atomic operations do not work on Solaris i386
           gcc in debug compilation

One of the various problems was that the source operand to
CMPXCHG8b was marked as a input/output operand, causing GCC
to use the EBX register as the destination register for the
CMPXCHG8b instruction. This could lead to crashes as the EBX
register is also implicitly used by the instruction, causing
the value to be potentially garbaged and a protection fault
once the value is used to access a position in memory.

Another problem was the lack of proper clobbers for the atomic
operations and, also, a discrepancy between the implementations
for the Compare and Set operation. The specific problems are
described and fixed by Kristian Nielsen patches:

Patch: 1

Fix bugs in my_atomic_cas*(val,cmp,new) that *cmp is accessed
after CAS succeds.

In the gcc builtin implementation, problem was that *cmp was
read again after atomic CAS to check if old *val == *cmp;
this fails if CAS is successful and another thread modifies
*cmp in-between.

In the x86-gcc implementation, problem was that *cmp was set
also in the case of successful CAS; this means there is a
window where it can clobber a value written by another thread
after successful CAS.

Patch 2:

Add a GCC asm "memory" clobber to primitives that imply a
memory barrier.

This signifies to GCC that any potentially aliased memory
must be flushed before the operation, and re-read after the
operation, so that read or modification in other threads of
such memory values will work as intended.

In effect, it makes these primitives work as memory barriers
for the compiler as well as the CPU. This is better and more
correct than adding "volatile" to variables.
[22 Nov 2013 14:59] Paul DuBois
Noted in 5.6.16, 5.7.4 changelogs.

The make_atomic_cas_body64 implementation on IA32 with gcc but
without gcc builtins could be miscompiled due to an incorrect
constraint. The patch also causes MySQL to use builtin atomics when
compiled using Clang.
[3 Feb 2014 10:36] Laurynas Biveinis
5.6$ bzr log -r 5641
------------------------------------------------------------
revno: 5641
committer: Nisha Gopalakrishnan <nisha.gopalakrishnan@oracle.com>
branch nick: mysql-5.6-17242996
timestamp: Thu 2013-11-21 10:36:45 +0530
message:
  BUG#17242996: ATOMIC/X86-GCC.H:MAKE_ATOMIC_CAS_BODY64 POTENTIAL
                MISCOMPILATION BUG
  
  Analysis:
  --------
  The make_atomic_cas_body64 implementation on IA32 with GCC but
  without GCC builtins might get miscompiled due to a wrong
  constraint.
  
  make_atomic_cas_body64 implementation uses CMPXCHG8B with a
  single memory operand as input/output operand. However the
  constraint '+' was not used to indicate the same due to
  issue reported by Bug#11746008. Hence it could result in
  using two different memory locations, thus accessing an
  uninitialized memory with probable runtime crash.
  
  Fix:
  ---
  The assembly atomic implementation (x86-gcc.h) is not used on
  Windows and Solaris. The oldest version of GCC supported by
  Linux variants on any platform is GCC 4.1.2 which supports
  builtin atomics. On MAC, GCC versions higher than 4.1.2 or
  Clang is used which supports builtin atomics as well.
  
  Hence this patch removes assembly atomic implementation
  (x86-gcc.h) and relies on the GCC builtins. Upon using
  an older unsupported versions of GCC(A warning is flagged),
  we may see some performance regressions since we fallback
  to rw locks.