Bug #22320 | my_atomic-t unit test fails | ||
---|---|---|---|
Submitted: | 13 Sep 2006 17:57 | Modified: | 14 Oct 2010 13:09 |
Reporter: | Guilhem Bichot | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server | Severity: | S3 (Non-critical) |
Version: | Celosia (M3) | OS: | Linux (Ubuntu x86 debug) |
Assigned to: | Davi Arnaut | CPU Architecture: | Any |
Tags: | pb2, regression, test failure |
[13 Sep 2006 17:57]
Guilhem Bichot
[13 Sep 2006 18:01]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/11878
ChangeSet@1.2323, 2006-09-13 19:58:57+02:00, guilhem@gbichot3.local +1 -0
fixes for the my_atomic-t unit test:
- compiler warning
- detection of pthread_create failure (you will see this message only if you run with "make test-verbose" in unittest; otherwise unit.pl masks all messages from the test but "ok" ones).
- the test fails randomly on some machines (I filed it as BUG#22320); on one host it looks like a crash at exit() which a sleep(2) makes disappear. So I add the sleep(2), which can be removed when BUG#22320 is fixed.
[13 Sep 2006 18:52]
Guilhem Bichot
It may well be a problem with Test::Harness (i.e. not a crash of the test program itself), because this:
(while true; do HARNESS_VERBOSE=1 ./mysys/my_atomic-t || exit 1; done)
never fails.
[13 Sep 2006 20:01]
Kristian Nielsen
This seems to be a Perl problem. Here is a simpler way to reproduce:
perl -e 'for(;;) { open(FH, "mysys/my_atomic-t|") || die "open"; 1 while(<FH>); if(!close FH) { print "close() failed: ", 0+$!, ": $!\n"; exit 1}; print "ok: $?\n"; }'
I get this (fails randomly):
ok: 0
close() failed: 10: No child processes
[13 Sep 2006 22:12]
Kristian Nielsen
I got curious, and investigated a bit more. It actually is not a Perl bug; it looks more like a kernel/NPTL bug, since the problem can also be reproduced with a C program (attached). Basically, there seems to be a race where the parent fork()s and exec()s the my_atomic-t program, then calls waitpid(), but sometimes waitpid() fails with ECHILD, so the exit status of the child is lost.
My guess is that the main thread in my_atomic-t exits before one or more other threads. Then, in the small window between the exit of the main thread and the exit of the last thread, the waitpid() wrongly fails, because the main thread (= child pid) is gone, and the exit status for the whole thread group (= pid) has not been stored yet.
Note that this failure in Pushbuild is only seen on hosts rh-x86-32 and rhas4-ia64, both of which have kernel 2.6.9-22.0.1. Also note that it only fails when using NPTL:
$ LD_ASSUME_KERNEL=2.4.20 ~/bug22320 mysys/my_atomic-t
Child=3958 waitpid=3958
wait4: No child processes
$ LD_ASSUME_KERNEL=2.4.19 ~/bug22320 mysys/my_atomic-t
Child=9841 waitpid=9841
Child=10218 waitpid=10218
Child=10528 waitpid=10528
Child=10837 waitpid=10837
Child=11164 waitpid=11164
...
If my guess is correct, we can fix it by explicitly joining the threads in my_atomic-t, instead of spawning them with PTHREAD_CREATE_DETACHED. Or maybe a kernel upgrade ...
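The joining approach suggested in this comment could look roughly like the sketch below. It is not the my_atomic-t source: the names `run_joined_workers`, `worker_arg` and `MAX_THREADS` are hypothetical, and the GCC `__sync_fetch_and_add` builtin stands in for the my_atomic primitives. The point is only that the caller returns after every worker has been joined, so no thread can outlive the main thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define MAX_THREADS 64          /* hypothetical upper bound for this sketch */

struct worker_arg {
  volatile int32_t *counter;    /* shared counter bumped by every worker */
  int iterations;
};

/* Each worker atomically increments the shared counter. */
static void *worker(void *p)
{
  struct worker_arg *arg = p;
  int i;
  for (i = 0; i < arg->iterations; i++)
    __sync_fetch_and_add(arg->counter, 1);
  return NULL;
}

/* Start nthreads joinable threads and wait for all of them.
   Returns the final counter value, or -1 if a thread could not be created. */
int32_t run_joined_workers(int nthreads, int iterations)
{
  pthread_t tids[MAX_THREADS];
  struct worker_arg arg;
  volatile int32_t counter = 0;
  int started = 0, failed = 0, i;

  assert(nthreads > 0 && nthreads <= MAX_THREADS);
  arg.counter = &counter;
  arg.iterations = iterations;

  for (i = 0; i < nthreads; i++) {
    /* Default attributes create a joinable (not detached) thread. */
    if (pthread_create(&tids[i], NULL, worker, &arg) != 0) {
      failed = 1;
      break;
    }
    started++;
  }

  /* Join every thread that was started before returning. */
  for (i = 0; i < started; i++)
    pthread_join(tids[i], NULL);

  return failed ? -1 : counter;
}
```

With 30 threads and 3000 iterations, the figures the test output in this report mentions, the returned value should be 90000 if no increments are lost.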
[13 Sep 2006 22:13]
Kristian Nielsen
C program to expose the kernel/NPTL bug with my_atomic-t
Attachment: bug22320.c (text/x-csrc), 956 bytes.
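The attachment itself is not reproduced on this page. As a rough idea of what such a reproducer involves, here is a minimal sketch of the fork/exec/waitpid sequence described above. The helper name `spawn_and_wait` and the output format are assumptions made for illustration, not the contents of bug22320.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork, exec argv[0], then wait for that child.
   Returns the child's exit status (0..255), or -1 if the child did not
   exit normally or if waitpid() itself failed (for example with ECHILD). */
int spawn_and_wait(char *const argv[])
{
  pid_t child, waited;
  int status = 0;

  assert(argv != NULL && argv[0] != NULL);

  child = fork();
  if (child < 0) {
    perror("fork");
    return -1;
  }
  if (child == 0) {
    execvp(argv[0], argv);
    _exit(127);                 /* only reached if exec failed */
  }

  /* Retry on EINTR only; any other failure is reported to the caller. */
  do {
    waited = waitpid(child, &status, 0);
  } while (waited < 0 && errno == EINTR);

  if (waited < 0) {
    perror("waitpid");          /* ECHILD prints "No child processes" */
    return -1;
  }

  printf("Child=%ld waitpid=%ld\n", (long) child, (long) waited);
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling this in a loop with mysys/my_atomic-t as argv[0], and stopping at the first -1, would mirror the loop shown in the comment above.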
[4 Dec 2006 10:13]
Guilhem Bichot
If I remove the sleep(2) and use pthread_join() instead of a threads counter, it still fails on rh-x86-32:
mysys/my_atomic-t.............................# N CPUs: 2, atomic ops: dubious
... (all subtests say "ok") and then:
Test returned status 255 (wstat -1, 0xffffffff)
test program seems to have generated a core after all the subtests completed successfully
[4 Feb 2010 18:17]
Alexander Nozdrin
my_atomic-t now fails in Celosia (M3) on 'Ubuntu x86 debug only'. Symptoms:
mysys/my_atomic-t......FAILED--Further testing stopped: Signal 11 thrown
It seems that it does not fail in 5.1 anymore. It also seems to work in Betony (M2). So it might be a regression. Requesting re-triage.
[4 Mar 2010 13:52]
Olav Sandstå
Running the my_atomic-t test manually gives the following output:
# N CPUs: 2, atomic ops: gcc-x86lock
1..6
ok 1 - my_atomic_initialize() returned 0
# Testing my_atomic_add32 with 30 threads, 3000 iterations...
ok 2 - tested my_atomic_add32 in 0.001289 secs (0)
# Testing my_atomic_fas32 with 30 threads, 3000 iterations...
ok 3 - tested my_atomic_fas32 in 0.001762 secs (0)
# Testing my_atomic_cas32 with 30 threads, 3000 iterations...
ok 4 - tested my_atomic_cas32 in 0.002988 secs (0)
Bail out! Signal 11 thrown
[4 Mar 2010 13:55]
Olav Sandstå
Running the test in gdb gives the following call stack:
#0 0x0804d696 in my_atomic_cas64 (U_a={i = 0x807e860, u = 0x807e860}, U_cmp={i = 0xbf870e78, u = 0xbf870e78}, U_set={i = 1152956689784258560, u = 1152956689784258560}) at ../../include/my_atomic.h:222
#1 0x0804d647 in my_atomic_add64 (U_a={i = 0x807e860, u = 0x807e860}, U_v={i = 1152956689784258560, u = 1152956689784258560}) at ../../include/my_atomic.h:230
#2 0x0804da86 in do_tests () at my_atomic-t.c:176
#3 0x0804d37f in main (argc=1, argv=0xbf870f74) at thr_template.c:79
[4 Mar 2010 14:05]
Olav Sandstå
Note that this crash occurs when the test is compiled with gcc version 4.2.4 on an Ubuntu 8.04.3 LTS server. When compiled with gcc version 4.3.2 on an Ubuntu 8.04.02 server, the test runs fine.
[5 Jul 2010 12:01]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/112882
3463 Davi Arnaut 2010-07-05
Bug#22320: my_atomic-t unit test fails
The atomic operations implementation on 5.1 has a few problems, which might cause tests to abort randomly. Since no code in 5.1 uses atomic operations, simply remove the code.
[5 Jul 2010 13:26]
Davi Arnaut
Removal queued to mysql-5.1-bugteam, null merged into mysql-trunk-merge.
[6 Jul 2010 0:48]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/112922
3086 Davi Arnaut 2010-07-05
Bug#22320: my_atomic-t unit test fails
Bug#52261: 64 bit atomic operations do not work on Solaris i386 gcc in debug compilation
One of the various problems was that the source operand to CMPXCHG8b was marked as an input/output operand, causing GCC to use the EBX register as the destination register for the CMPXCHG8b instruction. This could lead to crashes as the EBX register is also implicitly used by the instruction, causing the value to be potentially garbaged and a protection fault once the value is used to access a position in memory.
Another problem was the lack of proper clobbers for the atomic operations and, also, a discrepancy between the implementations for the Compare and Set operation. The specific problems are described and fixed by Kristian Nielsen's patches:
Patch 1: Fix bugs in my_atomic_cas*(val,cmp,new) where *cmp is accessed after CAS succeeds. In the gcc builtin implementation, the problem was that *cmp was read again after the atomic CAS to check if old *val == *cmp; this fails if the CAS is successful and another thread modifies *cmp in between. In the x86-gcc implementation, the problem was that *cmp was set also in the case of a successful CAS; this means there is a window where it can clobber a value written by another thread after a successful CAS.
Patch 2: Add a GCC asm "memory" clobber to primitives that imply a memory barrier. This signifies to GCC that any potentially aliased memory must be flushed before the operation, and re-read after the operation, so that read or modification in other threads of such memory values will work as intended. In effect, it makes these primitives work as memory barriers for the compiler as well as the CPU. This is better and more correct than adding "volatile" to variables.
@ include/atomic/gcc_builtins.h Do not read from *cmp after the operation as it might be already gone if the operation was successful.
@ include/atomic/nolock.h Prefer system provided atomics over the broken x86 asm.
@ include/atomic/x86-gcc.h Do not mark source operands as input/output operands. Add proper memory clobbers.
@ include/my_atomic.h Add notes about my_atomic_add and my_atomic_cas behaviors.
@ unittest/mysys/my_atomic-t.c Remove workaround; if the test fails, there is either a problem with the atomic operations code or the specific compiler version should be black-listed.
[8 Jul 2010 16:16]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/113160
3095 Davi Arnaut 2010-07-08
Bug#22320: my_atomic-t unit test fails
Bug#52261: 64 bit atomic operations do not work on Solaris i386 gcc in debug compilation
One of the various problems was that the source operand to CMPXCHG8b was marked as an input/output operand, causing GCC to use the EBX register as the destination register for the CMPXCHG8b instruction. This could lead to crashes as the EBX register is also implicitly used by the instruction, causing the value to be potentially garbaged and a protection fault once the value is used to access a position in memory.
Another problem was the lack of proper clobbers for the atomic operations and, also, a discrepancy between the implementations for the Compare and Set operation. The specific problems are described and fixed by Kristian Nielsen's patches:
Patch 1: Fix bugs in my_atomic_cas*(val,cmp,new) where *cmp is accessed after CAS succeeds. In the gcc builtin implementation, the problem was that *cmp was read again after the atomic CAS to check if old *val == *cmp; this fails if the CAS is successful and another thread modifies *cmp in between. In the x86-gcc implementation, the problem was that *cmp was set also in the case of a successful CAS; this means there is a window where it can clobber a value written by another thread after a successful CAS.
Patch 2: Add a GCC asm "memory" clobber to primitives that imply a memory barrier. This signifies to GCC that any potentially aliased memory must be flushed before the operation, and re-read after the operation, so that read or modification in other threads of such memory values will work as intended. In effect, it makes these primitives work as memory barriers for the compiler as well as the CPU. This is better and more correct than adding "volatile" to variables.
@ include/atomic/gcc_builtins.h Do not read from *cmp after the operation as it might be already gone if the operation was successful.
@ include/atomic/nolock.h Prefer system provided atomics over the broken x86 asm.
@ include/atomic/x86-gcc.h Do not mark source operands as input/output operands. Add proper memory clobbers.
@ include/my_atomic.h Add notes about my_atomic_add and my_atomic_cas behaviors.
@ unittest/mysys/my_atomic-t.c Remove workaround; if the test fails, there is either a problem with the atomic operations code or the specific compiler version should be black-listed.
[23 Jul 2010 12:28]
Bugs System
Pushed into mysql-trunk 5.5.6-m3 (revid:alik@sun.com-20100723121820-jryu2fuw3pc53q9w) (version source revid:vasil.dimov@oracle.com-20100531152341-x2d4hma644icamh1) (merge vers: 5.5.5-m3) (pib:18)
[23 Jul 2010 12:35]
Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100723121929-90e9zemk3jkr2ocy) (version source revid:vasil.dimov@oracle.com-20100531152341-x2d4hma644icamh1) (pib:18)
[23 Jul 2010 21:30]
Davi Arnaut
Queued to mysql-trunk-bugfixing
[4 Aug 2010 7:50]
Bugs System
Pushed into mysql-trunk 5.5.6-m3 (revid:alik@sun.com-20100731131027-1n61gseejyxsqk5d) (version source revid:marko.makela@oracle.com-20100621094008-o9fa153s3f09merw) (merge vers: 5.1.49) (pib:18)
[4 Aug 2010 8:10]
Bugs System
Pushed into mysql-trunk 5.6.1-m4 (revid:alik@ibmvm-20100804080001-bny5271e65xo34ig) (version source revid:marko.makela@oracle.com-20100621094008-o9fa153s3f09merw) (merge vers: 5.1.49) (pib:18)
[4 Aug 2010 8:26]
Bugs System
Pushed into mysql-trunk 5.6.1-m4 (revid:alik@ibmvm-20100804081533-c1d3rbipo9e8rt1s) (version source revid:marko.makela@oracle.com-20100621094008-o9fa153s3f09merw) (merge vers: 5.1.49) (pib:18)
[4 Aug 2010 9:05]
Bugs System
Pushed into mysql-next-mr (revid:alik@ibmvm-20100804081630-ntapn8bf9pko9vj3) (version source revid:marko.makela@oracle.com-20100621094008-o9fa153s3f09merw) (pib:20)
[12 Aug 2010 19:43]
Paul DuBois
Noted in 5.5.6 changelog. Problems in the atomic operations implementation could lead to server crashes.
[19 Aug 2010 15:41]
Bugs System
Pushed into mysql-5.1 5.1.51 (revid:build@mysql.com-20100819151858-muaaor6jojb5ouzj) (version source revid:build@mysql.com-20100819151858-muaaor6jojb5ouzj) (merge vers: 5.1.51) (pib:20)
[14 Oct 2010 8:37]
Bugs System
Pushed into mysql-5.1-telco-7.0 5.1.51-ndb-7.0.20 (revid:martin.skold@mysql.com-20101014082627-jrmy9xbfbtrebw3c) (version source revid:martin.skold@mysql.com-20101014082627-jrmy9xbfbtrebw3c) (merge vers: 5.1.51-ndb-7.0.20) (pib:21)
[14 Oct 2010 8:52]
Bugs System
Pushed into mysql-5.1-telco-6.3 5.1.51-ndb-6.3.39 (revid:martin.skold@mysql.com-20101014083757-5qo48b86d69zjvzj) (version source revid:martin.skold@mysql.com-20101014083757-5qo48b86d69zjvzj) (merge vers: 5.1.51-ndb-6.3.39) (pib:21)
[14 Oct 2010 9:07]
Bugs System
Pushed into mysql-5.1-telco-6.2 5.1.51-ndb-6.2.19 (revid:martin.skold@mysql.com-20101014084420-y54ecj85j5we27oa) (version source revid:martin.skold@mysql.com-20101014084420-y54ecj85j5we27oa) (merge vers: 5.1.51-ndb-6.2.19) (pib:21)
[14 Oct 2010 13:09]
Jon Stephens
Also noted in the 5.1.51 changelog. No additional changelog entries required. Closed.