Bug #105522 Crash in pthread_cond_signal on bus error on Mac OS X ARM
Submitted: 10 Nov 2021 19:04 Modified: 3 Dec 2021 0:59
Reporter: Mikael Ronström
Status: Closed
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S3 (Non-critical)
Version:8.0.27 OS:MacOS
CPU Architecture: ARM

[10 Nov 2021 19:04] Mikael Ronström
Crash in pthread_cond_signal with bus error

0   ndbmtd                              0x0000000102d1f734 my_print_stacktrace(unsigned char const*, unsigned long) + 72
1   ndbmtd                              0x00000001036d1270 ndb_print_stacktrace() + 104
2   ndbmtd                              0x0000000102cfd434 handler_error + 220
3   libsystem_platform.dylib            0x00000001ae9644e4 _sigtramp + 56
4   libsystem_pthread.dylib             0x00000001ae94c754 pthread_cond_signal + 756
5   ndbmtd                              0x00000001036c9b78 native_cond_signal(_opaque_pthread_cond_t*) + 24
6   ndbmtd                              0x00000001036c9b44 NdbCondition_Signal + 44
7   ndbmtd                              0x00000001034d3ff4 AsyncIoThread::run() + 124
8   ndbmtd                              0x00000001034d3f68 runAsyncIoThread + 24
9   ndbmtd                              0x00000001036ca9a4 ndb_thread_wrapper(void*) + 484
10  libsystem_pthread.dylib             0x00000001ae94d4ec _pthread_start + 148
11  libsystem_pthread.dylib             0x00000001ae9482d0 thread_start + 8
For help with below stacktrace consult:

How to repeat:
First fix the issue in CMake's handling of compilation on Mac OS X 12.0.
Next, compile a debug build with the command:
make -j16
Then run:
./mtr --suite=ndb --parallel=6 --force

The crash does not happen every time. The pointer passed to pthread_cond_signal is both
allocated and correctly aligned. Unfortunately, I haven't been able to produce a core dump
yet, so it is hard to say why it crashes with a bus error.

Suggested fix:
[10 Nov 2021 19:07] Mikael Ronström
It could be related to some memory issue. It works better when running without
parallelism; the machine has only 16 GB of memory. Running
./mtr --suite=ndb
passes quite a few test cases.
[11 Nov 2021 7:24] MySQL Verification Team
Hello Mikael,

Thank you for the report and feedback.

[29 Nov 2021 16:24] Mikael Ronström
After moving the NdbCondition_Signal call in AsyncIoThread.cpp so that it
happens before NdbMutex_Unlock, I haven't been able to reproduce the issue.
I have no idea why this solves the problem. My guess is that the mutex
somehow ensures that we don't use uninitialised memory, since this code
only executes when starting a file system thread.
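The change swaps the order of two calls in a thread start-up handshake. The sketch below, in C++ with plain POSIX threads, shows that ordering under stated assumptions: `StartSync`, `worker_main`, `start_worker` and `run_start_cycles` are hypothetical names, not NDB code, and the handshake is only assumed to resemble what happens when a file system thread starts. One plausible explanation for the crash, not confirmed in this report, is an object-lifetime race: if the worker unlocks first and signals afterwards, the starting thread can take the mutex in between, see the flag already set, destroy the condition variable and return, so the worker's later pthread_cond_signal touches freed memory. Signalling while the mutex is still held closes that window.

```cpp
#include <pthread.h>

// Hypothetical start-up handshake (not NDB code): a parent thread creates a
// worker and blocks until the worker reports that it has started.
struct StartSync {
  pthread_mutex_t mutex;
  pthread_cond_t cond;
  bool started;
};

static void* worker_main(void* arg) {
  StartSync* sync = static_cast<StartSync*>(arg);
  pthread_mutex_lock(&sync->mutex);
  sync->started = true;
  // Signal BEFORE unlocking: the parent cannot observe started == true until
  // the unlock below, so it cannot destroy 'sync' while pthread_cond_signal
  // is still using the condition variable.
  pthread_cond_signal(&sync->cond);
  pthread_mutex_unlock(&sync->mutex);
  // 'sync' must not be touched after this point: the parent may already
  // have destroyed it. The worker's real work would follow here.
  return nullptr;
}

// Starts a worker and waits until it has reported start-up.
// Returns true on success; the caller must join *tid later.
bool start_worker(pthread_t* tid) {
  StartSync sync;  // lives on the parent's stack, gone once this returns
  pthread_mutex_init(&sync.mutex, nullptr);
  pthread_cond_init(&sync.cond, nullptr);
  sync.started = false;

  bool ok = pthread_create(tid, nullptr, worker_main, &sync) == 0;
  if (ok) {
    pthread_mutex_lock(&sync.mutex);
    // The loop guards against spurious wake-ups; if the worker has already
    // set the flag, no wait happens at all.
    while (!sync.started) {
      pthread_cond_wait(&sync.cond, &sync.mutex);
    }
    pthread_mutex_unlock(&sync.mutex);
  }
  pthread_cond_destroy(&sync.cond);
  pthread_mutex_destroy(&sync.mutex);
  return ok;
}

// Repeats the handshake n times and returns how many workers started.
int run_start_cycles(int n) {
  int succeeded = 0;
  for (int i = 0; i < n; ++i) {
    pthread_t tid;
    if (start_worker(&tid)) {
      pthread_join(tid, nullptr);
      ++succeeded;
    }
  }
  return succeeded;
}
```

Joining the worker before destroying the synchronisation objects would also avoid the hazard, but that defeats the purpose of a start-up handshake in which the worker keeps running after it has signalled.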
[3 Dec 2021 0:59] Jon Stephens
Documented fix as follows in the NDB 8.0.29 changelog:

    An unplanned data node shutdown occurred following a bus error
    on Mac OS X for ARM. We fix this by moving the call to
    NdbCondition_Signal() (in AsyncIoThread.cpp) such that it
    executes prior to NdbMutex_Unlock()—that is, into the
    mutex, so that the condition being signalled is not lost during