Description:
ndbd reproducibly crashes at launch when built for sparc64 using current Fedora tools.
The cause turns out to be a klugy declaration of a local Signal data structure which
fails to meet the required 8-byte alignment of the struct type. In recent gcc versions,
the memcpy at the end of Dbacc::execREAD_PSEUDO_REQ can be optimized into an aligned
8-byte load and store, and on an architecture that is actually picky about alignment,
this leads to SIGBUS. Even on architectures that are more forgiving, the inadequate
alignment of the struct is surely leading to wasted cycles.
You have a clean way to declare suitably aligned and sized Signal structs, which is the
parameterized type SignalT, but somebody didn't know enough to use it. The immediate
cause of the crash is the bogus declaration in dbtup/DbtupRoutines.cpp. I went looking
for other similar declarations, and found one in ndbfs/AsyncFile.cpp. There might be
others --- it would be prudent to do a more thorough code search.
BTW, that memcpy itself looks like it might be an old workaround for this same bug, in
view of the comment showing that someone didn't fully understand why he was putting it
in. However, the tools have now caught up to the point where the memcpy can be optimized
into a 8-byte load/store too, so you need to fix it for real.
How to repeat:
Build mysql with ndbcluster enabled on sparc64 using a reasonably recent gcc (4.4.1 or
up), and run the regression tests. It will crash instantly on the first ndb test.
Suggested fix:
diff -Naur mysql-5.1.39.orig/storage/ndb/src/kernel/blocks/dbtup/DbtupRoutines.cpp
mysql-5.1.39/storage/ndb/src/kernel/blocks/dbtup/DbtupRoutines.cpp
--- mysql-5.1.39.orig/storage/ndb/src/kernel/blocks/dbtup/DbtupRoutines.cpp 2009-09-04
12:21:18.000000000 -0400
+++ mysql-5.1.39/storage/ndb/src/kernel/blocks/dbtup/DbtupRoutines.cpp 2009-10-17
14:24:56.000000000 -0400
@@ -1144,8 +1144,9 @@
KeyReqStruct *req_struct,
Uint32* outBuffer)
{
- Uint32 tmp[sizeof(SignalHeader)+25];
- Signal * signal = (Signal*)&tmp;
+ SignalT<25> signalT;
+ Signal *signal= (Signal*)&signalT;
+
switch(attrId){
case AttributeHeader::FRAGMENT:
* outBuffer = fragptr.p->fragmentId;
diff -Naur mysql-5.1.39.orig/storage/ndb/src/kernel/blocks/ndbfs/AsyncFile.cpp
mysql-5.1.39/storage/ndb/src/kernel/blocks/ndbfs/AsyncFile.cpp
--- mysql-5.1.39.orig/storage/ndb/src/kernel/blocks/ndbfs/AsyncFile.cpp 2009-09-04
12:21:19.000000000 -0400
+++ mysql-5.1.39/storage/ndb/src/kernel/blocks/ndbfs/AsyncFile.cpp 2009-10-17
14:26:21.000000000 -0400
@@ -529,8 +529,8 @@
{
off_t off = 0;
const off_t sz = request->par.open.file_size;
- Uint32 tmp[sizeof(SignalHeader)+25];
- Signal * signal = (Signal*)(&tmp[0]);
+ SignalT<25> signalT;
+ Signal *signal= (Signal*)&signalT;
FsReadWriteReq* req = (FsReadWriteReq*)signal->getDataPtrSend();
Uint32 index = 0;
Description: ndbd reproducibly crashes at launch when built for sparc64 using current Fedora tools. The cause turns out to be a klugy declaration of a local Signal data structure which fails to meet the required 8-byte alignment of the struct type. In recent gcc versions, the memcpy at the end of Dbacc::execREAD_PSEUDO_REQ can be optimized into an aligned 8-byte load and store, and on an architecture that is actually picky about alignment, this leads to SIGBUS. Even on architectures that are more forgiving, the inadequate alignment of the struct is surely leading to wasted cycles. You have a clean way to declare suitably aligned and sized Signal structs, which is the parameterized type SignalT, but somebody didn't know enough to use it. The immediate cause of the crash is the bogus declaration in dbtup/DbtupRoutines.cpp. I went looking for other similar declarations, and found one in ndbfs/AsyncFile.cpp. There might be others --- it would be prudent to do a more thorough code search. BTW, that memcpy itself looks like it might be an old workaround for this same bug, in view of the comment showing that someone didn't fully understand why he was putting it in. However, the tools have now caught up to the point where the memcpy can be optimized into a 8-byte load/store too, so you need to fix it for real. How to repeat: Build mysql with ndbcluster enabled on sparc64 using a reasonably recent gcc (4.4.1 or up), and run the regression tests. It will crash instantly on the first ndb test. Suggested fix: diff -Naur mysql-5.1.39.orig/storage/ndb/src/kernel/blocks/dbtup/DbtupRoutines.cpp mysql-5.1.39/storage/ndb/src/kernel/blocks/dbtup/DbtupRoutines.cpp --- mysql-5.1.39.orig/storage/ndb/src/kernel/blocks/dbtup/DbtupRoutines.cpp 2009-09-04 12:21:18.000000000 -0400 +++ mysql-5.1.39/storage/ndb/src/kernel/blocks/dbtup/DbtupRoutines.cpp 2009-10-17 14:24:56.000000000 -0400 @@ -1144,8 +1144,9 @@ KeyReqStruct *req_struct, Uint32* outBuffer) { - Uint32 tmp[sizeof(SignalHeader)+25]; - Signal * signal = (Signal*)&tmp; + SignalT<25> signalT; + Signal *signal= (Signal*)&signalT; + switch(attrId){ case AttributeHeader::FRAGMENT: * outBuffer = fragptr.p->fragmentId; diff -Naur mysql-5.1.39.orig/storage/ndb/src/kernel/blocks/ndbfs/AsyncFile.cpp mysql-5.1.39/storage/ndb/src/kernel/blocks/ndbfs/AsyncFile.cpp --- mysql-5.1.39.orig/storage/ndb/src/kernel/blocks/ndbfs/AsyncFile.cpp 2009-09-04 12:21:19.000000000 -0400 +++ mysql-5.1.39/storage/ndb/src/kernel/blocks/ndbfs/AsyncFile.cpp 2009-10-17 14:26:21.000000000 -0400 @@ -529,8 +529,8 @@ { off_t off = 0; const off_t sz = request->par.open.file_size; - Uint32 tmp[sizeof(SignalHeader)+25]; - Signal * signal = (Signal*)(&tmp[0]); + SignalT<25> signalT; + Signal *signal= (Signal*)&signalT; FsReadWriteReq* req = (FsReadWriteReq*)signal->getDataPtrSend(); Uint32 index = 0;