Bug #48132 Misaligned Signal variable leads to ndbd crash on some architectures
Submitted: 17 Oct 23:05 Modified: 19 Oct 10:24
Reporter: [ name withheld ]
Status: Analyzing
Category:Server: Cluster Severity:S1 (Critical)
Version:mysql-5.1 OS:Any (recent gcc needed)
Assigned to: Gustaf Thorslund Target Version:
Tags: Contribution, 5.1.39
Triage: Triaged: D3 (Medium) / R6 (Needs Assessment) / E6 (Needs Assessment)

[17 Oct 23:05] [ name withheld ]
Description:
ndbd reproducibly crashes at launch when built for sparc64 using current Fedora tools. 
The cause turns out to be a klugy declaration of a local Signal data structure which
fails to meet the required 8-byte alignment of the struct type.  In recent gcc versions,
the memcpy at the end of Dbacc::execREAD_PSEUDO_REQ can be optimized into an aligned
8-byte load and store, and on an architecture that is actually picky about alignment,
this leads to SIGBUS.  Even on architectures that are more forgiving, the inadequate
alignment of the struct is surely leading to wasted cycles.

You have a clean way to declare suitably aligned and sized Signal structs, which is the
parameterized type SignalT, but somebody didn't know enough to use it.  The immediate
cause of the crash is the bogus declaration in dbtup/DbtupRoutines.cpp.  I went looking
for other similar declarations, and found one in ndbfs/AsyncFile.cpp.  There might be
others --- it would be prudent to do a more thorough code search.

BTW, that memcpy itself looks like it might be an old workaround for this same bug, in
view of the comment showing that someone didn't fully understand why he was putting it
in.  However, the tools have now caught up to the point where the memcpy can be optimized
into a 8-byte load/store too, so you need to fix it for real.

How to repeat:
Build mysql with ndbcluster enabled on sparc64 using a reasonably recent gcc (4.4.1 or
up), and run the regression tests.  It will crash instantly on the first ndb test.

Suggested fix:
diff -Naur mysql-5.1.39.orig/storage/ndb/src/kernel/blocks/dbtup/DbtupRoutines.cpp
mysql-5.1.39/storage/ndb/src/kernel/blocks/dbtup/DbtupRoutines.cpp
--- mysql-5.1.39.orig/storage/ndb/src/kernel/blocks/dbtup/DbtupRoutines.cpp	2009-09-04
12:21:18.000000000 -0400
+++ mysql-5.1.39/storage/ndb/src/kernel/blocks/dbtup/DbtupRoutines.cpp	2009-10-17
14:24:56.000000000 -0400
@@ -1144,8 +1144,9 @@
                    KeyReqStruct *req_struct,
                    Uint32* outBuffer)
 {
-  Uint32 tmp[sizeof(SignalHeader)+25];
-  Signal * signal = (Signal*)&tmp;
+  SignalT<25> signalT;
+  Signal *signal= (Signal*)&signalT;
+
   switch(attrId){
   case AttributeHeader::FRAGMENT:
     * outBuffer = fragptr.p->fragmentId;
diff -Naur mysql-5.1.39.orig/storage/ndb/src/kernel/blocks/ndbfs/AsyncFile.cpp
mysql-5.1.39/storage/ndb/src/kernel/blocks/ndbfs/AsyncFile.cpp
--- mysql-5.1.39.orig/storage/ndb/src/kernel/blocks/ndbfs/AsyncFile.cpp	2009-09-04
12:21:19.000000000 -0400
+++ mysql-5.1.39/storage/ndb/src/kernel/blocks/ndbfs/AsyncFile.cpp	2009-10-17
14:26:21.000000000 -0400
@@ -529,8 +529,8 @@
   {
     off_t off = 0;
     const off_t sz = request->par.open.file_size;
-    Uint32 tmp[sizeof(SignalHeader)+25];
-    Signal * signal = (Signal*)(&tmp[0]);
+    SignalT<25> signalT;
+    Signal *signal= (Signal*)&signalT;
     FsReadWriteReq* req = (FsReadWriteReq*)signal->getDataPtrSend();
 
     Uint32 index = 0;
[19 Oct 11:00] Gustaf Thorslund
Since awhile back now the cluster source lives in it's own tree and isn't merged into the
main server tree. Looking at the source in the MySQL Cluster 7.0 tree it's been changed
some since the version you're looking at and the patch you're suggesting wouldn't even
apply.