Bug #48132 Misaligned Signal variable leads to ndbd crash on some architectures
Submitted: 17 Oct 2009 21:05 Modified: 9 Mar 2016 16:16
Reporter: [ name withheld ] Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S1 (Critical)
Version:mysql-5.1 OS:Any (recent gcc needed)
Assigned to: CPU Architecture:Any
Tags: 5.1.39, Contribution

[17 Oct 2009 21:05] [ name withheld ]
Description:
ndbd reproducibly crashes at launch when built for sparc64 using current Fedora tools.  The cause turns out to be a klugy declaration of a local Signal data structure which fails to meet the required 8-byte alignment of the struct type.  In recent gcc versions, the memcpy at the end of Dbacc::execREAD_PSEUDO_REQ can be optimized into an aligned 8-byte load and store, and on an architecture that is actually picky about alignment, this leads to SIGBUS.  Even on architectures that are more forgiving, the inadequate alignment of the struct is surely leading to wasted cycles.

You have a clean way to declare suitably aligned and sized Signal structs, which is the parameterized type SignalT, but somebody didn't know enough to use it.  The immediate cause of the crash is the bogus declaration in dbtup/DbtupRoutines.cpp.  I went looking for other similar declarations, and found one in ndbfs/AsyncFile.cpp.  There might be others --- it would be prudent to do a more thorough code search.

BTW, that memcpy itself looks like it might be an old workaround for this same bug, in view of the comment showing that someone didn't fully understand why he was putting it in.  However, the tools have now caught up to the point where the memcpy can be optimized into a 8-byte load/store too, so you need to fix it for real.

How to repeat:
Build mysql with ndbcluster enabled on sparc64 using a reasonably recent gcc (4.4.1 or up), and run the regression tests.  It will crash instantly on the first ndb test.

Suggested fix:
diff -Naur mysql-5.1.39.orig/storage/ndb/src/kernel/blocks/dbtup/DbtupRoutines.cpp mysql-5.1.39/storage/ndb/src/kernel/blocks/dbtup/DbtupRoutines.cpp
--- mysql-5.1.39.orig/storage/ndb/src/kernel/blocks/dbtup/DbtupRoutines.cpp	2009-09-04 12:21:18.000000000 -0400
+++ mysql-5.1.39/storage/ndb/src/kernel/blocks/dbtup/DbtupRoutines.cpp	2009-10-17 14:24:56.000000000 -0400
@@ -1144,8 +1144,9 @@
                    KeyReqStruct *req_struct,
                    Uint32* outBuffer)
 {
-  Uint32 tmp[sizeof(SignalHeader)+25];
-  Signal * signal = (Signal*)&tmp;
+  SignalT<25> signalT;
+  Signal *signal= (Signal*)&signalT;
+
   switch(attrId){
   case AttributeHeader::FRAGMENT:
     * outBuffer = fragptr.p->fragmentId;
diff -Naur mysql-5.1.39.orig/storage/ndb/src/kernel/blocks/ndbfs/AsyncFile.cpp mysql-5.1.39/storage/ndb/src/kernel/blocks/ndbfs/AsyncFile.cpp
--- mysql-5.1.39.orig/storage/ndb/src/kernel/blocks/ndbfs/AsyncFile.cpp	2009-09-04 12:21:19.000000000 -0400
+++ mysql-5.1.39/storage/ndb/src/kernel/blocks/ndbfs/AsyncFile.cpp	2009-10-17 14:26:21.000000000 -0400
@@ -529,8 +529,8 @@
   {
     off_t off = 0;
     const off_t sz = request->par.open.file_size;
-    Uint32 tmp[sizeof(SignalHeader)+25];
-    Signal * signal = (Signal*)(&tmp[0]);
+    SignalT<25> signalT;
+    Signal *signal= (Signal*)&signalT;
     FsReadWriteReq* req = (FsReadWriteReq*)signal->getDataPtrSend();
 
     Uint32 index = 0;
[19 Oct 2009 9:00] Gustaf Thorslund
Since awhile back now the cluster source lives in it's own tree and isn't merged into the main server tree. Looking at the source in the MySQL Cluster 7.0 tree it's been changed some since the version you're looking at and the patch you're suggesting wouldn't even apply.
[9 Mar 2016 15:41] Gustaf Thorslund
Posted by developer:
 
If looking in right tree, this was fixed awhile ago.

-->
commit 9b995aa2699f8b14e483f960a694269b1c89edab
Author: jonas@perch.ndb.mysql.com <>
Date:   Sun Dec 23 13:52:25 2007 +0100

    ndb - change long signal interface (in blocks) so that
    1) sections are never forwarded, but need to be explicitly handled
    2) unhandled section(s) will give ndbassert-ion failure (in post execXXX-handling)

diff --git a/storage/ndb/src/kernel/blocks/dbtup/DbtupRoutines.cpp b/storage/ndb/src/kernel/blocks/dbtup/DbtupRoutines.cpp
index 54d3a97..47431bd 100644
--- a/storage/ndb/src/kernel/blocks/dbtup/DbtupRoutines.cpp
+++ b/storage/ndb/src/kernel/blocks/dbtup/DbtupRoutines.cpp
@@ -2185,8 +2185,9 @@ Dbtup::read_pseudo(const Uint32 * inBuffer, Uint32 inPos,
   Uint32* outBuffer = outBuf + ((outPos - 1) >> 2);
   
   Uint32 sz;
-  Uint32 tmp[sizeof(SignalHeader)+25];
-  Signal * signal = (Signal*)&tmp;
+  SignalT<4> signalT;
+  Signal * signal = (Signal*)&signalT;
+  bzero(signal, sizeof(signalT));
   switch(attrId){
   case AttributeHeader::READ_PACKED:
   case AttributeHeader::READ_ALL:
-->

And some reorganisation

-->
commit 3b94736a6c6acb11d205e97fd48ac0af801f80ad
Author: stewart@flamingspork.com[stewart] <>
Date:   Thu Nov 15 11:30:00 2007 +1100

    [PATCH] Cleanup AsyncFile, make modular and nice to read
    
    Move all platform specific AsyncFile functionality out into sep classes.
    
    Generic functionality in AsyncFile, POSIX specific in PosixAsyncFile.
    
    In future, will have azioAsyncFile (maybe a mysysAsyncFile) and can have
    specific numbers of each instantiated in kernel. Then, NDBFS can decide
    which AsyncFile should be used for that file - e.g. we can keep the number
    of azioAsyncFiles to a minimum.
-->

Then fixed in PosixAsyncFile.

-->
commit 20a2d59eb480d83ebeb6ada58fe1149574d2c92d
Author: Jonas Oreland <jonas@mysql.com>
Date:   Tue Aug 17 11:43:19 2010 +0200

    ndb - fix strict aliasing problems

diff --git a/storage/ndb/src/kernel/blocks/ndbfs/PosixAsyncFile.cpp b/storage/ndb/src/kernel/blocks/ndbfs/PosixAsyncFile.cpp
index 3f888ed..74a97ce 100644
--- a/storage/ndb/src/kernel/blocks/ndbfs/PosixAsyncFile.cpp
+++ b/storage/ndb/src/kernel/blocks/ndbfs/PosixAsyncFile.cpp
@@ -312,7 +312,7 @@ no_odirect:
     off_t off = 0;
     const off_t sz = request->par.open.file_size;
     SignalT<25> tmp;
-    Signal * signal = (Signal*)(&tmp);
+    Signal * signal = new (&tmp) Signal(0);
     bzero(signal, sizeof(tmp));
     FsReadWriteReq* req = (FsReadWriteReq*)signal->getDataPtrSend();
-->

/Gustaf