Bug #32708 segfault in TransporterFacade::remove_from_cond_wait_queue
Submitted: 26 Nov 2007 7:58
Modified: 18 Jan 2009 9:51
Reporter: Monty Taylor
Status: No Feedback
Category: MySQL Cluster: NDB API
Severity: S1 (Critical)
Version: 5.1.22-telco-6.3.5
OS: Linux
Assigned to: Assigned Account
CPU Architecture: Any

[26 Nov 2007 7:58] Monty Taylor
Description:
While running NDB/J test code for a client, we ran into a segfault in TransporterFacade::remove_from_cond_wait_queue(). JNI does not call this method directly, and the method does not operate on any objects that JNI modifies, so the JNI layer does not look like the direct cause, although it is certainly possible that something in the JNI layer is exacerbating the problem.

I've tracked the problem down to the calling method, TransporterFacade::rem_last_from_cond_wait_queue(). In some instances (I don't know why), tWaiter = cond_wait_array[index].cond_wait_object is NULL. It is then passed to remove_from_cond_wait_queue(), which assumes a valid object, and that causes the segfault.

How to repeat:
Download and build the latest version of ndb-connectors: 

https://launchpad.net/ndb-connectors/telco-6.3/0.5.6.3.5.5/+download/ndb-connectors-0.5.6....

In the java dir, run: 

java -Djava.library.path=.libs  -Dcom.mysql.jdbc.testsuite.url=jdbc:mysql://localhost/test -classpath .:./lib/mysql-connector-java-5.0.4-bin.jar:./lib/junit.jar  junit.textui.TestRunner testsuite.ndbj.OutOfConnectionProblemTest

Suggested fix:
diff -urNad telco-5.1-ndb-6.3.6~/storage/ndb/src/ndbapi/TransporterFacade.cpp telco-5.1-ndb-6.3.6/storage/ndb/src/ndbapi/TransporterFacade.cpp
--- telco-5.1-ndb-6.3.6~/storage/ndb/src/ndbapi/TransporterFacade.cpp   2007-11-07 14:23:42.000000000 -0800
+++ telco-5.1-ndb-6.3.6/storage/ndb/src/ndbapi/TransporterFacade.cpp    2007-11-24 20:50:09.239153362 -0800
@@ -649,7 +649,8 @@
   if (last_in_cond_wait == MAX_NO_THREADS)
     return NULL;
   tWaiter = cond_wait_array[index].cond_wait_object;
-  remove_from_cond_wait_queue(tWaiter);
+  if (tWaiter != NULL)
+    remove_from_cond_wait_queue(tWaiter);
   return tWaiter;
 }
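
The failure mode and the guard in the patch can be modeled with a minimal, self-contained C++ sketch. This is not the NDB source: the Waiter, Slot and WaitQueue types, the kMaxThreads constant (standing in for MAX_NO_THREADS), and the simplified removal logic are all hypothetical, and the sketch assumes that index in the patched function refers to last_in_cond_wait. It only shows that a callee which dereferences its argument unconditionally is safe once the caller checks for NULL first.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for illustration; not the real NDB API types.
struct Waiter {
    int cond_wait_index;  // slot this waiter occupies, or -1 when not queued
};

struct Slot {
    Waiter* cond_wait_object = nullptr;  // empty slots hold NULL
};

constexpr int kMaxThreads = 4;  // assumed stand-in for MAX_NO_THREADS

struct WaitQueue {
    Slot cond_wait_array[kMaxThreads];
    int last_in_cond_wait = kMaxThreads;  // kMaxThreads means "queue empty"

    // Dereferences w unconditionally, like the callee described in the report.
    // Simplified: clears the slot and marks the queue as empty.
    void remove_from_cond_wait_queue(Waiter* w) {
        cond_wait_array[w->cond_wait_index].cond_wait_object = nullptr;
        w->cond_wait_index = -1;
        last_in_cond_wait = kMaxThreads;
    }

    // Guarded caller, mirroring the shape of the suggested fix.
    Waiter* rem_last_from_cond_wait_queue() {
        if (last_in_cond_wait == kMaxThreads)
            return nullptr;
        Waiter* tWaiter = cond_wait_array[last_in_cond_wait].cond_wait_object;
        if (tWaiter != nullptr)  // without this check, a NULL slot would crash
            remove_from_cond_wait_queue(tWaiter);
        return tWaiter;
    }
};
```

With the guard in place, a stale last_in_cond_wait that points at an empty slot makes the caller return NULL instead of crashing. The guard does not explain how the slot came to be NULL in the first place, so it treats the symptom rather than the underlying cause.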
[8 Jul 2008 0:40] Brian Morin
I ran into the same problem with a high-performance C++ NDB API app using CGE 6.3.15 GPL. Applying the suggested fix solved the problem.
[13 Jul 2008 19:02] Jonas Oreland
Hi Brian,

Did you manage to make this reproducible?
We never did. If so, can you upload a program that repeats the problem,
together with a description of your environment?

/Jonas
[19 Jan 2009 0:02] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[6 Oct 2010 12:41] Jon Stephens
See also BUG#51775.