Bug #30366 NDB fails to start on OS X, 64 bit
Submitted: 10 Aug 2007 19:33
Modified: 17 Jan 2008 22:34
Reporter: Joerg Bruehe
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S2 (Serious)
Version: 5.1
OS: MacOS (64bit)
Assigned to: Magnus Blåudd
CPU Architecture: Any
Tags: sr5_1

[10 Aug 2007 19:33] Joerg Bruehe
I do not know how long-standing this issue is,
definitely *not* new in 5.1.21 -
but sadly, I could not find a bug report for it.
(I have not searched for a working version in history,
AFAIR 5.1.19 was failing, 5.1.20 definitely was.)

The problem is specific to OS X ppc-64bit and has not been observed on any other platform.

The symptom is a bit vague:
When starting the test suite ("mysql-test-run.pl") *without* "--skip-ndbcluster", it hangs.
The hang appears to be indefinite: there is no progress until someone notices it.
The existing processes are "ndb_waiter" and "ndb_mgmd",
killing both of them lets the test run finish (no tests attempted),
and "make test-bt" (what we use to run the tests) proceeds to the next run.
If that again involves NDB, the problem occurs again.

The following is a log extract:

Logging: ./mysql-test-run.pl --comment=ps+rowrepl+NDB --force --timer --ps-protocol --mysqld=--binlog-format=row
070809 23:50:47 [Warning] Setting lower_case_table_names=2 because file system for /Users/mysqldev/tmp-200708081852-5.1.21-beta-26112/osx-tiger-ppc-64bit/test/mysql-5.1.21-beta-osx10.4-powerpc-64bit/share/mysql/english/ is case insensitive
MySQL Version 5.1.21

# ps+rowrepl+NDB

Using binlog format 'row'
Using ndbcluster when necessary, mysqld supports it
Setting mysqld to support SSL connections
Using MTR_BUILD_THREAD      = 201
Using MASTER_MYPORT         = 12010
Using MASTER_MYPORT1        = 12011
Using SLAVE_MYPORT          = 12012
Using SLAVE_MYPORT1         = 12013
Using SLAVE_MYPORT2         = 12014
Using NDBCLUSTER_PORT       = 12015
Using IM_PORT               = 12017
Using IM_MYSQLD1_PORT       = 12018
Using IM_MYSQLD2_PORT       = 12019
Killing Possible Leftover Processes
mysql-test-run: WARNING: Found non pid file master-slow.log in /Users/mysqldev/tmp-200708081852-5.1.21-beta-26112/osx-tiger-ppc-64bit/test/mysql-5.1.21-beta-osx10.4-powerpc-64bit/mysql-test/var/run
Removing Stale Files
Creating Directories
Installing Master Database
Installing Master Database
Installing Slave1 Database
Installing Master Cluster
mysql-test-run: *** ERROR: Failed to wait for start of ndb_mgmd
Autoreleasing /tmp/mysql-test-ports:201
make: [test-bt] Error 1 (ignored)

From current "make test-bt", these runs are affected:
./mysql-test-run.pl --comment=ps+rowrepl+NDB --force --timer --ps-protocol --mysqld=--binlog-format=row
./mysql-test-run.pl --comment=NDB --force --timer --with-ndbcluster-only
./mysql-test-run.pl --force --comment=funcs1_ps --ps-protocol --suite=funcs_1
./mysql-test-run.pl --force --comment=funcs2 --suite=funcs_2
./mysql-test-run.pl --force --comment=partitions --suite=parts

These runs are *not*:
./mysql-test-run.pl --comment=debug --force --timer --skip-ndbcluster --skip-rpl --report-features
   (that was a debug build)
./mysql-test-run.pl --comment=normal --force --timer --skip-ndbcluster --report-features
./mysql-test-run.pl --comment=ps --force --timer --skip-ndbcluster --ps-protocol
./mysql-test-run.pl --comment=normal+rowrepl --force --timer --skip-ndbcluster --mysqld=--binlog-format=row
./mysql-test-run.pl --comment=embedded --force --timer --embedded-server --skip-rpl --skip-ndbcluster
./mysql-test-run.pl --force --comment=rpl --suite=rpl
./mysql-test-run.pl --comment=NIST+normal --force --suite=nist
./mysql-test-run.pl --comment=NIST+ps --force --suite=nist --ps-protocol

(I have not checked why "suite=rpl" and "suite=nist" worked,
even without "--skip-ndbcluster".)

I do *not* think it is a load problem from the parallel 32 + 64 bit build+test runs,
because at least the second and following hangs happened when the 32 bit run had already finished.

How to repeat:
Run a build (including NDB) and test on that platform.

I will save the current build tree here:


(using the bug# when I have it).
[13 Aug 2007 7:36] Stewart Smith
Hi Joerg!

Could you please:
- check output of 'ndb_mgm -e "show"' when it "hangs" (pass -c for connectstring for test or set NDB_CONNECTSTRING env variable)
- check (and attach) the cluster log as well as logs for mgm server and data nodes
  (basically *.log in the ndbcluster directory)

This should help in tracking it down.

I gather we don't have a host like this in pb running this sort of build regularly.... :(
[13 Aug 2007 8:01] Joerg Bruehe
I will try to do as requested, but I have to repeat:

This happens while automated builds and tests are running,
so in general we have little chance for manual intervention and analysis.

Currently, the "classic" build is nearly done -
if "advanced" gets into this hang, I can try as requested;
if not, the saved tree must be used to reproduce the bug.
[13 Sep 2007 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[22 Sep 2007 14:13] Joerg Bruehe
Bug was reproduced in a 5.1.22-rc build,
and shown to Cluster support (Stewart).
[29 Oct 2007 16:41] Magnus Blåudd
The mgm client can't connect properly:

osx-tiger-ppc:~/magnus/mysql-5.1.23-beta-pb1577/mysql-test mysqldev$ ../storage/ndb/src/mgmclient/ndb_mgm --ndb-connectstring=host=localhost:10175 -e "show"
Connected to Management Server at: localhost:10175
[29 Oct 2007 16:54] Magnus Blåudd
Repeatable with 64-bit debug compile on osx-tiger-ppc

The ndb_mgmd starts and sets up the listening socket. It does not seem to respond when you connect to it with ndb_mgm, but telnet works. See below.

osx-tiger-ppc:~/magnus/mysql-5.1.23-beta-pb1577/mysql-test mysqldev$ telnet localhost 10175
Connected to localhost.
Escape character is '^]'.
get version

id: 327959
major: 5
minor: 1
string: Version 5.1.23 (beta)
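
The `id` in the `get version` reply above is consistent with the version being packed as `(major << 16) | (minor << 8) | build`: 5 * 65536 + 1 * 256 + 23 = 327959. The following is a minimal sketch of that decoding, assuming this packing; the names `Version` and `decode_version` are hypothetical and not taken from the NDB sources.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical holder for the three version components.
struct Version {
  unsigned major_no;
  unsigned minor_no;
  unsigned build_no;
};

// Assumption: the id is packed as (major << 16) | (minor << 8) | build.
Version decode_version(std::uint32_t id)
{
  return { (id >> 16) & 0xFF, (id >> 8) & 0xFF, id & 0xFF };
}
```

Under that assumption, `decode_version(327959)` yields 5, 1 and 23, which matches the "Version 5.1.23 (beta)" string in the same reply.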
[29 Oct 2007 16:55] Magnus Blåudd
But telnet + "get status" hangs half way through.

get status

node status
nodes: 11
node.1.type: NDB
<< hangs here
[29 Oct 2007 20:34] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:


ChangeSet@1.2569, 2007-10-29 21:33:30+01:00, msvensson@pilot.mysql.com +1 -0
  Bug#30366 NDB fails to start on OS X, PPC, 64 bit
   - The errno variable should only be used when the previous socket
     write failed, it should be regarded as undefined at other times
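
The rule stated in the changeset comment can be illustrated with a small sketch. This is not the NDB code; `write_all` is a hypothetical helper that shows the general POSIX pattern of reading `errno` only immediately after `write()` has reported failure, and never after a successful call, when it may still hold a stale value from some earlier operation.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Hypothetical helper: writes the whole buffer to fd.
// Returns 0 on success, or the errno value of the failed write().
int write_all(int fd, const char* buf, std::size_t len)
{
  std::size_t done = 0;
  while (done < len) {
    ssize_t n = write(fd, buf + done, len - done);
    if (n < 0) {
      if (errno == EINTR)
        continue;       // interrupted by a signal: retry the write
      return errno;     // errno is meaningful only here, right after a failure
    }
    done += static_cast<std::size_t>(n);
  }
  return 0;             // success: errno is deliberately not inspected
}
```

With this structure, a leftover `errno` value from an unrelated earlier call cannot be mistaken for a write error or a timeout.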
[29 Oct 2007 20:36] Magnus Blåudd
The client was hanging halfway through the response. It would probably be better if the server closed the connection when a timeout has occurred.
[29 Oct 2007 20:41] Magnus Blåudd
Something like this, but preferably for all our users of SocketServer.

msvensson@pilot:~/mysql/my51-ndb-bug30366/storage/ndb/src/common$ bk -r diffs -u
===== storage/ndb/src/mgmsrv/Services.cpp 1.95 vs edited =====
--- 1.95/storage/ndb/src/mgmsrv/Services.cpp    2007-07-11 14:36:40 +02:00
+++ edited/storage/ndb/src/mgmsrv/Services.cpp  2007-10-29 21:40:11 +01:00
@@ -349,6 +349,10 @@ MgmApiSession::runSession()
     m_parser->run(ctx, *this);
+    if (m_output->timedout() ||
+        m_input->timedout())
+      m_stop= true;
     if(ctx.m_currentToken == 0)
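
The idea in the diff above, ending the session once either stream reports a timeout, can be sketched in isolation as follows. All types here (`Stream`, `Session`) are hypothetical stand-ins for illustration and are not the real `MgmApiSession` or SocketServer classes; the simulated timeout on the third reply is likewise an assumption made only so the loop has something to react to.

```cpp
#include <cassert>

// Hypothetical stand-in for the input/output stream objects.
struct Stream {
  bool timed_out = false;
  bool timedout() const { return timed_out; }
};

// Hypothetical stand-in for a management session.
struct Session {
  Stream input;
  Stream output;
  bool stop = false;
  int commands_run = 0;

  // Stand-in for the parser handling one command. For illustration,
  // sending the reply to the third command times out.
  void run_one_command()
  {
    ++commands_run;
    if (commands_run == 3)
      output.timed_out = true;
  }

  // Session loop: stop as soon as either stream has timed out,
  // so the connection is closed instead of left half-answered.
  void run_session()
  {
    while (!stop) {
      run_one_command();
      if (output.timedout() || input.timedout())
        stop = true;
    }
  }
};
```

In this sketch the loop exits after the third command instead of continuing to serve a peer that received only part of a reply.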
[26 Nov 2007 17:54] Magnus Blåudd
Pushed to mysql-5.1-ndb
[4 Dec 2007 8:08] Mattias Jonsson
I can verify this on an intel macbook with Mac OS X 10.5.1 (uname -a: Darwin witty 9.1.0 Darwin Kernel Version 9.1.0: Wed Oct 31 17:46:22 PDT 2007; root:xnu-1228.0.2~1/RELEASE_I386 i386).

The patch works; now I can finally run the full test suite on my new MacBook!
[10 Dec 2007 23:24] Omer Barnir
Root Cause Analysis
The problem resulted from a change made on March 22, 2007.
The resulting behavior differs between platforms, so the problem was observed only on OS X.
From a testing point of view, once packaged verification is in place, similar problems will be caught.
[15 Jan 2008 14:00] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:


ChangeSet@1.2652, 2008-01-15 15:01:21+01:00, msvensson@pilot.mysql.com +1 -0
  Bug#30366 NDB fails to start on OS X, PPC, 64 bit
     - The errno variable should only be used when the previous socket
       write failed, it should be regarded as undefined at other times
    Only use "errno" after the attempt to write to the socket has failed
[16 Jan 2008 16:03] Magnus Blåudd
Pushed to mysql-5.1-release
[17 Jan 2008 22:34] Jon Stephens
Documented bugfix in 5.1.23 changelog.
[24 Jan 2008 11:02] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:


ChangeSet@1.2657, 2008-01-24 12:06:40+01:00, tomas@whalegate.ndb.mysql.com +1 -0
  Bug#30366 (recommit) NDB fails to start on OS X, PPC, 64 bit
  - The errno variable should only be used when the previous socket
    write failed, it should be regarded as undefined at other times
[7 Feb 2008 9:51] Magnus Blåudd
Pushed also to mysql-5.1-ndb, mysql-5.1-telco-6.2, mysql-5.1-telco-6.3 and mysql-5.1-telco-6.4
[20 Feb 2008 16:02] Bugs System
Pushed into 5.1.24-rc
[20 Feb 2008 16:02] Bugs System
Pushed into 6.0.5-alpha
[25 Feb 2008 15:58] Bugs System
Pushed into 5.1.24-rc
[25 Feb 2008 16:04] Bugs System
Pushed into 6.0.5-alpha
[30 Mar 2008 18:57] Jon Stephens
Fix also documented for 6.0.5.