| Bug #30366 | NDB fails to start on OS X, 64 bit | ||
|---|---|---|---|
| Submitted: | 10 Aug 2007 19:33 | Modified: | 17 Jan 2008 22:34 |
| Reporter: | Joerg Bruehe | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S2 (Serious) |
| Version: | 5.1 | OS: | MacOS (64bit) |
| Assigned to: | Magnus Blåudd | CPU Architecture: | Any |
| Tags: | sr5_1 | ||
[13 Aug 2007 7:36]
Stewart Smith
Hi Joerg! Could you please:
- check output of 'ndb_mgm -e "show"' when it "hangs" (pass -c for connectstring for test or set NDB_CONNECTSTRING env variable)
- check (and attach) the cluster log as well as logs for mgm server and data nodes (basically *.log in the ndbcluster directory)

This should help in tracking it down. I gather we don't have a host like this in pb running this sort of build regularly.... :(
[13 Aug 2007 8:01]
Joerg Bruehe
I will try to do as requested, but I have to repeat: This happens while automated builds and tests are running, so in general we have little chance for manual intervention and analysis. Currently, the "classic" build is nearly done - if "advanced" gets into this hang, I can try as requested; if not, the saved tree must be used to reproduce the bug.
[13 Sep 2007 23:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".
[22 Sep 2007 14:13]
Joerg Bruehe
Bug was reproduced in a 5.1.22-rc build, and shown to Cluster support (Stewart).
[29 Oct 2007 16:41]
Magnus Blåudd
The mgm client can't connect properly:

osx-tiger-ppc:~/magnus/mysql-5.1.23-beta-pb1577/mysql-test mysqldev$ ../storage/ndb/src/mgmclient/ndb_mgm --ndb-connectstring=host=localhost:10175 -e "show"
Connected to Management Server at: localhost:10175
[29 Oct 2007 16:54]
Magnus Blåudd
Repeatable with a 64-bit debug compile on osx-tiger-ppc.

The ndb_mgmd starts and sets up the listening socket. It does not seem to respond when you connect to it with ndb_mgm, but telnet works. See below.

osx-tiger-ppc:~/magnus/mysql-5.1.23-beta-pb1577/mysql-test mysqldev$ telnet localhost 10175
Connected to localhost.
Escape character is '^]'.
get version
version
id: 327959
major: 5
minor: 1
string: Version 5.1.23 (beta)
[29 Oct 2007 16:55]
Magnus Blåudd
But telnet + "get status" hangs halfway through.

get status
node status
nodes: 11
node.1.type: NDB << hangs here
[29 Oct 2007 20:34]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:

http://lists.mysql.com/commits/36601

ChangeSet@1.2569, 2007-10-29 21:33:30+01:00, msvensson@pilot.mysql.com +1 -0
Bug#30366 NDB fails to start on OS X, PPC, 64 bit
- The errno variable should only be used when the previous socket write failed, it should be regarded as undefined at other times
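The rule stated in the commit message can be illustrated with a short sketch. The actual change to OutputStream.cpp is not shown in this report, so the `write_all` function and `WriteResult` type below are hypothetical names chosen for illustration; only the POSIX `write` call is real. The point is that `errno` is read only immediately after `write` has returned -1, and never after a successful call, where it may still hold a stale value from some earlier operation.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Hypothetical result type for this sketch; not part of the NDB sources.
enum class WriteResult { Ok, TimedOut, Failed };

// Write the whole buffer to fd. errno is consulted only when write() has
// just returned -1; a stale errno left by an earlier call is ignored.
WriteResult write_all(int fd, const char* buf, std::size_t len)
{
  assert(buf != nullptr);
  std::size_t done = 0;
  while (done < len) {
    const ssize_t n = ::write(fd, buf + done, len - done);
    if (n > 0) {            // success: errno is undefined here, do not read it
      done += static_cast<std::size_t>(n);
      continue;
    }
    if (n == 0)             // no progress and no error reported: give up
      return WriteResult::Failed;
    if (errno == EINTR)     // interrupted before anything was written: retry
      continue;
    if (errno == EAGAIN || errno == EWOULDBLOCK)
      return WriteResult::TimedOut;
    return WriteResult::Failed;
  }
  return WriteResult::Ok;
}
```

With this shape, a caller that happens to enter `write_all` while `errno` still holds `EAGAIN` from an unrelated call gets `WriteResult::Ok` as long as the write itself succeeds. Code that inspects `errno` unconditionally could instead misreport such a write as timed out.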
[29 Oct 2007 20:36]
Magnus Blåudd
The client was now hanging halfway through the response. It would probably be better if the server closed the connection when a timeout has occurred.
[29 Oct 2007 20:41]
Magnus Blåudd
Something like this, but preferably for all our users of SocketServer.
msvensson@pilot:~/mysql/my51-ndb-bug30366/storage/ndb/src/common$ bk -r diffs -u
===== storage/ndb/src/mgmsrv/Services.cpp 1.95 vs edited =====
--- 1.95/storage/ndb/src/mgmsrv/Services.cpp 2007-07-11 14:36:40 +02:00
+++ edited/storage/ndb/src/mgmsrv/Services.cpp 2007-10-29 21:40:11 +01:00
@@ -349,6 +349,10 @@ MgmApiSession::runSession()
     m_parser->run(ctx, *this);
+    if (m_output->timedout() ||
+        m_input->timedout())
+      m_stop= true;
+
     if(ctx.m_currentToken == 0)
     {
       NdbMutex_Unlock(m_mutex);
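The idea behind the diff above can be sketched in isolation: after each parser pass, the session checks whether either stream timed out and, if so, stops, so the connection is closed rather than left hanging. The `Stream` and `Session` types below are hypothetical stand-ins, not the real `MgmApiSession` or SocketServer classes, and `run_one` stands in for `m_parser->run(ctx, *this)`.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for the session's input or output stream.
struct Stream {
  bool timed_out = false;
  bool timedout() const { return timed_out; }
};

// Hypothetical stand-in for a management API session.
struct Session {
  Stream input;
  Stream output;
  bool stop = false;

  // Runs run_one repeatedly until the session is stopped and returns the
  // number of passes executed. A timeout on either stream stops the loop.
  int run(const std::function<void(Session&)>& run_one)
  {
    assert(static_cast<bool>(run_one));
    int passes = 0;
    while (!stop) {
      run_one(*this);
      ++passes;
      if (output.timedout() || input.timedout())
        stop = true;   // mirrors "m_stop= true" in the diff
    }
    return passes;
  }
};
```

Checking both streams after every pass keeps the decision in one place, which is roughly what the comment asks for when it suggests doing this for all users of SocketServer.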
[26 Nov 2007 17:54]
Magnus Blåudd
Pushed to mysql-5.1-ndb
[4 Dec 2007 8:08]
Mattias Jonsson
I can verify this on an Intel MacBook with Mac OS X 10.5.1 (uname -a: Darwin witty 9.1.0 Darwin Kernel Version 9.1.0: Wed Oct 31 17:46:22 PDT 2007; root:xnu-1228.0.2~1/RELEASE_I386 i386). The patch works; now I can finally run the full test suite on my new MacBook!
[10 Dec 2007 23:24]
Omer Barnir
Root Cause Analysis
-------------------
The problem was the result of a change made on March 22, 2007. The resulting behavior differs between platforms, so the problem was observed only on OS X.

From a testing point of view, once packaged verification is in place, similar problems will be caught.
[15 Jan 2008 14:00]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:

http://lists.mysql.com/commits/41012

ChangeSet@1.2652, 2008-01-15 15:01:21+01:00, msvensson@pilot.mysql.com +1 -0
Bug#30366 NDB fails to start on OS X, PPC, 64 bit
- The errno variable should only be used when the previous socket write failed, it should be regarded as undefined at other times

OutputStream.cpp:
Only use "errno" after the attempt to write to the socket has failed
[16 Jan 2008 16:03]
Magnus Blåudd
Pushed to mysql-5.1-release
[17 Jan 2008 22:34]
Jon Stephens
Documented bugfix in 5.1.23 changelog.
[24 Jan 2008 11:02]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:

http://lists.mysql.com/commits/41198

ChangeSet@1.2657, 2008-01-24 12:06:40+01:00, tomas@whalegate.ndb.mysql.com +1 -0
Bug#30366 (recommit) NDB fails to start on OS X, PPC, 64 bit
- The errno variable should only be used when the previous socket write failed, it should be regarded as undefined at other times
[7 Feb 2008 9:51]
Magnus Blåudd
Pushed also to mysql-5.1-ndb, mysql-5.1-telco-6.2, mysql-5.1-telco-6.3 and mysql-5.1-telco-6.4
[20 Feb 2008 16:02]
Bugs System
Pushed into 5.1.24-rc
[20 Feb 2008 16:02]
Bugs System
Pushed into 6.0.5-alpha
[25 Feb 2008 15:58]
Bugs System
Pushed into 5.1.24-rc
[25 Feb 2008 16:04]
Bugs System
Pushed into 6.0.5-alpha
[30 Mar 2008 18:57]
Jon Stephens
Fix also documented for 6.0.5.

Description:
I do not know how long-standing this issue is, definitely *not* new in 5.1.21 - but sadly, I could not find a bug report for it. (I have not searched for a working version in history, AFAIR 5.1.19 was failing, 5.1.20 definitely was.)

The problem is specific to OS X ppc-64bit, has not been observed on any other platform.

The symptom is a bit vague: When starting the test suite ("mysql-test-run.pl") *without* "--skip-ndbcluster", it hangs. This hang seems infinite, there is no progress until we notice it. The existing processes are "ndb_waiter" and "ndb_mgmd", killing both of them lets the test run finish (no tests attempted), and "make test-bt" (what we use to run the tests) proceeds to the next run. If that again involves NDB, the problem occurs again.

The following is a log extract:

Logging: ./mysql-test-run.pl --comment=ps+rowrepl+NDB --force --timer --ps-protocol --mysqld=--binlog-format=row
070809 23:50:47 [Warning] Setting lower_case_table_names=2 because file system for /Users/mysqldev/tmp-200708081852-5.1.21-beta-26112/osx-tiger-ppc-64bit/test/mysql-5.1.21-beta-osx10.4-powerpc-64bit/share/mysql/english/ is case insensitive
MySQL Version 5.1.21
##############################################################################
# ps+rowrepl+NDB
##############################################################################
Using binlog format 'row'
Using ndbcluster when necessary, mysqld supports it
Setting mysqld to support SSL connections
Using MTR_BUILD_THREAD = 201
Using MASTER_MYPORT = 12010
Using MASTER_MYPORT1 = 12011
Using SLAVE_MYPORT = 12012
Using SLAVE_MYPORT1 = 12013
Using SLAVE_MYPORT2 = 12014
Using NDBCLUSTER_PORT = 12015
Using NDBCLUSTER_PORT_SLAVE = 12016
Using IM_PORT = 12017
Using IM_MYSQLD1_PORT = 12018
Using IM_MYSQLD2_PORT = 12019
Killing Possible Leftover Processes
mysql-test-run: WARNING: Found non pid file master-slow.log in /Users/mysqldev/tmp-200708081852-5.1.21-beta-26112/osx-tiger-ppc-64bit/test/mysql-5.1.21-beta-osx10.4-powerpc-64bit/mysql-test/var/run
Removing Stale Files
Creating Directories
Installing Master Database
Installing Master Database
Installing Slave1 Database
Installing Master Cluster
mysql-test-run: *** ERROR: Failed to wait for start of ndb_mgmd
Autoreleasing /tmp/mysql-test-ports:201
make: [test-bt] Error 1 (ignored)

From current "make test-bt", these runs are affected:
./mysql-test-run.pl --comment=ps+rowrepl+NDB --force --timer --ps-protocol --mysqld=--binlog-format=row
./mysql-test-run.pl --comment=NDB --force --timer --with-ndbcluster-only
./mysql-test-run.pl --force --comment=funcs1_ps --ps-protocol --suite=funcs_1
./mysql-test-run.pl --force --comment=funcs2 --suite=funcs_2
./mysql-test-run.pl --force --comment=partitions --suite=parts

These runs are *not*:
./mysql-test-run.pl --comment=debug --force --timer --skip-ndbcluster --skip-rpl --report-features (that was a debug build)
./mysql-test-run.pl --comment=normal --force --timer --skip-ndbcluster --report-features
./mysql-test-run.pl --comment=ps --force --timer --skip-ndbcluster --ps-protocol
./mysql-test-run.pl --comment=normal+rowrepl --force --timer --skip-ndbcluster --mysqld=--binlog-format=row
./mysql-test-run.pl --comment=embedded --force --timer --embedded-server --skip-rpl --skip-ndbcluster
./mysql-test-run.pl --force --comment=rpl --suite=rpl
./mysql-test-run.pl --comment=NIST+normal --force --suite=nist
./mysql-test-run.pl --comment=NIST+ps --force --suite=nist --ps-protocol

(I have not checked why "suite=rpl" and "suite=nist" worked, even without "--skip-ndbcluster".)

I do *not* think it is a load problem from the parallel 32 + 64 bit build+test runs, because at least the second and following hangs happened when the 32 bit run had already finished.

How to repeat:
Run a build (including NDB) and test on that platform.
I will save the current build tree here: mysqldev@osx-tiger-ppc:tmp-bug#####-5.1.21-beta-build (using the bug# when I have it).