Bug #44806: 2009-05-11 17:27:19 [ndbd] WARNING -- Ndb kernel thread 0 is stuck in: Job Hand

| Submitted: | 11 May 2009 22:48 | Modified: | 25 May 2009 13:33 |
| --- | --- | --- | --- |
| Reporter: | James Eastman | Email Updates: | |
| Status: | Won't fix | Impact on me: | |
| Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S1 (Critical) |
| Version: | mysql-5.1-telco-7.0 | OS: | Linux (Linux ast4.nts.BLAH.com 2.6.9-55.ELsmp #1 SMP Fri Apr 20 17:03:35 EDT 2007 i686 i686 i386 GNU/Linux) |
| Assigned to: | | CPU Architecture: | Any |
| Tags: | 7.0.5-0.rhel4.i386, MySQL 7.0 Cluster storage node, ndb, NDB startup | | |
[11 May 2009 22:48]
James Eastman
[12 May 2009 5:13]
James Eastman
I attempted a startup of another ndb storage node and now I have some interesting files to attach, so I'm attaching those files. Also, the ndb_mgm node says:

    ndb_mgm> show
    Cluster Configuration
    ---------------------
    [ndbd(NDB)]     3 node(s)
    id=2 (not connected, accepting connect from 146.18.237.115)
    id=3 (not connected, accepting connect from 146.18.237.49)
    id=4    @146.18.237.104  (mysql-5.1.32 ndb-7.0.5, starting, Nodegroup: 0)

    [ndb_mgmd(MGM)] 1 node(s)
    id=1    @146.18.237.48  (mysql-5.1.32 ndb-7.0.5)

    [mysqld(API)]   3 node(s)
    id=5 (not connected, accepting connect from 146.18.237.48)
    id=6 (not connected, accepting connect from 146.18.237.59)
    id=7 (not connected, accepting connect from 146.18.237.103)

    ndb_mgm> Node 4: Forced node shutdown completed. Occured during startphase 4.
    Initiated by signal 6. Caused by error 2341: 'Internal program error (failed
    ndbrequire)(Internal error, programming error or missing error message, please
    report a bug). Temporary error, restart node'.
[12 May 2009 5:20]
James Eastman
the error log
Attachment: ndb_4_error.log (application/octet-stream, text), 2.02 KiB.
[12 May 2009 5:20]
James Eastman
the out.log
Attachment: ndb_4_out.log (application/octet-stream, text), 15.04 KiB.
[12 May 2009 5:22]
James Eastman
Can't bring the cluster back up. Web apps are not working.
[12 May 2009 5:29]
Jonas Oreland
The 6.3 to 7.0 upgrade is not online (bug #44294); the data node file system is not compatible. You must back up your 6.3 cluster, start a 7.0 cluster with --initial, and restore...
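A minimal sketch of that backup/restore path, assuming the management server address from the show output above; the node ID, backup ID, and backup path are illustrative assumptions, so adjust them to your setup:

```sh
# 1. On the running 6.3 cluster, take a native backup from the management client:
ndb_mgm -c 146.18.237.48 -e "START BACKUP"
# The backup lands under each data node's BackupDataDir, e.g. .../BACKUP/BACKUP-1/

# 2. Install the 7.0 binaries, then start each data node with a clean file system:
ndbd --initial

# 3. Restore each data node's backup; pass -m (restore metadata) for the first node only:
ndb_restore -c 146.18.237.48 -n 4 -b 1 -m -r \
    --backup_path=/var/lib/mysql-cluster/BACKUP/BACKUP-1
```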
[12 May 2009 14:25]
James Eastman
Uh oh .... since I've already done the upgrade and done an initial start on one of my ndb nodes, is there any way to dump that data and/or revert my upgrade to the previous version and then dump my data .... please, please, please :-D
[12 May 2009 14:42]
Jonas Oreland
Hi,
If you did "--initial" on node 2, you can start the cluster partially (i.e. without node 2) by starting nodes 3 and 4 like:

    ndbd --ndb-nodeid=3 --nowait-nodes=2
    ndbd --ndb-nodeid=4 --nowait-nodes=2

So do this *with 6.3 binaries*, and then take a backup.
---
But I would first take a file-system backup of the ndb_X_fs directories that you have... in case something goes wrong...
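A minimal sketch of that precautionary file-system backup; the DataDir path is an assumption, so use wherever your ndb_X_fs directories actually live:

```sh
# Run on each data node host while ndbd is stopped; adjust paths as needed.
cd /var/lib/mysql-cluster
tar czf ~/ndb_fs_backup_$(hostname).tgz ndb_*_fs
```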
[12 May 2009 14:45]
James Eastman
That idea sounds nice ..... Now, I've also already upgraded the mgmt node(s) and the mysql api nodes. Do I need to downgrade them as well *OR* can I leave them as mysql 7?
[12 May 2009 16:51]
James Eastman
I've followed your instructions and downgraded box 1 so that the admin management server and MySQL server are at ver 6.3.20, *AND* I've downgraded the *ONE* ndb node I'm bringing up to do this backup to ver 6.3.20. I've started the management server *and* then an ndb storage engine, and in the ndb_mgm client I see:

    ndb_mgm> show
    Cluster Configuration
    ---------------------
    [ndbd(NDB)]     3 node(s)
    id=2 (not connected, accepting connect from 146.18.237.115)
    id=3 (not connected, accepting connect from 146.18.237.49)
    id=4    @146.18.237.104  (mysql-5.1.30 ndb-6.3.20, starting, Nodegroup: 0, Master)

    [ndb_mgmd(MGM)] 1 node(s)
    id=1    @146.18.237.48  (mysql-5.1.30 ndb-6.3.20)

    [mysqld(API)]   3 node(s)
    id=5 (not connected, accepting connect from 146.18.237.48)
    id=6 (not connected, accepting connect from 146.18.237.59)
    id=7 (not connected, accepting connect from 146.18.237.103)

    ndb_mgm>

On the ndb node I see lots of stuff in the logs; I'll attach that log as a file. The thing is, that NDB node seems to be hung in any case. What gives? Thx. James
[12 May 2009 16:52]
James Eastman
Latest ndb node log file
Attachment: ndb_4_out.log (application/octet-stream, text), 16.56 KiB.
[13 May 2009 16:53]
James Eastman
I've finally been able to downgrade my NDB nodes to 6.3, as well as my management node. I've done the same to my MySQL Client/API nodes. There is one problem, however. When I start *ONLY ONE* MySQL client / API node everything is fine. I can see my databases and get to them from an external client / PHP. However, when I start a second API node all of the client access falls apart. I've noticed that on the mysql client nodes there are directories in /var/lib/mysql that correspond to my DB names. Is this supposed to be the case? In any case, I'll now attach the /var/lib/mysql/ast1.nts.BLAH.com.err file so you can see what I'm seeing. What gives?
[13 May 2009 16:59]
James Eastman
mysql API node error file
Attachment: ast1.nts.BLAH.com.err (application/octet-stream, text), 14.81 KiB.
[13 May 2009 17:01]
James Eastman
Also, here's what ndb_mgm showed at the time of the MySQL API node errors:

    ndb_mgm> show
    Cluster Configuration
    ---------------------
    [ndbd(NDB)]     3 node(s)
    id=2    @146.18.237.115  (mysql-5.1.30 ndb-6.3.20, Nodegroup: 0)
    id=3    @146.18.237.49  (mysql-5.1.30 ndb-6.3.20, Nodegroup: 0)
    id=4    @146.18.237.104  (mysql-5.1.30 ndb-6.3.20, Nodegroup: 0, Master)

    [ndb_mgmd(MGM)] 1 node(s)
    id=1    @146.18.237.48  (mysql-5.1.30 ndb-6.3.20)

    [mysqld(API)]   15 node(s)
    id=5 (not connected, accepting connect from 146.18.237.48)
    id=6 (not connected, accepting connect from 146.18.237.48)
    id=7 (not connected, accepting connect from 146.18.237.48)
    id=8 (not connected, accepting connect from 146.18.237.48)
    id=9 (not connected, accepting connect from 146.18.237.48)
    id=10 (not connected, accepting connect from 146.18.237.59)
    id=11 (not connected, accepting connect from 146.18.237.59)
    id=12 (not connected, accepting connect from 146.18.237.59)
    id=13 (not connected, accepting connect from 146.18.237.59)
    id=14 (not connected, accepting connect from 146.18.237.59)
    id=15 (not connected, accepting connect from 146.18.237.103)
    id=16 (not connected, accepting connect from 146.18.237.103)
    id=17 (not connected, accepting connect from 146.18.237.103)
    id=18 (not connected, accepting connect from 146.18.237.103)
    id=19 (not connected, accepting connect from 146.18.237.103)

    ndb_mgm>
[13 May 2009 18:46]
Jonas Oreland
Hi again. I didn't notice before, but using 3 replicas is not supported in 6.3 (it is fixed in 7.0, see http://bugs.mysql.com/bug.php?id=18621). It sounds like a miracle that you didn't get any problems with this before. So now I urge you to take a backup (using START BACKUP) and restore it on a clean cluster, either 6.3 or 7.0. And I also strongly recommend using NoOfReplicas=2 instead (and hence 2 or 4 data nodes instead of 3), given that even if we don't have any known bugs, our test coverage of 3 replicas is quite low. Read the manual carefully on how to take a backup and restore it.
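A minimal config.ini sketch of that recommended two-replica layout, assuming two data nodes on the hosts from the show output above; the HostName and DataDir values are illustrative, so adjust them to your setup:

```ini
[ndbd default]
NoOfReplicas=2                 # two copies of each fragment; the well-tested setting
DataDir=/var/lib/mysql-cluster

[ndb_mgmd]
HostName=146.18.237.48

# Two data nodes forming one node group (use 4 nodes for two node groups).
[ndbd]
HostName=146.18.237.115

[ndbd]
HostName=146.18.237.49

# Slots for the SQL/API nodes.
[mysqld]
[mysqld]
```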
[13 May 2009 19:37]
James Eastman
ok, Ok .... I'll go to two API nodes and two NDB data nodes. In any case, can I just use mysqldump to get the result I seek?
[25 May 2009 13:33]
Jonathan Miller
Please do the upgrade by online backup (take a backup, restart with --initial, and then restore), or wait for 7.0.6, as the upgrade issue is fixed in that version.
[13 Mar 2014 13:37]
Omer Barnir
This bug is not scheduled to be fixed at this time.