Bug #46289 | Error 2341 after reboot and then initial when changing redo logfile size | ||
---|---|---|---|
Submitted: | 18 Jul 2009 19:36 | Modified: | 27 Sep 2009 20:04 |
Reporter: | Clint Alexander | Email Updates: | |
Status: | No Feedback | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S3 (Non-critical) |
Version: | mysql-5.1.34 ndb-7.0.6 | OS: | Linux |
Assigned to: | CPU Architecture: | Any | |
Tags: | 2341, initial restart, Redo logfile |
[18 Jul 2009 19:36]
Clint Alexander
[18 Jul 2009 19:40]
Clint Alexander
There is a reference to 192.168.0.86 which should be 192.168.0.1 (for NDBMTD01) -- addresses were changed for clarity and ease of reading but I obviously typo'd on that line.
[30 Jul 2009 14:06]
Geert Vanderkelen
Clint, Just a correction in your description, you did actually a: ndb_mgmd> 2 restart -n right? Did you change something to the configuration around FragmentLogFileSize or the number of them? I was not succesful to repeat with the steps given. Cheers, Geert
[30 Jul 2009 14:24]
Geert Vanderkelen
I can't repeat with the following steps, using ndbd or ndbmtd: * Changing configuration, for example FragmentLogFileSize from 16M to 32M. * Reload ndb_mgmd configuration * ndb_mgm> 3 restart -n * ndb_mgm> 3 start * Failes because we need initial node restart with the above change * Start ndbd --initial * Node joines as normal. Was this repeatable or was this a single incident? Cheers, Geert
[31 Jul 2009 3:04]
Clint Alexander
There was a total of 2 restarts on the data node. I did -not- restart the node with -n or any other arguments; the first attempt was a normal 'restart' with no option, the second attempt I had to start on the node itself with --initial. I copied and pasted from the terminal so what I described is exactly what I had seen/typed (except for the IP addresses of course). Although I usually give a good walk-thru and some code to recreate the problem, this one happened once and I only had the time to create a bug report. I'm finishing up 5.1 CMCDBA certifications this week, so I will try to create an easy recreation script over the weekend and will post it here. Stay tuned... //Clint
[4 Aug 2009 9:17]
Attila Domján
Hi, Last night I run into this problem. We had a maintenance (software upgrade on network stuffs). I made a fresh backup, didn't stop the cluster, I wanted to examine the recover. It is failed. none of the 2 of ndbd node came online.... Time: Tuesday 4 August 2009 - 00:34:53 Status: Temporary error, restart node Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug) Error: 2341 Error data: dbdict/Dbdict.cpp Error object: DBDICT (Line: 2837) 0x0000000a Program: /usr/sbin/ndbmtd Pid: 6401 Trace: /var/lib/mysql-cluster-trr/ndb_3_trace.log.4 /var/lib/mysql-cluster-trr/n db_3_trace.log.4_t1 /var/lib/mysql-cluster-trr/ndb_3_trace.log.4_t2 /var/lib/mys ql-c Time: Tuesday 4 August 2009 - 00:39:53 Status: Temporary error, restart node Message: Another node failed during system restart, please investigate error(s) on other node(s) (Restart error) Error: 2308 Error data: Node 4 disconnected Error object: QMGR (Line: 2910) 0x0000000a Program: /usr/sbin/ndbmtd Pid: 6546 Trace: /var/lib/mysql-cluster-trr/ndb_3_trace.log.5 /var/lib/mysql-cluster-trr/n db_3_trace.log.5_t1 /var/lib/mysql-cluster-trr/ndb_3_trace.log.5_t2 /var/lib/mys ql-cluster-trr/ndb_3_trac Time: Tuesday 4 August 2009 - 00:25:48 Status: Temporary error, restart node Message: Node lost connection to other nodes and can not form a unpartitioned cluster, please investigate if there are error(s) on other node(s) (Arbitration error) Error: 2305 Error data: Arbitrator decided to shutdown this node Error object: QMGR (Line: 5300) 0x0000000a Program: /usr/sbin/ndbmtd Pid: 4304 Trace: /var/lib/mysql-cluster-trr/ndb_4_trace.log.3 /var/lib/mysql-cluster-trr/ndb_4_trace.log.3_t1 /var/ lib/mys Time: Tuesday 4 August 2009 - 00:39:52 Status: Temporary error, restart node Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error m essage, please report a bug) Error: 2341 Error data: dbdict/Dbdict.cpp Error object: DBDICT (Line: 2837) 0x0000000a Program: /usr/sbin/ndbmtd Pid: 13714 Trace: /var/lib/mysql-cluster-trr/ndb_4_trace.log.4 /var/lib/mysql-cluster-trr/ndb_4_trace.log.4_t1 /var/ lib/mysql-cluster-trr/ndb_4_trace.log.4_t2 /var/lib/mysql- so I had to recover from the backup.
[27 Aug 2009 19:24]
Matthew Bilek
Had the same problem today. Issued a SHUTDOWN and then restarted the management and data nodes. Time: Thursday 27 August 2009 - 10:52:09 Status: Temporary error, restart node Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug) Error: 2341 Error data: dbdict/Dbdict.cpp Error object: DBDICT (Line: 2837) 0x0000000a Program: /usr/local/mysql/libexec/ndbmtd Pid: 4149 Trace: /usr/local/mysql/data/ndb_21_trace.log.12 /usr/local/mysql/data/ndb_21_trace.log.12_t1 /usr/local/mysql/data/ndb_21_trace.log.12_t2 /usr/loc
[27 Aug 2009 19:26]
Matthew Bilek
I also had to restart the data nodes with --initial and restore from a backup.
[27 Aug 2009 20:04]
Jonas Oreland
Hi, 1) There is not sufficient logs to know exactly what has failed. 2) We're releasing 7.0.7 today (?) (note:: source only, binaries takes a bit longer) that has fixed a few problems resembling this. In short I have high hopes that problem is actually fixed. Setting it need-feedback so you can test with 7.0.7 /Jonas
[27 Aug 2009 20:20]
Matthew Bilek
Jonas, Thanks you for the information. I will be looking for the new 7.0.7 release since we use the source distribution. I will build it tomorrow and see how it goes. Thanks, Matthew Bilek Webroot Software, Inc. mbilek@webroot.com
[27 Sep 2009 23:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".