Bug #48232 | Crash in DBDICT (Line: 4115) | ||
---|---|---|---|
Submitted: | 22 Oct 2009 14:13 | Modified: | 27 Oct 2009 6:43 |
Reporter: | Andy Lintner | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S3 (Non-critical) |
Version: | mysql-5.1-telco-7.0 | OS: | Linux (RHEL 5.4) |
Assigned to: | Jonas Oreland | CPU Architecture: | Any |
Tags: | 7.0.8a |
[22 Oct 2009 14:13]
Andy Lintner
[22 Oct 2009 14:13]
Andy Lintner
Trace files from the crash
Attachment: ndb_4_logs.tar.gz (application/x-gzip, text), 129.53 KiB.
[22 Oct 2009 14:14]
Andy Lintner
config.ini
Attachment: config.ini (text/plain), 4.59 KiB.
[22 Oct 2009 14:26]
Jonas Oreland
A cluster log would also be good (note: I haven't actually checked the traces yet, but a cluster log is always good to have around).
[22 Oct 2009 14:38]
Andy Lintner
Cluster Log
Attachment: ndb_1_cluster.log (application/octet-stream, text), 110.74 KiB.
[22 Oct 2009 16:14]
Jonas Oreland
The problem seems to be that the alive node has a bigger SharedGlobalMemory than the starting node. My guess is that you 1) started the cluster with a value for SharedGlobalMemory, 2) changed the value, and 3) restarted this node with a lower value. I'm not entirely sure, but I'm pretty sure that setting that value, restarting the management server with "ndb_mgmd --reload", and then starting the problematic node will make the problem go away.
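To check whether a configuration file itself assigns different SharedGlobalMemory values to different data nodes, something like the following rough sketch could help. It is a hypothetical helper, not part of MySQL Cluster, and it assumes a typical config.ini layout with an `[ndbd default]` section and one `[ndbd]` section per data node identified by `NodeId` (or `Id`). It only inspects the file; it cannot see the value a running node was actually started with.

```python
# Hypothetical helper, not part of MySQL Cluster: reports the effective
# SharedGlobalMemory of every [ndbd] section in a config.ini-style text.

def parse_size(value):
    """Convert '512M', '2G', '1024K' or a plain byte count to bytes."""
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    value = value.strip()
    suffix = value[-1:].upper()
    if suffix in units:
        return int(value[:-1]) * units[suffix]
    return int(value)


def shared_global_memory_per_node(config_text, key="SharedGlobalMemory"):
    """Return {node_id: bytes or None}, applying [ndbd default] as fallback."""
    defaults, sections, current = {}, [], None
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop '#' comments
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            name = line[1:-1].strip().lower()
            if name == "ndbd default":
                current = defaults
            elif name == "ndbd":
                current = {}
                sections.append(current)
            else:
                current = None  # ignore management and SQL node sections
            continue
        if current is not None and "=" in line:
            k, v = line.split("=", 1)
            current[k.strip().lower()] = v.strip()

    result = {}
    for sec in sections:
        node_id = sec.get("nodeid", sec.get("id"))
        raw_value = sec.get(key.lower(), defaults.get(key.lower()))
        result[node_id] = parse_size(raw_value) if raw_value is not None else None
    return result
```

If the returned values are not all equal, one data node is configured with a different SharedGlobalMemory than the others, which is the kind of mismatch suspected here.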
[22 Oct 2009 17:42]
Andy Lintner
I restarted both management nodes, followed by the active node, and then the inactive node experienced the same fault. However, your comment on memory made me dig deeper, and I discovered an unrelated runaway process consuming memory on that server. There was only 2G available to the node, instead of the normal 8G. Killing that process allowed the node to start up. However, the error message was obviously less than helpful. Since your diagnosis indicated a mismatching SharedGlobalMemory, is there anything that would have dynamically resized SharedGlobalMemory down in response to insufficient available memory? Either way, my issue is resolved, so I moved this down to Non-critical since it seems to just be an issue of error reporting.
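A quick pre-start check of the host's free memory would have pointed at the runaway process sooner. The following is a minimal sketch, assuming a Linux host and `/proc/meminfo`-style input; the function names and the threshold are illustrative, not anything the data node itself does. On older kernels without a `MemAvailable` field it falls back to `MemFree + Buffers + Cached` as a rough estimate.

```python
# Illustrative pre-start check; names and threshold are assumptions.

def reclaimable_kib(meminfo_text):
    """Estimate memory a new process could obtain, in KiB."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])  # /proc/meminfo reports kB
    if "MemAvailable" in fields:
        return fields["MemAvailable"]
    # Older kernels: approximate with free memory plus page cache and buffers.
    return (fields.get("MemFree", 0)
            + fields.get("Buffers", 0)
            + fields.get("Cached", 0))


def enough_memory(meminfo_text, required_gib):
    """True if the estimate covers the memory the data node is expected to need."""
    return reclaimable_kib(meminfo_text) >= required_gib * 1024 * 1024
```

On the affected host this could be called as `enough_memory(open("/proc/meminfo").read(), 8)`, with 8 standing in for the memory the node normally has available.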
[26 Oct 2009 14:41]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:
http://lists.mysql.com/commits/88181
3163 Jonas Oreland 2009-10-26
ndb - bug#48232 - improve error reporting when failure to recreate/drop object during restore of schema
[26 Oct 2009 14:42]
Jonas Oreland
Added informative error message. Pushed to 7.0.9.
[27 Oct 2009 6:43]
Jon Stephens
Bugfix documented in the NDB-7.0.9 changelog as follows: When a data node failed to start due to inability to recreate or drop objects during schema restoration (for example, because insufficient memory was available to the data node process on account of issues not directly related to MySQL Cluster on the host machine), the reason for the failure was not provided. Now, in such cases, a more informative error message is logged. Closed.