| Bug #48232 | Crash in DBDICT (Line: 4115) | ||
|---|---|---|---|
| Submitted: | 22 Oct 16:13 | Modified: | 27 Oct 7:43 |
| Reporter: | Andy Lintner | ||
| Status: | Closed | ||
| Category: | Server: Cluster | Severity: | S3 (Non-critical) |
| Version: | mysql-5.1-telco-7.0 | OS: | Linux (RHEL 5.4) |
| Assigned to: | Jonas Oreland | Target Version: | |
| Tags: | 7.0.8a | ||
[22 Oct 16:13]
Andy Lintner
Trace files from the crash
Attachment: ndb_4_logs.tar.gz (application/x-gzip, text), 129.53 KiB.
[22 Oct 16:14]
Andy Lintner
config.ini
Attachment: config.ini (text/plain), 4.59 KiB.
[22 Oct 16:26]
Jonas Oreland
The cluster log would also be good (note: I haven't actually checked the traces yet, but the cluster log is always good to have around).
[22 Oct 16:38]
Andy Lintner
Cluster Log
Attachment: ndb_1_cluster.log (application/octet-stream, text), 110.74 KiB.
[22 Oct 18:14]
Jonas Oreland
The problem seems to be that the alive node has a bigger SharedGlobalMemory than the starting node. My guess is that you 1) started the cluster with a value for SharedGlobalMemory, 2) changed the value, and 3) restarted this node with a lower value. Not entirely sure though, but I'm pretty sure that setting that value, restarting with "ndb_mgmd --reload", and then starting the problematic node will make the problem go away.
[22 Oct 19:42]
Andy Lintner
I restarted both management nodes, followed by the active node, and then the inactive node experienced the same fault. However, your comment on memory made me dig deeper, and I discovered an unrelated runaway process consuming memory on that server. There were only 2G available to the node, instead of the normal 8G. Killing that process allowed the node to start up. However, the error message was obviously less than helpful. Since your diagnosis indicated a mismatching SharedGlobalMemory, is there anything that would have dynamically resized SharedGlobalMemory down in response to insufficient available memory? Either way, my issue is resolved, so I moved this down to Non-critical since it seems to just be an issue of error reporting.
[26 Oct 15:41]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:
http://lists.mysql.com/commits/88181
3163 Jonas Oreland 2009-10-26
ndb - bug#48232 - improve error reporting when failure to recreate/drop object during restore of schema
[26 Oct 15:42]
Jonas Oreland
Added informative error message. Pushed to 7.0.9.
[27 Oct 7:43]
Jon Stephens
Bugfix documented in the NDB-7.0.9 changelog as follows:
When a data node failed to start due to inability to recreate or
drop objects during schema restoration (for example:
insufficient memory was available to the data node process on
account of issues not directly related to MySQL Cluster on the
host machine), the reason for the failure was not provided. Now
in such cases, a more informative error message is logged.
Closed.
