Bug #37985 | Forced node shutdown completed. Initiated by signal 11 (Caused by error 2308) | ||
---|---|---|---|
Submitted: | 9 Jul 2008 9:52 | Modified: | 17 Nov 2008 16:35 |
Reporter: | Hans-Christian Andersen | Email Updates: | |
Status: | No Feedback | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S1 (Critical) |
Version: | Mysql-cluster GPL 6.2.15 | OS: | Linux (Debian Etch 4.0 r3 AMD64) |
Assigned to: | Assigned Account | CPU Architecture: | Any |
Tags: | cluster, shutdown, Signal 11 |
[9 Jul 2008 9:52]
Hans-Christian Andersen
[9 Jul 2008 9:59]
Hans-Christian Andersen
ndb_2_error.log
Attachment: ndb_2_error.log (application/octet-stream, text), 1.67 KiB.
[9 Jul 2008 9:59]
Hans-Christian Andersen
ndb_2_out.log
Attachment: ndb_2_out.log (application/octet-stream, text), 157.98 KiB.
[9 Jul 2008 10:04]
Hans-Christian Andersen
ndb_2_trace.log.12
Attachment: ndb_2_trace.log.12.gz (application/x-gzip, text), 29.87 KiB.
[9 Jul 2008 10:04]
Hans-Christian Andersen
ndb_3_error.log
Attachment: ndb_3_error.log (text/x-log), 1.18 KiB.
[9 Jul 2008 10:05]
Hans-Christian Andersen
ndb_3_out.log
Attachment: ndb_3_out.log (text/x-log), 111.27 KiB.
[9 Jul 2008 10:05]
Hans-Christian Andersen
ndb_3_trace.log.9
Attachment: ndb_3_trace.log.9.gz (application/x-gzip, text), 29.98 KiB.
[17 Oct 2008 16:35]
Frazer Clement
Hi,

Thanks for the comprehensive bug report, but I think you need to send more information. The trace files sent are for node shutdowns due to 'some other node failed during startup'. I would expect that when some node suffers from signal 11 (SEGV), we would get an entry in its error log, and an associated trace file. If this is not happening for some reason then perhaps the ndbd process can be run with the --core-file option, and a core file can be obtained? Getting a stack traceback from the core file would be useful for understanding what is going on.

Looking at the log files you sent, there seem to be a lot of shutdowns and restarts, and file corruption is reported by both nodes at various points. Can you give a fuller description of the actions which have been recorded in the log files? Have you managed to get past this bug? Have you been able to reproduce it?

Any further feedback appreciated.

Thanks,
Frazer

Interesting entries in ndb_3_out.log:

2008-07-07 11:14:48 [ndbd] INFO -- Error 0 during restore of 2/T6F0
2008-07-07 11:14:48 [ndbd] INFO -- RESTORE (Line: 1173) 0x0000000a
2008-07-07 11:14:48 [ndbd] INFO -- Error handler startup shutting down system
2008-07-07 11:14:48 [ndbd] INFO -- Error handler shutdown completed - exiting
2008-07-07 11:14:48 [ndbd] INFO -- Angel received ndbd startup failure count 1.
2008-07-07 11:14:48 [ndbd] ALERT -- Node 3: Forced node shutdown completed. Occured during startphase 4. Caused by error 2352: 'Invalid LCP(Ndbd file system inconsistency error, please report a bug). Ndbd file system error, restart node initial'.
...
2008-07-07 15:47:15 [ndbd] INFO -- DBLQH: File system open failed. OS errno: 2
2008-07-07 15:47:15 [ndbd] INFO -- DBLQH (Line: 1861) 0x0000000a
2008-07-07 15:47:15 [ndbd] INFO -- Error handler startup shutting down system
2008-07-07 15:47:15 [ndbd] INFO -- Error handler shutdown completed - exiting
2008-07-07 15:47:15 [ndbd] INFO -- Angel received ndbd startup failure count 1.
2008-07-07 15:47:15 [ndbd] ALERT -- Node 3: Forced node shutdown completed. Occured during startphase 4. Caused by error 2815: 'File not found(Ndbd file system inconsistency error, please report a bug). Ndbd file system error, restart node initial'.

Then multiple startup failures like this:

RESTORE table: 250 21195 rows applied
2008-07-08 11:37:09 [ndbd] ALERT -- Node 3: Forced node shutdown completed. Initiated by signal 11.

In ndb_2_out.log I see:

2008-07-04 00:42:57 [ndbd] INFO -- Error opening DIH schema files for table: 10
2008-07-04 00:42:57 [ndbd] INFO -- DBDIH (Line: 9509) 0x0000000a
2008-07-04 00:42:57 [ndbd] INFO -- Error handler startup shutting down system
2008-07-04 00:42:58 [ndbd] INFO -- Error handler shutdown completed - exiting
2008-07-04 00:42:58 [ndbd] INFO -- Angel received ndbd startup failure count 1.
2008-07-04 00:42:58 [ndbd] ALERT -- Node 2: Forced node shutdown completed. Occured during startphase 4. Caused by error 2815: 'File not found(Ndbd file system inconsistency error, please report a bug). Ndbd file system error, restart node initial'.
...
2008-07-04 00:44:24 [ndbd] INFO -- Error opening DIH schema files for table: 10
2008-07-04 00:44:24 [ndbd] INFO -- DBDIH (Line: 9509) 0x0000000a
2008-07-04 00:44:24 [ndbd] INFO -- Error handler startup shutting down system
2008-07-04 00:44:25 [ndbd] INFO -- Error handler shutdown completed - exiting
2008-07-04 00:44:25 [ndbd] INFO -- Angel received ndbd startup failure count 1.
2008-07-04 00:44:25 [ndbd] ALERT -- Node 2: Forced node shutdown completed. Occured during startphase 4. Caused by error 2815: 'File not found(Ndbd file system inconsistency error, please report a bug). Ndbd file system error, restart node initial'.
...
2008-07-04 10:37:32 [ndbd] INFO -- /home/backups
2008-07-04 10:37:32 [ndbd] INFO -- BackupDataDir
2008-07-04 10:37:32 [ndbd] INFO -- Error handler shutting down system
2008-07-04 10:37:32 [ndbd] INFO -- Error handler shutdown completed - exiting
2008-07-04 10:37:32 [ndbd] ALERT -- Node 2: Forced node shutdown completed. Occured during startphase 0. Caused by error 2805: 'Illegal file system path(Configuration error). Permanent error, external action needed'.
...
2008-07-07 17:41:15 [ndbd] INFO -- DBLQH: File system open failed. OS errno: 2
2008-07-07 17:41:15 [ndbd] INFO -- DBLQH (Line: 1861) 0x0000000a
2008-07-07 17:41:15 [ndbd] INFO -- Error handler startup shutting down system
2008-07-07 17:41:15 [ndbd] INFO -- Error handler shutdown completed - exiting
2008-07-07 17:41:15 [ndbd] INFO -- Angel received ndbd startup failure count 1.
2008-07-07 17:41:15 [ndbd] ALERT -- Node 2: Forced node shutdown completed. Occured during startphase 4. Caused by error 2815: 'File not found(Ndbd file system inconsistency error, please report a bug). Ndbd file system error, restart node initial'.
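Since the out logs contain many interleaved shutdowns, it may help to pull out only the "Forced node shutdown completed" alerts and tally them by node, error code and signal. The following is a minimal sketch, not an official NDB tool: the regular expressions are modelled only on the ALERT lines quoted in this report, and the names `parse_shutdowns` and `summarize` are hypothetical helpers introduced for illustration.

```python
import re
from collections import Counter

# Assumed line layout, inferred from the ALERT entries quoted in this report:
#   <date> <time> [ndbd] ALERT -- Node <id>: Forced node shutdown completed. <details>
ALERT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[ndbd\] ALERT\s+-- "
    r"Node (?P<node>\d+): Forced node shutdown completed\.(?P<rest>.*)$"
)
PHASE_RE = re.compile(r"startphase (\d+)")
ERROR_RE = re.compile(r"Caused by error (\d+)")
SIGNAL_RE = re.compile(r"Initiated by signal (\d+)")


def _first_int(pattern, text):
    """Return the first captured integer for pattern in text, or None."""
    found = pattern.search(text)
    return int(found.group(1)) if found else None


def parse_shutdowns(lines):
    """Collect one dict per forced-shutdown ALERT line; other lines are ignored."""
    events = []
    for line in lines:
        match = ALERT_RE.match(line.strip())
        if match is None:
            continue
        rest = match.group("rest")
        events.append({
            "timestamp": match.group("ts"),
            "node": int(match.group("node")),
            "startphase": _first_int(PHASE_RE, rest),
            "error": _first_int(ERROR_RE, rest),
            "signal": _first_int(SIGNAL_RE, rest),
        })
    return events


def summarize(events):
    """Count shutdowns per (node, error code, signal) combination."""
    return Counter((e["node"], e["error"], e["signal"]) for e in events)
```

Usage would be along the lines of `with open("ndb_3_out.log") as f: print(summarize(parse_shutdowns(f)))`. Shutdowns that report only a signal (such as the signal 11 entries) appear with an error code of `None`, which makes them easy to separate from the file system errors.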
[18 Nov 2008 0:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".
[29 Dec 2009 13:14]
Oli Sennhauser
Can reproduce this at will on ndbsup-1 with 7.0.11 on a single node cluster.
[29 Dec 2009 13:48]
Oli Sennhauser
Occured during startphase 4. Caused by error 2352: 'Invalid LCP(Ndbd file system inconsistency error, please report a bug). Ndbd file system error, restart node initial'.

Could be related to index memory full.