Bug #70217 | Internal program error (failed ndbrequire) caused a Data Node to Shutdown | ||
---|---|---|---|
Submitted: | 2 Sep 2013 13:33 | Modified: | 15 Oct 2015 4:13 |
Reporter: | Maycon Santos | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S2 (Serious) |
Version: | 7.2.3 | OS: | Linux (CentOS release 6.3 (Final)) |
Assigned to: | MySQL Verification Team | CPU Architecture: | Any |
Tags: | DBTC, error 2341, internal program error, Internal program error (failed ndbrequire) |
[2 Sep 2013 13:33]
Maycon Santos
[2 Sep 2013 13:35]
Maycon Santos
ndb_error_report part 01
Attachment: ndb_error_report_20130902002655.part01.rar (application/octet-stream, text), 2.00 MiB.
[2 Sep 2013 13:36]
Maycon Santos
ndb_error_report part 02
Attachment: ndb_error_report_20130902002655.part02.rar (application/octet-stream, text), 1.57 MiB.
[3 Sep 2013 13:56]
Ben Reeves
Experiencing the same error with version 7.3.2. One node will crash every few hours with: 2013-09-03 14:02:36 [ndbd] WARNING -- Ndb kernel thread 2 is stuck in: Performing Send elapsed=100 2013-09-03 14:02:37 [ndbd] INFO -- Watchdog: User time: 2089522 System time: 535344 2013-09-03 14:02:37 [ndbd] WARNING -- Ndb kernel thread 3 is stuck in: Performing Send elapsed=100 2013-09-03 14:02:37 [ndbd] INFO -- Watchdog: User time: 2089522 System time: 535344 2013-09-03 14:02:37 [ndbd] WARNING -- Ndb kernel thread 4 is stuck in: Performing Send elapsed=100 2013-09-03 14:02:37 [ndbd] INFO -- Watchdog: User time: 2089522 System time: 535344 2013-09-03 14:02:37 [ndbd] WARNING -- Ndb kernel thread 5 is stuck in: Performing Send elapsed=100 2013-09-03 14:02:37 [ndbd] INFO -- Watchdog: User time: 2089522 System time: 535344 2013-09-03 14:02:37 [ndbd] WARNING -- Ndb kernel thread 7 is stuck in: Performing Send elapsed=100 2013-09-03 14:02:37 [ndbd] INFO -- Watchdog: User time: 2089522 System time: 535344 2013-09-03 14:02:37 [ndbd] WARNING -- Ndb kernel thread 13 is stuck in: Scanning Timers elapsed=100 2013-09-03 14:02:37 [ndbd] INFO -- Watchdog: User time: 2089522 System time: 535344 2013-09-03 14:02:37 [ndbd] WARNING -- Ndb kernel thread 14 is stuck in: Scanning Timers elapsed=100 2013-09-03 14:02:37 [ndbd] INFO -- Watchdog: User time: 2089522 System time: 535344 2013-09-03 14:02:37 [ndbd] INFO -- Watchdog: User time: 2089527 System time: 535346 2013-09-03 14:02:37 [ndbd] WARNING -- Watchdog: Warning overslept 1339 ms, expected 100 ms. 2013-09-03 14:02:38 [ndbd] INFO -- Internal program error (failed ndbrequire) 2013-09-03 14:02:38 [ndbd] INFO -- DBTC (Line: 7783) 0x00000002 2013-09-03 14:02:38 [ndbd] INFO -- Error handler shutting down system 2013-09-03 14:02:38 [ndbd] INFO -- Error handler shutdown completed - exiting 2013-09-03 14:02:39 [ndbd] ALERT -- Node 6: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
[16 Dec 2013 11:09]
ARNAUD DELANOUE
Same issue with 7.3.3: Time: Monday 16 December 2013 - 01:15:26 Status: Temporary error, restart node Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug) Error: 2341 Error data: Error object: DBTC (Line: 7791) 0x00000002 Program: ndbmtd Pid: 8208 thr: 7 Version: mysql-5.6.14 ndb-7.3.3 Trace: /data/mysqlcluster/ndb_4_trace.log.5 [t1..t9] Any update? Do you need the trace files?
[25 Jan 2014 11:39]
Maycon Santos
After many months without this bug, it happened again last night with the same node, I am sending the trace files next.
[25 Jan 2014 11:40]
Maycon Santos
Trace files
Attachment: ndb_4_trace.log.1.tgz (application/octet-stream, text), 1.09 MiB.
[7 Apr 2014 2:05]
Sergei Turchanov
This bug also happens on mysql-5.6.15 ndb-7.3.4. The system was realy idle at the moment when crash happened: neither inserts/updates nor selects were issued to cluster. Last lines from logs: 2014-04-05 18:24:57 [ndbd] WARNING -- Ndb kernel thread 6 is stuck in: Job Handling elapsed=2908 2014-04-05 18:24:57 [ndbd] INFO -- Watchdog: User time: 30495303 System time: 4114011 2014-04-05 18:24:57 [ndbd] WARNING -- Ndb kernel thread 7 is stuck in: Job Handling elapsed=2908 2014-04-05 18:24:57 [ndbd] INFO -- Watchdog: User time: 30495303 System time: 4114011 Warning: 7 thread(s) did not stop before starting crash dump. Warning: 7 thread(s) did not stop before starting crash dump. 2014-04-05 18:24:57 [ndbd] INFO -- Internal program error (failed ndbrequire) 2014-04-05 18:24:57 [ndbd] INFO -- DBTC (Line: 7839) 0x00000002 2014-04-05 18:24:57 [ndbd] INFO -- Watchdog shutting down system 2014-04-05 18:24:57 [ndbd] INFO -- Watchdog shutdown completed - exiting 2014-04-05 18:25:00 [ndbd] ALERT -- Node 4: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
[7 Apr 2014 2:08]
Sergei Turchanov
Trace files
Attachment: ndb_4_trace.log.tgz (application/x-compressed-tar, text), 309.39 KiB.
[2 Jul 2014 7:37]
Ivan Grljusic
Does anyone have any update about this issue? I have the similar problem during backup procedure in mysql
[2 Jul 2015 14:43]
Donghua Luo
I encountered simpler issue on mysql-5.6.24 with ndb-7.4.6. And I found it was caused by some files inside "datadir" not owned by "mysql" any more, it was changed to "root". After change permission back to "mysql", this problem resolved. [mysql@localhost server2-ndb]$ find . -exec ls -l {} \; |grep root -rw-r--r--. 1 root root 1244 Jul 1 15:11 T52F0.Data -rw-r--r--. 1 root root 1244 Jul 1 15:11 T52F1.Data -rw-r--r--. 1 root root 1244 Jul 1 15:11 ./ndb_data/ndb_11_fs/LCP/1/T52F1.Data -rw-r--r--. 1 root root 1244 Jul 1 15:11 ./ndb_data/ndb_11_fs/LCP/1/T52F0.Data
[14 Oct 2015 22:17]
MySQL Verification Team
Fixed in 7.2.17 and 7.3.6 changelogs: When running with a very slow main thread, and one or more transaction coordinator threads, on different CPUs, it was possible to encounter a timeout when sending a DIH_SCAN_GET_NODESREQ signal, which could lead to a crash of the data node. Now in such cases the timeout is avoided.