Bug #70217 Internal program error (failed ndbrequire) caused a Data Node to Shutdown
Submitted: 2 Sep 2013 13:33 Modified: 15 Oct 2015 4:13
Reporter: Maycon Santos Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S2 (Serious)
Version:7.2.3 OS:Linux (CentOS release 6.3 (Final))
Assigned to: MySQL Verification Team CPU Architecture:Any
Tags: DBTC, error 2341, internal program error, Internal program error (failed ndbrequire)

[2 Sep 2013 13:33] Maycon Santos
Description:
Hi,

Yesterday we notice that one of our Data Node were shutdown due to a Internal program error (failed ndbrequire). As I couldn't found any history online based on the logs information, I am doing as sugested and opening this bug report.

Logs:

ndb_4_out.log

2013-09-01 18:11:25 [ndbd] WARNING  -- Ndb kernel thread 27 is stuck in: Scanning Timers elapsed=100
2013-09-01 18:11:28 [ndbd] INFO     -- Watchdog: User time: 40830051  System time: 37161110
2013-09-01 18:11:28 [ndbd] WARNING  -- Ndb kernel thread 28 is stuck in: Scanning Timers elapsed=100
2013-09-01 18:11:28 [ndbd] INFO     -- Watchdog: User time: 40830051  System time: 37161110
2013-09-01 18:11:28 [ndbd] WARNING  -- Ndb kernel thread 26 is stuck in: Scanning Timers elapsed=100
2013-09-01 18:11:28 [ndbd] INFO     -- Watchdog: User time: 40830052  System time: 37161110
2013-09-01 18:11:28 [ndbd] WARNING  -- timerHandlingLab now: 881854613 sent: 881851843 diff: 2770
2013-09-01 18:11:28 [ndbd] INFO     -- Watchdog: User time: 40830083  System time: 37161117
2013-09-01 18:11:28 [ndbd] WARNING  -- Watchdog: Warning overslept 3339 ms, expected 100 ms.
2013-09-01 18:11:29 [ndbd] INFO     -- Internal program error (failed ndbrequire)
2013-09-01 18:11:29 [ndbd] INFO     -- DBTC (Line: 7783) 0x00000002
2013-09-01 18:11:29 [ndbd] INFO     -- Error handler shutting down system
2013-09-01 18:11:29 [ndbd] INFO     -- Error handler shutdown completed - exiting
2013-09-01 18:11:33 [ndbd] ALERT    -- Node 4: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

ndb_4_error.log

Time: Sunday 1 September 2013 - 18:11:28
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: 
Error object: DBTC (Line: 7783) 0x00000002
Program: ndbmtd
Pid: 8697 thr: 20
Version: mysql-5.6.11 ndb-7.3.2
Trace: /opt/data/mysqlcluster/7.3.2/data/ndb_4_trace.log.1 [t1..t26]

Server Hardware:
2 x Xeon(R) CPU E5-2670
128GB
Raid 10 (6x sata)

How to repeat:
N/A
[2 Sep 2013 13:35] Maycon Santos
ndb_error_report part 01

Attachment: ndb_error_report_20130902002655.part01.rar (application/octet-stream, text), 2.00 MiB.

[2 Sep 2013 13:36] Maycon Santos
ndb_error_report part 02

Attachment: ndb_error_report_20130902002655.part02.rar (application/octet-stream, text), 1.57 MiB.

[3 Sep 2013 13:56] Ben Reeves
Experiencing the same error with version 7.3.2. One node will crash every few hours with:

2013-09-03 14:02:36 [ndbd] WARNING  -- Ndb kernel thread 2 is stuck in: Performing Send elapsed=100
2013-09-03 14:02:37 [ndbd] INFO     -- Watchdog: User time: 2089522  System time: 535344
2013-09-03 14:02:37 [ndbd] WARNING  -- Ndb kernel thread 3 is stuck in: Performing Send elapsed=100
2013-09-03 14:02:37 [ndbd] INFO     -- Watchdog: User time: 2089522  System time: 535344
2013-09-03 14:02:37 [ndbd] WARNING  -- Ndb kernel thread 4 is stuck in: Performing Send elapsed=100
2013-09-03 14:02:37 [ndbd] INFO     -- Watchdog: User time: 2089522  System time: 535344
2013-09-03 14:02:37 [ndbd] WARNING  -- Ndb kernel thread 5 is stuck in: Performing Send elapsed=100
2013-09-03 14:02:37 [ndbd] INFO     -- Watchdog: User time: 2089522  System time: 535344
2013-09-03 14:02:37 [ndbd] WARNING  -- Ndb kernel thread 7 is stuck in: Performing Send elapsed=100
2013-09-03 14:02:37 [ndbd] INFO     -- Watchdog: User time: 2089522  System time: 535344
2013-09-03 14:02:37 [ndbd] WARNING  -- Ndb kernel thread 13 is stuck in: Scanning Timers elapsed=100
2013-09-03 14:02:37 [ndbd] INFO     -- Watchdog: User time: 2089522  System time: 535344
2013-09-03 14:02:37 [ndbd] WARNING  -- Ndb kernel thread 14 is stuck in: Scanning Timers elapsed=100
2013-09-03 14:02:37 [ndbd] INFO     -- Watchdog: User time: 2089522  System time: 535344
2013-09-03 14:02:37 [ndbd] INFO     -- Watchdog: User time: 2089527  System time: 535346
2013-09-03 14:02:37 [ndbd] WARNING  -- Watchdog: Warning overslept 1339 ms, expected 100 ms.
2013-09-03 14:02:38 [ndbd] INFO     -- Internal program error (failed ndbrequire)
2013-09-03 14:02:38 [ndbd] INFO     -- DBTC (Line: 7783) 0x00000002
2013-09-03 14:02:38 [ndbd] INFO     -- Error handler shutting down system
2013-09-03 14:02:38 [ndbd] INFO     -- Error handler shutdown completed - exiting
2013-09-03 14:02:39 [ndbd] ALERT    -- Node 6: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
[16 Dec 2013 11:09] ARNAUD DELANOUE
Same issue with 7.3.3:

Time: Monday 16 December 2013 - 01:15:26
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data:
Error object: DBTC (Line: 7791) 0x00000002
Program: ndbmtd
Pid: 8208 thr: 7
Version: mysql-5.6.14 ndb-7.3.3
Trace: /data/mysqlcluster/ndb_4_trace.log.5 [t1..t9]

Any update? Do you need the trace files?
[25 Jan 2014 11:39] Maycon Santos
After many months without this bug, it happened again last night with the same node, I am sending the trace files next.
[25 Jan 2014 11:40] Maycon Santos
Trace files

Attachment: ndb_4_trace.log.1.tgz (application/octet-stream, text), 1.09 MiB.

[7 Apr 2014 2:05] Sergei Turchanov
This bug also happens on mysql-5.6.15 ndb-7.3.4. The system was realy idle at the moment when crash happened: neither inserts/updates nor selects were issued to cluster.

Last lines from  logs:
2014-04-05 18:24:57 [ndbd] WARNING  -- Ndb kernel thread 6 is stuck in: Job Handling elapsed=2908
2014-04-05 18:24:57 [ndbd] INFO     -- Watchdog: User time: 30495303  System time: 4114011
2014-04-05 18:24:57 [ndbd] WARNING  -- Ndb kernel thread 7 is stuck in: Job Handling elapsed=2908
2014-04-05 18:24:57 [ndbd] INFO     -- Watchdog: User time: 30495303  System time: 4114011
Warning: 7 thread(s) did not stop before starting crash dump.
Warning: 7 thread(s) did not stop before starting crash dump.
2014-04-05 18:24:57 [ndbd] INFO     -- Internal program error (failed ndbrequire)
2014-04-05 18:24:57 [ndbd] INFO     -- DBTC (Line: 7839) 0x00000002
2014-04-05 18:24:57 [ndbd] INFO     -- Watchdog shutting down system
2014-04-05 18:24:57 [ndbd] INFO     -- Watchdog shutdown completed - exiting
2014-04-05 18:25:00 [ndbd] ALERT    -- Node 4: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
[7 Apr 2014 2:08] Sergei Turchanov
Trace files

Attachment: ndb_4_trace.log.tgz (application/x-compressed-tar, text), 309.39 KiB.

[2 Jul 2014 7:37] Ivan Grljusic
Does anyone have any update about this issue?
I have the similar problem during backup procedure in mysql
[2 Jul 2015 14:43] Donghua Luo
I encountered simpler issue on mysql-5.6.24 with ndb-7.4.6. And I found it was caused by some files inside "datadir" not owned by "mysql" any more, it was changed to "root".
After change permission back to "mysql", this problem resolved. 

[mysql@localhost server2-ndb]$ find . -exec ls -l {} \; |grep root
-rw-r--r--. 1 root  root   1244 Jul  1 15:11 T52F0.Data
-rw-r--r--. 1 root  root   1244 Jul  1 15:11 T52F1.Data
-rw-r--r--. 1 root root 1244 Jul  1 15:11 ./ndb_data/ndb_11_fs/LCP/1/T52F1.Data
-rw-r--r--. 1 root root 1244 Jul  1 15:11 ./ndb_data/ndb_11_fs/LCP/1/T52F0.Data
[14 Oct 2015 22:17] MySQL Verification Team
Fixed in 7.2.17 and 7.3.6 changelogs:

When running with a very slow main thread, and one or more transaction coordinator threads, on different CPUs, it was possible to encounter a timeout when sending a DIH_SCAN_GET_NODESREQ signal, which could lead to a crash of the data node. Now in such cases the timeout is avoided.