Bug #66840 | Repeatable ndb node crashes with error 2341 | ||
---|---|---|---|
Submitted: | 17 Sep 2012 4:47 | Modified: | 14 Jul 2016 8:57 |
Reporter: | vladysla chrn | Email Updates: | |
Status: | Can't repeat | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S2 (Serious) |
Version: | 7.2.6 | OS: | Linux (Red Hat Enterprise Linux Server release 6.1 ) |
Assigned to: | MySQL Verification Team | CPU Architecture: | Any |
Tags: | 1309, DBSPJ, failed ndbrequire, ndbmtd, SimulatedBlock.cpp |
[17 Sep 2012 4:47]
vladysla chrn
[17 Sep 2012 4:57]
vladysla chrn
Ndb error report for this issue
Attachment: ndb_error_report_20120916234857.tar.bz2 (application/octet-stream, text), 300.74 KiB.
[13 Nov 2012 15:42]
Russell Knighton
I think I have now encountered this error twice. Here are the relevant snips from the error log.

-- First instance --
Time: Tuesday 4 September 2012 - 16:31:08
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: SimulatedBlock.cpp
Error object: DBSPJ (Line: 1263) 0x00000002
Program: ndbmtd
Pid: 99090 thr: 19
Version: mysql-5.5.25 ndb-7.2.7
Trace: /srv/data/cluster/ndb_data/ndb_11_trace.log.5 [t1..t29]

-- latest instance --
Time: Tuesday 13 November 2012 - 14:38:56
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: SimulatedBlock.cpp
Error object: DBSPJ (Line: 1263) 0x00000002
Program: ndbmtd
Pid: 122615 thr: 22
Version: mysql-5.5.25 ndb-7.2.7
Trace: /srv/data/cluster/ndb_data/ndb_11_trace.log.8 [t1..t29]

I will attach the log files if they will be of any use - but they will be quite large of course.
[15 Nov 2012 10:41]
Russell Knighton
Okay, this has suddenly become a major issue and will now prevent us going live. It has happened again:

Time: Wednesday 14 November 2012 - 19:41:54
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: SimulatedBlock.cpp
Error object: DBSPJ (Line: 1263) 0x00000002
Program: ndbmtd
Pid: 27148 thr: 24
Version: mysql-5.5.25 ndb-7.2.7
Trace: /srv/data/cluster/ndb_data/ndb_11_trace.log.9 [t1..t29]
***EOM***

Has anyone looked into this? Would whoever is investigating this bug like all/any of the log files to help pin-point the problem?
[15 Nov 2012 13:11]
Russell Knighton
And now it appears I may not be able to restart the node. It's happened again, this time in multiple threads:

Time: Thursday 15 November 2012 - 12:49:16
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: SimulatedBlock.cpp
Error object: DBSPJ (Line: 1263) 0x00000002
Program: ndbmtd
Pid: 58906 thr: 23
Version: mysql-5.5.25 ndb-7.2.7
Trace: /srv/data/cluster/ndb_data/ndb_11_trace.log.10 [t1..t29]
***EOM***

Time: Thursday 15 November 2012 - 12:49:16
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: SimulatedBlock.cpp
Error object: DBSPJ (Line: 1263) 0x00000002
Program: ndbmtd
Pid: 58906 thr: 18
Version: mysql-5.5.25 ndb-7.2.7
Trace: /srv/data/cluster/ndb_data/ndb_11_trace.log.10 [t1..t29]
***EOM***

Time: Thursday 15 November 2012 - 12:49:16
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: SimulatedBlock.cpp
Error object: DBSPJ (Line: 1263) 0x00000002
Program: ndbmtd
Pid: 58906 thr: 22
Version: mysql-5.5.25 ndb-7.2.7
Trace: /srv/data/cluster/ndb_data/ndb_11_trace.log.10 [t1..t29]
***EOM***

Time: Thursday 15 November 2012 - 12:49:16
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: SimulatedBlock.cpp
Error object: DBSPJ (Line: 1263) 0x00000002
Program: ndbmtd
Pid: 58906 thr: 24
Version: mysql-5.5.25 ndb-7.2.7
Trace: /srv/data/cluster/ndb_data/ndb_11_trace.log.10 [t1..t29]
***EOM***

Could someone please give some pointers where I should be looking to diagnose the cause of this?
[15 Nov 2012 14:18]
Russell Knighton
And again... My suspicions are correct. I am now unable to restart my cluster node:

Time: Thursday 15 November 2012 - 14:16:29
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: SimulatedBlock.cpp
Error object: DBSPJ (Line: 1263) 0x00000002
Program: ndbmtd
Pid: 123993 thr: 18
Version: mysql-5.5.25 ndb-7.2.7
Trace: /srv/data/cluster/ndb_data/ndb_11_trace.log.11 [t1..t29]
***EOM***

Time: Thursday 15 November 2012 - 14:16:29
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: SimulatedBlock.cpp
Error object: DBSPJ (Line: 1263) 0x00000002
Program: ndbmtd
Pid: 123993 thr: 19
Version: mysql-5.5.25 ndb-7.2.7
Trace: /srv/data/cluster/ndb_data/ndb_11_trace.log.11 [t1..t29]
***EOM***

Time: Thursday 15 November 2012 - 14:16:29
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: SimulatedBlock.cpp
Error object: DBSPJ (Line: 1263) 0x00000002
Program: ndbmtd
Pid: 123993 thr: 25
Version: mysql-5.5.25 ndb-7.2.7
Trace: /srv/data/cluster/ndb_data/ndb_11_trace.log.11 [t1..t29]
***EOM***

Time: Thursday 15 November 2012 - 14:16:29
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: SimulatedBlock.cpp
Error object: DBSPJ (Line: 1263) 0x00000002
Program: ndbmtd
Pid: 123993 thr: 20
Version: mysql-5.5.25 ndb-7.2.7
Trace: /srv/data/cluster/ndb_data/ndb_11_trace.log.11 [t1..t29]
***EOM***
[16 Nov 2012 9:42]
Russell Knighton
And again. This is Node 1:

=======================================================================
Time: Friday 16 November 2012 - 06:02:10
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: SimulatedBlock.cpp
Error object: DBSPJ (Line: 1263) 0x00000002
Program: ndbmtd
Pid: 21826 thr: 22
Version: mysql-5.5.25 ndb-7.2.7
Trace: /srv/data/cluster/ndb_data/ndb_11_trace.log.12 [t1..t29]
***EOM***

Time: Friday 16 November 2012 - 06:02:10
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: SimulatedBlock.cpp
Error object: DBSPJ (Line: 1263) 0x00000002
Program: ndbmtd
Pid: 21826 thr: 24
Version: mysql-5.5.25 ndb-7.2.7
Trace: /srv/data/cluster/ndb_data/ndb_11_trace.log.12 [t1..t29]
***EOM***
=======================================================================

Node 2:

=======================================================================
Time: Friday 16 November 2012 - 06:01:27
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: SimulatedBlock.cpp
Error object: DBSPJ (Line: 1263) 0x00000006
Program: ndbmtd
Pid: 42789 thr: 22
Version: mysql-5.5.25 ndb-7.2.7
Trace: /srv/data/cluster/ndb_data/ndb_12_trace.log.5 [t1..t29]
***EOM***
=======================================================================

Can someone please comment on this bug?
[16 Nov 2012 14:27]
Russell Knighton
File uploaded to FTP with ndb_error_report output. File-name: bug-data-66840.tar.bz2 MD5: d9c43059ae5ce4d05d848c6ad3120f11
[5 Dec 2012 13:07]
Russell Knighton
Just to confirm that this issue is not resolved in 7.2.9:

Time: Tuesday 4 December 2012 - 19:13:36
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: SimulatedBlock.cpp
Error object: DBSPJ (Line: 1263) 0x00000002
Program: ndbmtd
Pid: 30474 thr: 24
Version: mysql-5.5.28 ndb-7.2.9
Trace: /srv/data/cluster/ndb_data/ndb_12_trace.log.6 [t1..t29]
***EOM***
[27 May 2013 6:59]
Alexey Asemov
Confirming the same issue:

Current byte-offset of file-pointer is: 1067

Time: Monday 20 May 2013 - 11:18:52
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: SimulatedBlock.cpp
Error object: DBSPJ (Line: 1297) 0x00000000
Program: ndbmtd
Pid: 15796 thr: 0
Version: mysql-5.5.30 ndb-7.2.12
Trace: /db/cluster/ndbd/ndb_1_trace.log.1 [t1..t4]
***EOM***

Time: Monday 27 May 2013 - 10:42:37
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: SimulatedBlock.cpp
Error object: DBSPJ (Line: 1297) 0x00000000
Program: ndbmtd
Pid: 29516 thr: 0
Version: mysql-5.5.30 ndb-7.2.12
Trace: /db/cluster/ndbd/ndb_1_trace.log.2 [t1..t4]
***EOM***
[23 Sep 2014 13:10]
Hartmut Holzgraefe
I'm right now looking at a post mortem that killed a 7.2.15 cluster with the same ndbrequire(). Three nodes killed by hitting the same ndbrequire within a second, and the fourth node failed as it could not continue working on its own.

The line number in SimulatedBlock.cpp is 1299 with 7.2.15 but it is the same

    ndbrequire(ss == SEND_OK || ss == SEND_BLOCKED || ss == SEND_DISCONNECTED);

though.

From a quick look at the prepareSend() method that returned the 'ss' value I can see that it can also return SEND_BUFFER_FULL, SEND_MESSAGE_TOO_BIG, SEND_UNKNOWN_NODE which could all trigger the ndbrequire() we're seeing here.

I *think* we can rule out SEND_UNKNOWN_NODE ...? So it could be either SEND_BUFFER_FULL or SEND_MESSAGE_TOO_BIG ...

Now looking again at prepareSend() in TransporterRegistry.cpp I can see:

    [...]
        WARNING("Signal to " << nodeId << " lost(buffer)");
        report_error(nodeId, TE_SIGNAL_LOST_SEND_BUFFER_FULL);
        return SEND_BUFFER_FULL;
      } else {
        return SEND_MESSAGE_TOO_BIG;
      }

Unfortunately WARNING() would only be active in debug builds AFAICT, so even with absence of a "Signal to ... lost(buffer)" message in the output log we can't simply conclude that we've been hitting SEND_MESSAGE_TOO_BIG and not SEND_BUFFER_FULL ... :/

As all nodes failed on the same ndbrequire() at about the same time my educated guess would be that SEND_MESSAGE_TOO_BIG was the reason for this, but I can't rule out SEND_BUFFER_FULL either ...
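The reasoning above can be sketched as a small standalone C++ model. This is only an illustration, not the NDB source: the status names and the condition come from the comment above, while the enum layout (and therefore the numeric values) and the helper names `passesSendRequire` and `sendStatusName` are assumptions made for this sketch.

```cpp
#include <cassert>
#include <string>

// Status names as quoted in the comment above. The ordering and numeric
// values here are placeholders and are NOT copied from the NDB headers.
enum SendStatus {
  SEND_OK,
  SEND_BLOCKED,
  SEND_DISCONNECTED,
  SEND_BUFFER_FULL,
  SEND_MESSAGE_TOO_BIG,
  SEND_UNKNOWN_NODE
};

// Hypothetical helper mirroring the quoted ndbrequire() condition:
// true  -> the check would pass and the block keeps running,
// false -> the check would fail (reported as "failed ndbrequire").
inline bool passesSendRequire(SendStatus ss) {
  return ss == SEND_OK || ss == SEND_BLOCKED || ss == SEND_DISCONNECTED;
}

// Hypothetical helper for readable diagnostics.
inline std::string sendStatusName(SendStatus ss) {
  switch (ss) {
    case SEND_OK:              return "SEND_OK";
    case SEND_BLOCKED:         return "SEND_BLOCKED";
    case SEND_DISCONNECTED:    return "SEND_DISCONNECTED";
    case SEND_BUFFER_FULL:     return "SEND_BUFFER_FULL";
    case SEND_MESSAGE_TOO_BIG: return "SEND_MESSAGE_TOO_BIG";
    case SEND_UNKNOWN_NODE:    return "SEND_UNKNOWN_NODE";
  }
  return "UNKNOWN";
}
```

In this model, `passesSendRequire` returns false for exactly the three statuses discussed above (SEND_BUFFER_FULL, SEND_MESSAGE_TOO_BIG, SEND_UNKNOWN_NODE), which is why the error log alone cannot tell them apart.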
[23 Sep 2014 13:54]
Hartmut Holzgraefe
I can provide ndb_error_reporter files if needed, but I don't think there's any more to see in those than what I already wrote ...
[14 Jul 2016 8:57]
MySQL Verification Team
With the provided config I can reproduce this bug on 7.2.6, but I cannot reproduce it on 7.2.24. Also, on 7.2.6 increasing the TCP_DEFAULT values made the bug harder to reproduce (I could still reproduce it with 16M send/receive buffer memory, but not easily).
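For readers who want to try the same mitigation, a config.ini fragment along these lines would raise the per-transporter buffers. This is a sketch under assumptions: the comment does not name the exact parameters that were changed, so mapping "send/receive buffer memory" to `SendBufferMemory` and `ReceiveBufferMemory` in the `[tcp default]` section is a guess, and 16M is simply the value mentioned above.

```ini
# config.ini on the management node (illustrative values only)
[tcp default]
SendBufferMemory=16M
ReceiveBufferMemory=16M
```

As noted above, the crash was still reproducible on 7.2.6 with 16M buffers, so this only lowers the likelihood of hitting it.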