Bug #68686 MySQL-cluster node crash - job buffer full
Submitted: 15 Mar 2013 20:07
Modified: 14 Jul 2016 13:13
Reporter: Nenad Merdanovic
Status: Can't repeat
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S2 (Serious)
Version: 5.5.29-ndb-7.2.10
OS: Linux (Debian Squeeze)
Assigned to: Bogdan Kecman
CPU Architecture: Any
Tags: Job buffer NDB data node sleeploop

[15 Mar 2013 20:07] Nenad Merdanovic
Description:
Under heavy load I experience NDB data node crashes due to:

[lots of these]
6 - sleeploop 10!!
6 - sleeploop 10!!
6 - sleeploop 10!!
6 - sleeploop 10!!
6 - sleeploop 10!!
job buffer full
2013-03-15 13:36:07 [ndbd] INFO     -- Received signal 6. Running error handler.
2013-03-15 13:36:07 [ndbd] INFO     -- Signal 6 received; Aborted
2013-03-15 13:36:07 [ndbd] INFO     -- /pb2/build/sb_0-7932439-1355951702.81/mysql-cluster-gpl-7.2.10/storage/ndb/src/kernel/ndbd.cpp
2013-03-15 13:36:07 [ndbd] INFO     -- Error handler signal shutting down system
2013-03-15 13:36:07 [ndbd] INFO     -- Error handler shutdown completed - exiting
2013-03-15 13:36:08 [ndbd] ALERT    -- Node 1: Forced node shutdown completed. Initiated by signal 6. Caused by error 6000: 'Error OS signal received(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
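
A minimal sketch, assuming other node logs use the same line format as the excerpt above, for checking whether they show the same pattern (repeated sleeploop lines, then "job buffer full", then a forced shutdown with an error code). The function name summarize_crash and the regular expressions are illustrative only and are not part of any NDB tooling.

```python
import re

# Patterns inferred from the log excerpt in this report (an assumption,
# not a documented log format).
SLEEPLOOP_RE = re.compile(r"^\s*\d+\s+-\s+sleeploop\s+\d+!!\s*$")
ERROR_RE = re.compile(r"Caused by error (\d+)")


def summarize_crash(log_text):
    """Count sleeploop lines and note whether the job buffer filled up."""
    sleeploops = 0
    job_buffer_full = False
    error_code = None
    for line in log_text.splitlines():
        if SLEEPLOOP_RE.match(line):
            sleeploops += 1
        elif line.strip() == "job buffer full":
            job_buffer_full = True
        else:
            match = ERROR_RE.search(line)
            if match:
                # Keep the last error code reported in the log.
                error_code = int(match.group(1))
    return {
        "sleeploops": sleeploops,
        "job_buffer_full": job_buffer_full,
        "error_code": error_code,
    }
```

Calling summarize_crash on the text of a data node's output log returns a small dictionary that can be compared across nodes or across crashes.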

I have attached the traces and config.ini. I haven't run the utility that collects everything, as it stalls when finishing the collection process. If anything else is needed, let me know.

From what I have seen in the changelogs, this should already be fixed. I have marked this S2 because there is a workaround: using single-threaded ndbd.

Both nodes in Nodegroup 0 did this, and then the whole cluster crashed.

How to repeat:
It happens at random, so I am not sure how to repeat it.
[15 Mar 2013 20:08] Nenad Merdanovic
Trace files from the failing data node

Attachment: traces.tar.gz (application/x-gzip, text), 801.54 KiB.

[15 Mar 2013 20:09] Nenad Merdanovic
Configuration file

Attachment: config.ini (application/octet-stream, text), 1.89 KiB.

[18 Mar 2013 8:27] Umesh Shastry
Hello Nenad,

Could you please attach the complete cluster logs? Preferably using the ndb_error_reporter utility:

  http://dev.mysql.com/doc/refman/5.5/en/mysql-cluster-programs-ndb-error-reporter.html

Regards,
Umesh
[18 Mar 2013 8:29] Umesh Shastry
This looks similar to http://bugs.mysql.com/bug.php?id=65454
[18 Mar 2013 10:16] Nenad Merdanovic
Hello,

I added the report to the FTP server (ndb_report_68686.tar.bz2).

Regards,
Nenad
[14 Jul 2016 13:13] Bogdan Kecman
- can reproduce on 7.2.4
- cannot reproduce on 7.2.23 or 7.4.11
- trace looks like 14143553 (fixed)