Bug #77014 SIGSEGV for ndbmtd
Submitted: 12 May 2015 10:54 Modified: 26 May 2015 13:36
Reporter: Сергей Кукуев
Status: Verified
Category: MySQL Cluster: Cluster (NDB) storage engine Severity: S2 (Serious)
Version: 7.4.4 OS: Linux (2.6.39-200.24.1.el6uek.x86_64)
Assigned to: CPU Architecture: Any
Tags: ndbmtd, segmentation fault, Signal 11, SIGSEGV

[12 May 2015 10:54] Сергей Кукуев
Description:
Two of the four data nodes shut down due to signal 11 (SIGSEGV).

In data node log:
 
2015-05-08 12:07:11 [ndbd] INFO     -- Node started
2015-05-12 10:25:09 [ndbd] INFO     -- This is the last table
2015-05-12 10:25:09 [ndbd] INFO     -- And all tables are written to already written disk
2015-05-12 10:28:35 [ndbd] INFO     -- This is the last table
2015-05-12 10:28:35 [ndbd] INFO     -- And all tables are written to already written disk
2015-05-12 10:52:57 [ndbd] ALERT    -- Node 4: Forced node shutdown completed. Occured during startphase 0. Initiated by signal 11.
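
When several data nodes are involved, it can help to pull the forced-shutdown entries out of each node log mechanically. The following is a minimal sketch, assuming the ALERT line format shown above; the function name `find_forced_shutdowns` and the returned field names are hypothetical and not part of any MySQL Cluster tool.

```python
import re

# Pattern for the forced-shutdown ALERT line as it appears in the log above.
# The log spells the word "Occured"; "Occurred" is accepted as well.
SHUTDOWN_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[ndbd\] ALERT\s+-- "
    r"Node (?P<node>\d+): Forced node shutdown completed\. "
    r"Occurr?ed during startphase (?P<phase>\d+)\. "
    r"Initiated by signal (?P<signal>\d+)\."
)


def find_forced_shutdowns(lines):
    """Return one dict per forced-shutdown ALERT line found in `lines`."""
    events = []
    for line in lines:
        m = SHUTDOWN_RE.match(line.strip())
        if m:
            events.append({
                "timestamp": m.group("ts"),
                "node": int(m.group("node")),
                "startphase": int(m.group("phase")),
                "signal": int(m.group("signal")),
            })
    return events


if __name__ == "__main__":
    sample = [
        "2015-05-08 12:07:11 [ndbd] INFO     -- Node started",
        "2015-05-12 10:52:57 [ndbd] ALERT    -- Node 4: Forced node shutdown "
        "completed. Occured during startphase 0. Initiated by signal 11.",
    ]
    for event in find_forced_shutdowns(sample):
        print(event)
```

Lines that are shutdowns initiated by an error code rather than a signal would not match this pattern; it only covers the signal-initiated form seen in this report.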

The following appeared several times in syslog:

May 12 10:31:55 pi1bs5 kernel: INFO: task ndbmtd:62294 blocked for more than 120 seconds.
May 12 10:31:55 pi1bs5 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 12 10:31:55 pi1bs5 kernel: ndbmtd          D ffff881fdc0dcb60     0 62294  62276 0x00000080
May 12 10:31:55 pi1bs5 kernel: ffff881fc2b0fcb8 0000000000000086 ffff881fc2b0fc98 ffffffff810a1bf8
May 12 10:31:55 pi1bs5 kernel: 00000000000121c0 ffff881fc2b0ffd8 ffff881fc2b0e010 00000000000121c0
May 12 10:31:55 pi1bs5 kernel: ffff881fc2b0ffd8 00000000000121c0 ffffffff81781020 ffff881fdc0dc5c0
May 12 10:31:55 pi1bs5 kernel: Call Trace:
May 12 10:31:55 pi1bs5 kernel: [<ffffffff810a1bf8>] ? exit_robust_list+0x88/0x160
May 12 10:31:55 pi1bs5 kernel: [<ffffffff8150396f>] schedule+0x3f/0x60
May 12 10:31:55 pi1bs5 kernel: [<ffffffff8106f505>] exit_mm+0x85/0x170
May 12 10:31:55 pi1bs5 kernel: [<ffffffff8106f769>] do_exit+0x179/0x430
May 12 10:31:55 pi1bs5 kernel: [<ffffffff810a08bf>] ? __unqueue_futex+0x3f/0x80
May 12 10:31:55 pi1bs5 kernel: [<ffffffff8106fa75>] do_group_exit+0x55/0xd0
May 12 10:31:55 pi1bs5 kernel: [<ffffffff8108103f>] get_signal_to_deliver+0x21f/0x480
May 12 10:31:55 pi1bs5 kernel: [<ffffffff81013919>] do_signal+0x69/0x190
May 12 10:31:55 pi1bs5 kernel: [<ffffffff810a38ac>] ? do_futex+0xec/0x1c0
May 12 10:31:55 pi1bs5 kernel: [<ffffffff810a39fb>] ? sys_futex+0x7b/0x180
May 12 10:31:55 pi1bs5 kernel: [<ffffffff81013aa5>] do_notify_resume+0x65/0x80
May 12 10:31:55 pi1bs5 kernel: [<ffffffff8150df50>] int_signal+0x12/0x17

and then

May 12 10:52:50 pi1bs5 abrt[28617]: Saved core dump of pid 62277 (/usr/sbin/ndbmtd) to /var/spool/abrt/ccpp-2015-05-12-10:28:35-62277 (11945696460
May 12 10:52:50 pi1bs5 abrtd: Directory 'ccpp-2015-05-12-10:28:35-62277' creation detected
May 12 10:52:52 pi1bs5 abrtd: Package 'MySQL-Cluster-server-advanced' isn't signed with proper key
May 12 10:52:52 pi1bs5 abrtd: 'post-create' on '/var/spool/abrt/ccpp-2015-05-12-10:28:35-62277' exited with 1
May 12 10:52:52 pi1bs5 abrtd: Corrupted or bad directory /var/spool/abrt/ccpp-2015-05-12-10:28:35-62277, deleting
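
The abrtd lines above show a core dump being saved and its directory then being deleted. A small sketch of how such saved-then-deleted dumps could be spotted in syslog follows, assuming the message wording shown above. The function name `discarded_core_dumps` is hypothetical, and the script only pairs the two messages by directory name; it does not establish why abrtd discarded the dump.

```python
import re

# abrt reports the saved dump: pid, executable and target directory.
SAVED_RE = re.compile(
    r"abrt\[\d+\]: Saved core dump of pid (?P<pid>\d+) "
    r"\((?P<exe>[^)]+)\) to (?P<dir>\S+)"
)
# abrtd reports that the same directory was deleted afterwards.
DELETED_RE = re.compile(
    r"abrtd: Corrupted or bad directory (?P<dir>\S+), deleting"
)


def discarded_core_dumps(lines):
    """Return dumps that were saved and later deleted, in deletion order."""
    saved = {}       # directory -> (pid, executable)
    discarded = []
    for line in lines:
        m = SAVED_RE.search(line)
        if m:
            saved[m.group("dir")] = (int(m.group("pid")), m.group("exe"))
            continue
        m = DELETED_RE.search(line)
        if m and m.group("dir") in saved:
            pid, exe = saved[m.group("dir")]
            discarded.append({"dir": m.group("dir"), "pid": pid, "exe": exe})
    return discarded


if __name__ == "__main__":
    import sys
    # Usage (path is an example): python this_script.py /var/log/messages
    with open(sys.argv[1], errors="replace") as fh:
        for dump in discarded_core_dumps(fh):
            print(dump)
```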

How to repeat:
Unknown; I have no idea how to reproduce it.
[12 May 2015 11:01] Сергей Кукуев
Error report uploaded to sftp.oracle.com as file mysql-bug-data-77014.tar.bz2.
[26 May 2015 13:32] MySQL Verification Team
Hi,

To investigate this further, the full cluster logs are required. You can collect them using the ndb_error_reporter tool.

All the best,
Bogdan Kecman
[26 May 2015 13:36] Сергей Кукуев
Hi, Bogdan!

I've already attached the full logs from ndb_error_reporter; see my comment from 12 May.
[26 May 2015 13:45] MySQL Verification Team
Hi Sergey,

Sorry, I saw only the "messages" and missed that one.

Thanks for the report
Bogdan Kecman
[7 Jul 2015 12:14] Mikael Ronström
Posted by developer:
 
We have occasionally seen similar issues in test runs. This happens during the memory allocation
phase of a data node start (ndbmtd start). Most likely it is some issue in the interaction between
the OS and ndbmtd. Unfortunately, we never get an analysable core file, which seems to be part of
the problem.