MySQL Bugs: #51630: Internal error with shm transport

Bug #51630	Internal error with shm transport
Submitted:	2 Mar 2010 9:43	Modified:	14 Apr 2010 13:51
Reporter:	artem gorbyk
Status:	Verified
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S2 (Serious)
Version:	ndb-6.3.26, ndb-7.0.9	OS:	Linux (2.6.18-164.el5)
Assigned to:		CPU Architecture:	Any
Tags:	ndb, shm

Description:
An internal error occurs when trying to set up the SHM transport between the SQL and NDB processes.

For 6.3.26, the error output looks like this:

Failed to ADD epollfd: 3 fd 1048576 node 4 to epoll-set, errno: 9 Bad file descriptor
2010-03-01 18:47:39 [ndbd] INFO     -- Received signal 6. Running error handler.
2010-03-01 18:47:39 [ndbd] INFO     -- Signal 6 received; Aborted
2010-03-01 18:47:39 [ndbd] INFO     -- main.cpp
2010-03-01 18:47:39 [ndbd] INFO     -- Error handler signal shutting down system
2010-03-01 18:47:41 [ndbd] INFO     -- Error handler shutdown completed - exiting
2010-03-01 18:47:41 [ndbd] ALERT    -- Node 2: Forced node shutdown completed. Initiated by signal 6. Caused by error 6000: 'Error OS signal received(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

For 7.0.9, the output is slightly different:

Failed to ADD epollfd: 3 fd 27734 node 4 to epoll-set, errno: 9 Bad file descriptor
2010-03-01 17:50:02 [ndbd] INFO     -- Received signal 6. Running error handler.
2010-03-01 17:50:02 [ndbd] INFO     -- Signal 6 received; Aborted
2010-03-01 17:50:02 [ndbd] INFO     -- ndbd.cpp
2010-03-01 17:50:02 [ndbd] INFO     -- Error handler signal shutting down system
2010-03-01 17:50:02 [ndbd] INFO     -- Error handler shutdown completed - exiting
2010-03-01 17:50:02 [ndbd] ALERT    -- Node 2: Forced node shutdown completed. Initiated by signal 6. Caused by error 6000: 'Error OS signal received(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

There are no segfaults or other errors in /var/log/messages.
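In both versions the first log line reports epoll_ctl(EPOLL_CTL_ADD) failing with errno 9 (EBADF) for the descriptor belonging to node 4. The snippet below is a minimal sketch, not NDB code (the helper name try_epoll_add is hypothetical): it shows that Linux rejects an attempt to add a descriptor number that is not open in the process, such as 1048576, with EBADF, while a real pipe descriptor is accepted. This is consistent with, but does not prove, the SHM transporter passing epoll a descriptor that was never opened or had already been closed.

```python
import errno
import os
import select


def try_epoll_add(fd):
    """Add fd to a fresh epoll set; return 0 on success or the errno on failure."""
    ep = select.epoll()
    try:
        ep.register(fd, select.EPOLLIN)  # wraps epoll_ctl(EPOLL_CTL_ADD)
        return 0
    except OSError as exc:
        return exc.errno
    finally:
        ep.close()


# A descriptor number that is not open here, like "fd 1048576" in the 6.3.26 log.
print(try_epoll_add(1048576) == errno.EBADF)

# A descriptor that really is open (the read end of a pipe) is accepted.
r, w = os.pipe()
print(try_epoll_add(r))
os.close(r)
os.close(w)
```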

How to repeat:
The [SHM] section of the ndb_mgmd config.ini file looks like this:
[SHM]
NodeId1=2
NodeId2=4
ShmKey=123
SigNum=10

where 2 is the node ID of the ndbd process and 4 is the node ID of the SQL node; both run on the same host.
The NDB node (id=2) starts OK, joins the cluster, and accepts TCP connections from SQL/API nodes on other hosts.
Then, when I try to start the SQL node on the same host, the above error appears after several seconds and ndbd shuts down.
The SHM segment with the given ShmKey remains in the system with nattch=0, and I had to remove it manually with ipcrm.