Description:
Internal error when trying to set up the SHM transport between the SQL and ndbd processes.
For 6.3.26 the error log looks like:
Failed to ADD epollfd: 3 fd 1048576 node 4 to epoll-set, errno: 9 Bad file descriptor
2010-03-01 18:47:39 [ndbd] INFO -- Received signal 6. Running error handler.
2010-03-01 18:47:39 [ndbd] INFO -- Signal 6 received; Aborted
2010-03-01 18:47:39 [ndbd] INFO -- main.cpp
2010-03-01 18:47:39 [ndbd] INFO -- Error handler signal shutting down system
2010-03-01 18:47:41 [ndbd] INFO -- Error handler shutdown completed - exiting
2010-03-01 18:47:41 [ndbd] ALERT -- Node 2: Forced node shutdown completed. Initiated by signal 6. Caused by error 6000: 'Error OS signal received(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
For 7.0.9 it is slightly different:
Failed to ADD epollfd: 3 fd 27734 node 4 to epoll-set, errno: 9 Bad file descriptor
2010-03-01 17:50:02 [ndbd] INFO -- Received signal 6. Running error handler.
2010-03-01 17:50:02 [ndbd] INFO -- Signal 6 received; Aborted
2010-03-01 17:50:02 [ndbd] INFO -- ndbd.cpp
2010-03-01 17:50:02 [ndbd] INFO -- Error handler signal shutting down system
2010-03-01 17:50:02 [ndbd] INFO -- Error handler shutdown completed - exiting
2010-03-01 17:50:02 [ndbd] ALERT -- Node 2: Forced node shutdown completed. Initiated by signal 6. Caused by error 6000: 'Error OS signal received(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
No segfaults or other errors in /var/log/messages.
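For reference, errno 9 (EBADF) is what Linux returns when a descriptor that is not open is added to an epoll set. The sketch below is not ndbd code; it only assumes that the "Failed to ADD" message comes from an epoll ADD call, and reproduces the same errno from Python by registering a descriptor that has already been closed. The function name is made up for illustration.

```python
import errno
import os
import select


def add_closed_fd_to_epoll():
    """Register an already-closed descriptor with epoll; return the errno seen."""
    ep = select.epoll()
    try:
        # Open a pipe and close both ends, so `r` is a number that no longer
        # refers to an open file (similar in spirit to fd 1048576 / 27734 above).
        r, w = os.pipe()
        os.close(r)
        os.close(w)
        try:
            # select.epoll.register() issues the epoll ADD operation.
            ep.register(r, select.EPOLLIN)
        except OSError as exc:
            return exc.errno
        return 0
    finally:
        ep.close()


if __name__ == "__main__":
    err = add_closed_fd_to_epoll()
    print(err, errno.errorcode.get(err), os.strerror(err))
```

If this reading is right, the fd values in the logs suggest that ndbd passes something other than a real socket descriptor for the SHM connection to node 4, but that is a guess from the log text only.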
How to repeat:
The [SHM] section of the ndb_mgmd config.ini file looks like:
[SHM]
NodeId1=2
NodeId2=4
ShmKey=123
SigNum=10
Where 2 and 4 are the node IDs of the ndbd and SQL processes respectively, located on the same box.
The data node (id=2) starts OK, joins the cluster, and accepts TCP connections from SQL/API nodes on other hosts.
Then, when I start the SQL node on the same box, after several seconds I get the above error and ndbd goes down.
The SHM segment with the given ShmKey remains in the system with nattch=0, and I had to remove it with ipcrm.
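A rough sketch of how the leftover segment can be located before removing it. It assumes a Linux box where /proc/sysvipc/shm exists, has a header row with "key", "shmid" and "nattch" columns, and prints the key in decimal; it also assumes the segment key equals the configured ShmKey (123 here), which I have not verified. The helper name is hypothetical. It only prints the ipcrm command, it does not remove anything.

```python
import os


def find_stale_segments(proc_text, key):
    """Return shmids of segments with the given key and no attached processes."""
    lines = proc_text.splitlines()
    if not lines:
        return []
    # Locate columns by header name instead of hard-coding positions.
    header = lines[0].split()
    k = header.index("key")
    i = header.index("shmid")
    n = header.index("nattch")
    stale = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) <= max(k, i, n):
            continue  # skip blank or truncated rows
        if int(fields[k]) == key and int(fields[n]) == 0:
            stale.append(int(fields[i]))
    return stale


if __name__ == "__main__":
    path = "/proc/sysvipc/shm"  # assumption: Linux-only location
    if os.path.exists(path):
        with open(path) as fh:
            for shmid in find_stale_segments(fh.read(), 123):
                # Print the cleanup command rather than running it.
                print("ipcrm -m %d" % shmid)
```

On the affected box the same information is visible with ipcs -m, which is how I saw nattch=0.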