Bug #85110 Ndb kernel is stuck in: Polling for Receive
Submitted: 21 Feb 2017 18:03 Modified: 24 Mar 2017 11:44
Reporter: Anirban Chakraborty
Status: No Feedback
Category:MySQL Cluster: Cluster/J Severity:S1 (Critical)
Version:5.1.22 OS:SUSE (SUSE Linux Enterprise Server 11 (x86_64))
Assigned to: MySQL Verification Team CPU Architecture:Any
Tags: MySQL, ndb, NDB Kernal, polling, Receive, Stuck

[21 Feb 2017 18:03] Anirban Chakraborty
Description:
We have a 4-node MySQL NDB cluster up and running, but very often one of the nodes reports the warning shown below, "Ndb kernel is stuck in: Polling for Receive". This then causes the other nodes to fail as well, and NDB goes down completely.

Is there any way we can prevent this from happening? This is going to be a production DB. Can anyone suggest the best approach that does not require upgrading MySQL and NDB?

2017-02-17 22:47:02 [ndbd] WARNING  -- Ndb kernel is stuck in: Polling for Receive
2017-02-17 22:47:02 [ndbd] INFO     -- Watchdog: User time: 77662  System time: 18051
2017-02-17 23:00:48 [ndbd] INFO     -- Watchdog: User time: 78660  System time: 18391
2017-02-17 23:00:48 [ndbd] WARNING  -- Watchdog: Warning overslept 281 ms, expected 100 ms.
2017-02-17 23:05:51 [ndbd] INFO     -- Watchdog: User time: 78774  System time: 18437
2017-02-17 23:05:51 [ndbd] WARNING  -- Watchdog: Warning overslept 553 ms, expected 100 ms.
2017-02-17 23:05:52 [ndbd] INFO     -- Watchdog: User time: 78774  System time: 18437
2017-02-17 23:05:52 [ndbd] WARNING  -- Watchdog: Warning overslept 774 ms, expected 100 ms.
2017-02-17 23:05:52 [ndbd] WARNING  -- Ndb kernel is stuck in: Polling for Receive
2017-02-17 23:05:52 [ndbd] INFO     -- Watchdog: User time: 78774  System time: 18437
2017-02-17 23:05:52 [ndbd] INFO     -- Arbitrator decided to shutdown this node
2017-02-17 23:05:52 [ndbd] INFO     -- QMGR (Line: 4897) 0x0000000a
2017-02-17 23:05:52 [ndbd] INFO     -- Error handler shutting down system
2017-02-17 23:05:52 [ndbd] INFO     -- Error handler shutdown completed - exiting
2017-02-17 23:05:52 [ndbd] ALERT    -- Node 4: Forced node shutdown completed. Caused by error 2305: 'Node lost connection to other nodes and can not form a unpartitioned cluster, please investigate if there are error(s) on other node(s)(Arbitration error). Temporary error, restart node'.

How to repeat:
The error occurs every time we restart the NDB services, and has done so since the cluster was first brought up. It appears within a couple of hours of the restart, always on the same data node.

Suggested fix:
A complete restart of the NDB cluster services and the MySQL DB.
[24 Feb 2017 11:44] MySQL Verification Team
Hi,

This does not look like a bug but rather an improperly configured MySQL Cluster.
Without full logs (prepare them using ndb_error_reporter) I can't say for sure, but from the snippet you sent, that data node looks overloaded.

best regards
Bogdan
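
One way to look at the overload reading before collecting full logs is to count how often the watchdog overslept and how badly. The following is only a minimal sketch: the function name summarize_watchdog and the regular expressions are hypothetical, written against the line shapes in the excerpt above, and other ndbd versions may log differently.

```python
import re
from datetime import datetime

# Patterns are assumptions derived from the log excerpt in this report.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[ndbd\] (\w+)\s+-- (.*)$"
)
OVERSLEPT_RE = re.compile(r"Watchdog: Warning overslept (\d+) ms, expected (\d+) ms")
STUCK_RE = re.compile(r"Ndb kernel is stuck in: (.+)")

# A few lines copied verbatim from the excerpt, used as sample input.
SAMPLE = """\
2017-02-17 22:47:02 [ndbd] WARNING  -- Ndb kernel is stuck in: Polling for Receive
2017-02-17 23:00:48 [ndbd] WARNING  -- Watchdog: Warning overslept 281 ms, expected 100 ms.
2017-02-17 23:05:51 [ndbd] WARNING  -- Watchdog: Warning overslept 553 ms, expected 100 ms.
2017-02-17 23:05:52 [ndbd] WARNING  -- Watchdog: Warning overslept 774 ms, expected 100 ms.
2017-02-17 23:05:52 [ndbd] WARNING  -- Ndb kernel is stuck in: Polling for Receive
"""


def summarize_watchdog(lines):
    """Collect watchdog 'overslept' and 'stuck' events from ndbd log lines."""
    overslept = []  # (timestamp, overslept_ms, expected_ms)
    stuck = []      # (timestamp, state the kernel was reported stuck in)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a timestamped ndbd line
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        msg = m.group(3)
        o = OVERSLEPT_RE.search(msg)
        if o:
            overslept.append((ts, int(o.group(1)), int(o.group(2))))
            continue
        s = STUCK_RE.search(msg)
        if s:
            stuck.append((ts, s.group(1).strip()))
    return {
        "overslept": overslept,
        "stuck": stuck,
        "worst_overslept_ms": max((o[1] for o in overslept), default=0),
    }


if __name__ == "__main__":
    summary = summarize_watchdog(SAMPLE.splitlines())
    print(len(summary["overslept"]), "overslept warnings, worst",
          summary["worst_overslept_ms"], "ms")
    for ts, state in summary["stuck"]:
        print(ts, "stuck in:", state)
```

Run against the full ndbd log of the affected node, a summary like this shows whether the oversleeps cluster around particular times. That can then be compared with system load on the host, but it does not replace the ndb_error_reporter archive requested above.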
[25 Mar 2017 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".