Bug #46723 Node restarts without reasonable explanation
Submitted: 14 Aug 2009 15:26
Modified: 24 Aug 2009 8:22
Reporter: Robert Klikics
Status: Duplicate
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S1 (Critical)
Version: mysql-5.1-telcom-7.0
OS: Linux (Debian 5.0)
Assigned to: Assigned Account
CPU Architecture: Any
Tags: cluster mysql-5.1.34 ndb-7.0.6, fail, restart

[14 Aug 2009 15:26] Robert Klikics
Description:
One of my cluster nodes just restarted twice in a row without any apparent reason :(

ndbmtd output:

2009-08-14 16:24:43 [ndbd] WARNING  -- Ndb kernel thread 3 is stuck in: Performing Send elapsed=100
2009-08-14 16:24:44 [ndbd] INFO     -- Watchdog: User time: 18677627  System time: 8457017
2009-08-14 16:24:44 [ndbd] INFO     -- Watchdog: User time: 26133868  System time: 9481077
2009-08-14 16:24:44 [ndbd] WARNING  -- Watchdog: Warning overslept 1212 ms, expected 100 ms.
2009-08-14 16:24:44 [ndbd] INFO     -- suma/Suma.cpp
2009-08-14 16:24:44 [ndbd] INFO     -- SUMA (Line: 4120) 0x0000000c
2009-08-14 16:24:44 [ndbd] INFO     -- Error handler restarting system
2009-08-14 16:24:44 [ndbd] INFO     -- Error handler shutdown completed - exiting
2009-08-14 16:24:45 [ndbd] ALERT    -- Node 5: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
2009-08-14 16:24:45 [ndbd] INFO     -- Ndb has terminated (pid 26367) restarting
2009-08-14 16:24:59 [ndbd] INFO     -- Configuration fetched from '192.168.10.100:1186', generation: 1
2009-08-14 16:24:59 [ndbd] INFO     -- Angel pid: 26366 ndb pid: 12838
NDBMT: MaxNoOfExecutionThreads=4
NDBMT: workers=2 threads=2
2009-08-14 16:24:59 [ndbd] INFO     -- NDB Cluster -- DB node 5
2009-08-14 16:24:59 [ndbd] INFO     -- mysql-5.1.34 ndb-7.0.6 --
2009-08-14 16:24:59 [ndbd] INFO     -- Ndbd_mem_manager::init(1) min: 59454Mb initial: 59474Mb

mgmd output:

2009-08-14 16:24:45 [MgmSrvr] ALERT    -- Node 5: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

The other 3 of the 4 nodes did not log anything.
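
For anyone triaging similar output, here is a minimal Python sketch that pulls the forced-shutdown and watchdog "overslept" events out of log lines shaped like the ones above. The regular expressions are based only on the lines quoted in this report, and the function names (`parse_line`, `summarize`) are hypothetical helpers, not part of any MySQL Cluster tool.

```python
import re

# Line format as seen in the output above (assumption: other lines may differ):
# "2009-08-14 16:24:44 [ndbd] WARNING  -- message"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<source>[^\]]+)\] (?P<level>[A-Z]+)\s+-- (?P<msg>.*)$"
)
# "Node 5: Forced node shutdown completed. Caused by error 2341: ..."
SHUTDOWN_RE = re.compile(
    r"Node (?P<node>\d+): Forced node shutdown completed\. "
    r"Caused by error (?P<code>\d+)"
)
# "Watchdog: Warning overslept 1212 ms, expected 100 ms."
OVERSLEPT_RE = re.compile(
    r"Watchdog: Warning overslept (?P<actual>\d+) ms, expected (?P<expected>\d+) ms"
)


def parse_line(line):
    """Split one log line into ts/source/level/msg, or return None if it does not match."""
    m = LINE_RE.match(line.rstrip("\n"))
    return m.groupdict() if m else None


def summarize(lines):
    """Collect forced shutdowns and watchdog oversleep warnings from an iterable of lines."""
    shutdowns = []   # (timestamp, node id, error code)
    oversleeps = []  # (timestamp, actual ms, expected ms)
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue  # e.g. untimestamped lines such as "NDBMT: workers=2 threads=2"
        m = SHUTDOWN_RE.search(rec["msg"])
        if m:
            shutdowns.append((rec["ts"], int(m.group("node")), int(m.group("code"))))
            continue
        m = OVERSLEPT_RE.search(rec["msg"])
        if m:
            oversleeps.append(
                (rec["ts"], int(m.group("actual")), int(m.group("expected")))
            )
    return {"shutdowns": shutdowns, "oversleeps": oversleeps}
```

Passing an open log file to `summarize` would give a quick timeline to compare across the data nodes and the management node; the log file location depends on the installation and is not assumed here.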

How to repeat:
Actually no idea
[14 Aug 2009 15:27] Robert Klikics
Trace logs

Attachment: trace.tar.gz (application/x-gzip, text), 441.85 KiB.

[17 Aug 2009 7:57] Jonas Oreland
See bug#45612
[17 Aug 2009 13:19] Jørgen Austvik
Thanks for the bug report. Can you please provide some more information?
What load do you have?
What hardware do you have?
What is your configuration?
Do you have any error logs?
[17 Aug 2009 13:32] Robert Klikics
What load do you have?

Load average is around 1.0 all the time. CPU usage is around 20-30% (ndbmtd).

What hardware do you have?

Intel(R) Xeon(R) CPU X5460  @ 3.16GHz quadcore, 64 GB RAM

What is your configuration?

4 data nodes (ndb), 1 management node (mgm).

Do you have any error logs?

No, just the attached tracelogs and the output provided above...

Regards,
Robert
[19 Aug 2009 15:29] Jonas Oreland
Hi,

If you could retest with the patch I attached to
http://bugs.mysql.com/bug.php?id=46782, that would be great.

/Jonas
[24 Aug 2009 8:22] Jonas Oreland
http://bugs.mysql.com/bug.php?id=46782