Bug #34589 Data node crashed while idle
Submitted: 15 Feb 2008 11:25 Modified: 3 Jun 2008 20:41
Reporter: Stanisław Pitucha Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S3 (Non-critical)
Version:5.1.22 x64 OS:Linux (Centos5 on vmware 2cpu)
Assigned to: CPU Architecture:Any

[15 Feb 2008 11:25] Stanisław Pitucha
Description:
Data node crashed while idle - no load, just test table. Running rpms from mysql.com
There are many messages like (every couple minutes):

@4000000047b3ed7726e2e714 2008-02-14 07:27:41 [ndbd] INFO     -- Watchdog: User time: 27982  System time: 240156
@4000000047b3ed7726e3120c 2008-02-14 07:27:41 [ndbd] WARNING  -- Watchdog: Warning overslept 80206 ms, expected 100 ms.

and then crash:

Detected GCP stop(3)...sending kill to [SignalCounter: m_count=2 0000000000000060]
LP 0 state: 0 WW_Gci: 1 gcprec: 0 flq: -256 currfile: 4 tailFileNo: 4 logTailMbyte: 7
  file 0(0)  FileChangeState: 0  logFileStatus: 20  currentMbyte: 0  currentFilepage 0
  file 1(1)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 2(2)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 3(3)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 4(4)  FileChangeState: 0  logFileStatus: 20  currentMbyte: 7  currentFilepage 234
  file 5(5)  FileChangeState: 0  logFileStatus: 20  currentMbyte: 0  currentFilepage 0
  file 6(6)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 7(7)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 8(8)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 9(9)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 10(10)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 11(11)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 12(12)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 13(13)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 14(14)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 15(15)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
LP 1 state: 0 WW_Gci: 1 gcprec: 0 flq: -256 currfile: 20 tailFileNo: 4 logTailMbyte: 9
  file 0(16)  FileChangeState: 0  logFileStatus: 20  currentMbyte: 0  currentFilepage 0
  file 1(17)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 2(18)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 3(19)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 4(20)  FileChangeState: 0  logFileStatus: 20  currentMbyte: 9  currentFilepage 294
  file 5(21)  FileChangeState: 0  logFileStatus: 20  currentMbyte: 0  currentFilepage 0
  file 6(22)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 7(23)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 8(24)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 9(25)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 10(26)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 11(27)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 12(28)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 13(29)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 14(30)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 15(31)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
LP 2 state: 0 WW_Gci: 1 gcprec: 0 flq: -256 currfile: 32 tailFileNo: 0 LogTailMbyte: 2
  file 0(32)  FileChangeState: 0  logFileStatus: 20  currentMbyte: 2  currentFilepage 80
  file 1(33)  FileChangeState: 0  logFileStatus: 20  currentMbyte: 0  currentFilepage 0
  file 2(34)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 3(35)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 4(36)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 5(37)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 6(38)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 7(39)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 8(40)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 9(41)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 10(42)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 11(43)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 12(44)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 13(45)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 14(46)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 15(47)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
LP 3 state: 0 WW_Gci: 1 gcprec: 0 flq: -256 currfile: 48 tailFileNo: 0 logTailMbyte: 2
  file 0(48)  FileChangeState: 0  logFileStatus: 20  currentMbyte: 2  currentFilepage 80
  file 1(49)  FileChangeState: 0  logFileStatus: 20  currentMbyte: 0  currentFilepage 0
  file 2(50)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 3(51)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 4(52)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 5(53)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 6(54)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 7(55)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 8(56)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 9(57)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 10(58)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 11(59)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 12(60)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 13(61)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 14(62)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  file 15(63)  FileChangeState: 0  logFileStatus: 1  currentMbyte: 0  currentFilepage 0
  GCP 0 file: 4 state: 1 sync: 0 page: 234 word: 1581
  GCP 1 file: 20 state: 1 sync: 0 page: 294 word: 7863
  GCP 2 file: 32 state: 1 sync: 0 page: 80 word: 5329
  GCP 3 file: 48 state: 1 sync: 0 page: 80 word: 4841
2008-02-14 07:36:59 [ndbd] INFO     -- Please report this as a bug. Provide as much info as possible, expecially all the ndb_*_out.log files, Than
ks. Shutting down node due to failed handling of GCP_SAVEREQ
2008-02-14 07:36:59 [ndbd] INFO     -- DBLQH (Line: 18973) 0x00000002
2008-02-14 07:36:59 [ndbd] INFO     -- Error handler shutting down system
2008-02-14 07:38:15 [ndbd] INFO     -- Error handler shutdown completed - exiting

Then it restarted correctly.

How to repeat:
Don't know.
[15 Feb 2008 12:43] Susanne Ebrecht
Many thanks for writing a bug report.

Please let us know what you did exactly. Step by step: how you installed and configured this, how many data you have and so on.
[15 Feb 2008 13:07] Stanisław Pitucha
It's 2(api)+2(data)+2(mgm) cluster in following config:
host1: data, host2: data
host3: api+mgm, host4: api+mgm

Both data nodes were started at the same time, but only one crashed (it was master afair). There was no data on cluster - only one empty database. That database had 1 table previously, dropped long before crash happened. Cluster was idle for a week - no changes, no interaction, all nodes running.

All the time there are messages like:
@4000000047ab0103042560e4 2008-02-07 13:00:41 [ndbd] INFO     -- Watchdog: User time: 280  System time: 213
@4000000047ab01030425746c 2008-02-07 13:00:41 [ndbd] WARNING  -- Watchdog: Warning overslept 66465 ms, expected 100 ms.

Those appear at 13:00:41, 13:26:41, 13:49:41, 14:13:41, 14:38:41, ... etc. For a week - always on ...:41 second with 10-30 min spaces.
I can't really tell anything more - that was a cluster prepared for testing, but was empty and idle for a long time - data node was started 7 days before crash, but management nodes were running couple days longer than that.

config.ini:

[NDBD DEFAULT]
DataDir = /var/lib/mysql-cluster
NoOfReplicas = 2
DataMemory = 1024M
IndexMemory = 100M

and sections specifying only Id and Hostname for every node.
[10 Apr 2008 22:28] Sveta Smirnova
Thank you for the feedback.

Please provide all cluster and mysqld log files. Also please try current version 5.1.23 and check if error still exists.
[10 Apr 2008 23:25] Stanisław Pitucha
I've switched that cluster to 5.1.23 some time ago and that crash didn't happen since then. No idea how to trigger it.
I'll report if it happens again, but that cluster seems to be stable now.
[3 Jun 2008 20:41] Sveta Smirnova
Thank you for the feedback.

I'll close the report as "Can't repeat" because last comment. Feel free to reopen it if you see same problem with new version.