MySQL Bugs: #46507: node failure error 2341 PGMAN/DBTUP during backup from mgm

Bug #46507	node failure error 2341 PGMAN/DBTUP during backup from mgm
Submitted:	1 Aug 2009 12:58	Modified:	19 Sep 2010 9:51
Reporter:	Nathan Thera
Status:	No Feedback
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S3 (Non-critical)
Version:	mysql-5.1.41 ndb-7.0.13	OS:	Any
Assigned to:	Assigned Account	CPU Architecture:	Any
Tags:	error 2341 PGMAN DBTUP backup pgman.cpp DbtupPageMap.cpp

Description:
Hi. I am running a 4-node, 2-replica MySQL Cluster on NDB 7.0.6. I recently noticed that I am unable to back up my data because one data node always shuts down during the process. The relevant log files and excerpts are below.

Nathan

From the mgm server:
2009-08-01 06:47:49 [MgmSrvr] ALERT    -- Node 6: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

ndb_6_out.log:
2009-08-01 06:47:48 [ndbd] INFO     -- pgman.cpp
2009-08-01 06:47:48 [ndbd] INFO     -- PGMAN (Line: 1507) 0x00000008
2009-08-01 06:47:48 [ndbd] INFO     -- Error handler restarting system
2009-08-01 06:47:48 [ndbd] INFO     -- Error handler shutdown completed - exiting
2009-08-01 06:47:50 [ndbd] ALERT    -- Node 6: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
2009-08-01 06:47:50 [ndbd] INFO     -- Ndb has terminated (pid 13011) restarting
2009-08-01 06:48:33 [ndbd] INFO     -- Configuration fetched from '172.29.71.97:1186', generation: 1
2009-08-01 06:48:33 [ndbd] INFO     -- Angel pid: 22541 ndb pid: 24323
NDBMT: non-mt

From ndb_6_error.log:
Time: Saturday 1 August 2009 - 06:47:48
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: pgman.cpp
Error object: PGMAN (Line: 1507) 0x00000008
Program: ndbd
Pid: 13011
Trace: /var/lib/mysql-cluster/ndb_6_trace.log.19
Version: mysql-5.1.34 ndb-7.0.6
***EOM***
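To compare crash reports like this one across nodes and versions (this PGMAN failure versus the later DBTUP one), the error-log record can be pulled apart mechanically. The sketch below is a hypothetical Python helper, not part of MySQL Cluster; it assumes each record is a block of `Key: Value` lines terminated by `***EOM***`, and that the "Error object" field has the `BLOCK (Line: N) 0xHEX` shape seen in the excerpt above.

```python
import re
from typing import Dict, Optional, Tuple

# Matches the "Error object" field, e.g. "PGMAN (Line: 1507) 0x00000008":
# the kernel block name, the source line of the failed ndbrequire, and a hex code.
OBJECT_RE = re.compile(r"^(\w+) \(Line: (\d+)\) (0x[0-9A-Fa-f]+)$")


def parse_error_record(text: str) -> Dict[str, str]:
    """Split one error-log record into a {field: value} dict (hypothetical helper)."""
    record: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line == "***EOM***":
            break  # end of this record
        key, sep, value = line.partition(":")  # split on the FIRST colon only
        if sep:
            record[key.strip()] = value.strip()
    return record


def failed_require_location(record: Dict[str, str]) -> Optional[Tuple[str, int, str]]:
    """Return (source file, line, block) for a failed ndbrequire, or None."""
    match = OBJECT_RE.match(record.get("Error object", ""))
    if match is None:
        return None
    block, line, _code = match.groups()
    return record.get("Error data", ""), int(line), block


# Abbreviated copy of the ndb_6_error.log record above.
SAMPLE = """\
Time: Saturday 1 August 2009 - 06:47:48
Status: Temporary error, restart node
Error: 2341
Error data: pgman.cpp
Error object: PGMAN (Line: 1507) 0x00000008
Program: ndbd
***EOM***
"""

if __name__ == "__main__":
    rec = parse_error_record(SAMPLE)
    print(rec["Error"], failed_require_location(rec))
```

Splitting on the first colon only keeps values such as the timestamp and `PGMAN (Line: 1507) ...` intact.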

How to repeat:
1) From the ndb_mgm client, run START BACKUP.
2) The data nodes begin the backup. One node fails, and the remaining nodes continue until the backup finishes. The backup is incomplete, missing that one node.
3) The other nodes complete their part of the backup, and the crashed node leaves an incomplete file.
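To repeat step 1 several times (for example, while watching the data node logs), the backup can be started non-interactively through `ndb_mgm -e`. This is a minimal sketch assuming Python 3, an `ndb_mgm` binary on PATH, and the connect string `mgm1:1186` taken from the config.ini in this report; the helper names are hypothetical.

```python
import subprocess
from typing import List, Tuple

# Wait options accepted by the ndb_mgm START BACKUP command.
WAIT_OPTIONS = ("NOWAIT", "WAIT STARTED", "WAIT COMPLETED")


def backup_command(connectstring: str, wait: str = "WAIT COMPLETED") -> List[str]:
    """Build an argv that runs START BACKUP non-interactively via ndb_mgm -e."""
    if wait not in WAIT_OPTIONS:
        raise ValueError(f"unsupported wait option: {wait!r}")
    return ["ndb_mgm", "-c", connectstring, "-e", f"START BACKUP {wait}"]


def run_backup(connectstring: str) -> Tuple[int, str]:
    """Run one backup attempt and return (exit status, combined stdout/stderr)."""
    result = subprocess.run(
        backup_command(connectstring),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.returncode, result.stdout
```

Usage: `status, output = run_backup("mgm1:1186")`. The sketch returns the captured text as well as the exit status so the caller can inspect what ndb_mgm reported for the backup.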

Suggested fix:
No suggested fix. Possible workarounds: try the backup again after the nodes have resynced, or shut down the offending node and attempt the backup to at least obtain all the data? (untested)

Trace file from the ndbd node that failed during the backup:

Attachment: ndb_6_trace.log.zip (application/x-zip-compressed, text), 33.23 KiB.

Please submit the configuration and schema definitions.

The actual host names in the configuration have been changed to reflect each host's assigned function; all other settings, including node IDs, are unchanged.

The config.ini was originally generated with the configurator (http://www.severalnines.com/config/) for 7.0.5.

The [MYSQLD] sections are truncated to show only the first node ID of each multi-connection setup (ndb_cluster_connection_pool).

Nathan 

config.ini:
[TCP DEFAULT]
SendBufferMemory=2M
ReceiveBufferMemory=2M

[NDB_MGMD DEFAULT]
PortNumber=1186
Datadir=/var/lib/mysql-cluster

[NDB_MGMD]
Id=1
Hostname=mgm1
ArbitrationRank=1

[NDB_MGMD]
Id=2
Hostname=mgm2
ArbitrationRank=1

[NDBD DEFAULT]
NoOfReplicas=2
Datadir=/var/lib/mysql-cluster
DataMemory=7168M
IndexMemory=512M
LockPagesInMainMemory=1

MaxNoOfConcurrentTransactions=50000
MaxNoOfConcurrentOperations=600000
StopOnError=0

StringMemory=25
MaxNoOfTables=4096
MaxNoOfOrderedIndexes=2048
MaxNoOfUniqueHashIndexes=512
MaxNoOfAttributes=24576
DiskCheckpointSpeedInRestart=100M
FragmentLogFileSize=256M
InitFragmentLogFiles=FULL
NoOfFragmentLogFiles=36
RedoBuffer=32M

TimeBetweenLocalCheckpoints=20
TimeBetweenGlobalCheckpoints=1000
TimeBetweenEpochs=100

MemReportFrequency=30
BackupReportFrequency=10
BackupDataDir=/var/lib/mysql-cluster/backup

LogLevelStartup=15
LogLevelShutdown=15
LogLevelCheckpoint=8
LogLevelNodeRestart=15

BackupMaxWriteSize=1M
BackupDataBufferSize=16M
BackupLogBufferSize=4M
BackupMemory=20M

TimeBetweenWatchdogCheckInitial=30000
TransactionDeadlockDetectionTimeout=8000

SharedGlobalMemory=384M
DiskPageBufferMemory=1024M

MaxNoOfExecutionThreads=8
BatchSizePerLocalScan=512

[NDBD]
Id=3
Hostname=data1

[NDBD]
Id=4
Hostname=data2

[NDBD]
Id=5
Hostname=data3

[NDBD]
Id=6
Hostname=data4

[MYSQLD DEFAULT]
BatchSize=512

[MYSQLD]
Id=9
Hostname=data1
...
[MYSQLD]
Id=14
Hostname=data2
...
[MYSQLD]
Id=19
Hostname=data3
...
[MYSQLD]
Id=24
Hostname=data4
...
[MYSQLD]
Id=29
Hostname=binlog1
...
[MYSQLD]
Id=34
Hostname=binlog2
...
[MYSQLD]
Id=39
Hostname=api1
...
[MYSQLD]
Id=44
Hostname=api2
...
[MYSQLD]
Id=49
Hostname=api3
...
[MYSQLD]
Id=54
Hostname=api4

Output of ndb_restore --print-meta (metadata) from one node's backup. Database and table names are obscured.

Attachment: restoreschema.zip (application/x-zip-compressed, text), 3.61 KiB.

See bug #44195.

Hi,

Can you test with a newer version (e.g. 7.0.10)? Several bugs in this
area have been fixed, and we have never managed to reproduce your case.

/Jonas

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

A similar error is happening with mysql-5.1.41 ndb-7.0.13.
Same setup as in the first post, but with more DataMemory, more tables, and an upgraded MySQL version, and running ndbmtd this time around.

When the backup starts, one node goes down and the cluster cannot make a complete backup.

Nathan

ndb_1_cluster.log:
2010-06-01 05:39:31 [MgmtSrvr] ALERT    -- Node 4: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

ndb_4_error.log:
Time: Tuesday 1 June 2010 - 05:39:29
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: dbtup/DbtupPageMap.cpp
Error object: DBTUP (Line: 103) 0x00000008
Program: ndbmtd
Pid: 2841 thr: 2
Version: mysql-5.1.41 ndb-7.0.13
Trace: /var/lib/mysql-cluster/ndb_4_trace.log.13 /var/lib/mysql-cluster/ndb_4_trace.log.13_t1 /var/lib/mysql-cluster/ndb_4_trace.log.

Trace files from the ndbmtd failure:

Attachment: ndb_4_trace.logs.gz (application/x-gzip, text), 367.50 KiB.

This could be caused by http://bugs.mysql.com/bug.php?id=54986.
Can you retest with, e.g., 7.0.17?

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".