Bug #46507 node failure error 2341 PGMAN/DBTUP during backup from mgm
Submitted: 1 Aug 2009 12:58 Modified: 19 Sep 2010 9:51
Reporter: Nathan Thera
Status: No Feedback
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S3 (Non-critical)
Version:mysql-5.1.41 ndb-7.0.13 OS:Any
Assigned to: Assigned Account CPU Architecture:Any
Tags: error 2341 PGMAN DBTUP backup pgman.cpp DbtupPageMap.cpp

[1 Aug 2009 12:58] Nathan Thera
Description:
Hi. I am running a 4 node 2 mirror mysql cluster on ndb 7.0.6. I recently noticed that I am unable to back up my data because one data node always shuts down during the process. The log files and stubs are below.

Nathan

from mgm server:
2009-08-01 06:47:49 [MgmSrvr] ALERT    -- Node 6: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

ndb_6_out.log:
2009-08-01 06:47:48 [ndbd] INFO     -- pgman.cpp
2009-08-01 06:47:48 [ndbd] INFO     -- PGMAN (Line: 1507) 0x00000008
2009-08-01 06:47:48 [ndbd] INFO     -- Error handler restarting system
2009-08-01 06:47:48 [ndbd] INFO     -- Error handler shutdown completed - exiting
2009-08-01 06:47:50 [ndbd] ALERT    -- Node 6: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
2009-08-01 06:47:50 [ndbd] INFO     -- Ndb has terminated (pid 13011) restarting
2009-08-01 06:48:33 [ndbd] INFO     -- Configuration fetched from '172.29.71.97:1186', generation: 1
2009-08-01 06:48:33 [ndbd] INFO     -- Angel pid: 22541 ndb pid: 24323
NDBMT: non-mt

From ndb_6_error.log:
Time: Saturday 1 August 2009 - 06:47:48
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: pgman.cpp
Error object: PGMAN (Line: 1507) 0x00000008
Program: ndbd
Pid: 13011
Trace: /var/lib/mysql-cluster/ndb_6_trace.log.19
Version: mysql-5.1.34 ndb-7.0.6
***EOM***
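
For triage, an error-log entry like the one above can be read mechanically, since each line is a "Key: value" pair and the entry ends with ***EOM***. The following is only a minimal sketch under that assumption; the function name parse_error_log_entry is hypothetical and not part of any NDB tooling, and the Message line is shortened in the sample.

```python
def parse_error_log_entry(text):
    """Parse one 'Key: value' entry of an ndb_<id>_error.log into a dict.

    Assumption: every field sits on its own line and the key ends at the
    first ': ' on that line, as in the entry quoted in this report.
    """
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the ***EOM*** terminator.
        if not line or line.startswith("***"):
            continue
        key, sep, value = line.partition(": ")
        if sep:
            fields[key] = value
    return fields


# Sample copied from the ndb_6_error.log entry above (Message shortened).
SAMPLE = """\
Time: Saturday 1 August 2009 - 06:47:48
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire)
Error: 2341
Error data: pgman.cpp
Error object: PGMAN (Line: 1507) 0x00000008
Program: ndbd
Pid: 13011
Trace: /var/lib/mysql-cluster/ndb_6_trace.log.19
Version: mysql-5.1.34 ndb-7.0.6
***EOM***
"""

entry = parse_error_log_entry(SAMPLE)
print(entry["Error"], entry["Error data"], entry["Error object"])
```

The error code, source file and failing block/line are the fields most useful for matching a crash against other reports.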

How to repeat:
1) From the ndb_mgm node, run START BACKUP.
2) The nodes begin the backup process. One node fails, and the remaining nodes continue until the backup finishes.
3) The resulting backup is incomplete: the crashed node leaves an incomplete file and its data is missing.
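
The first step can also be issued non-interactively, which makes it easier to repeat the test. This is a rough sketch only: it assumes ndb_mgm is on the PATH and uses its -c (connect string) and -e (execute) options; the helper names are hypothetical, and the address is the management server shown in ndb_6_out.log.

```python
import subprocess


def start_backup_command(connectstring):
    """Build the ndb_mgm command line that issues START BACKUP."""
    return ["ndb_mgm", "-c", connectstring, "-e", "START BACKUP"]


def run_backup(connectstring):
    """Run the backup command and return the completed process.

    Assumption: ndb_mgm is installed and the management server is reachable.
    """
    cmd = start_backup_command(connectstring)
    return subprocess.run(cmd, capture_output=True, text=True)


if __name__ == "__main__":
    # Only print the command here; call run_backup() on a real cluster.
    print(" ".join(start_backup_command("172.29.71.97:1186")))
```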

Suggested fix:
No suggested fix. Try backup again after nodes are resynced.
Shutdown offending node and attempt backup to at least obtain all data? (untested)
[1 Aug 2009 12:59] Nathan Thera
trace file from the failed backup ndbd node

Attachment: ndb_6_trace.log.zip (application/x-zip-compressed, text), 33.23 KiB.

[10 Aug 2009 13:30] Martin Skold
Please submit configuration and schema definitions.
[11 Aug 2009 9:42] Nathan Thera
Actual configuration names changed to represent assigned function; all other settings are the same, including node numbers.

Originally config.ini generated from configurator (http://www.severalnines.com/config/) for 7.0.5.

MySQLD information truncated to only show the first node of the multi-connection setup (ndb_cluster_connection_pool).

Nathan 

config.ini:
[TCP DEFAULT]
SendBufferMemory=2M
ReceiveBufferMemory=2M

[NDB_MGMD DEFAULT]
PortNumber=1186
Datadir=/var/lib/mysql-cluster

[NDB_MGMD]
Id=1
Hostname=mgm1
ArbitrationRank=1

[NDB_MGMD]
Id=2
Hostname=mgm2
ArbitrationRank=1

[NDBD DEFAULT]
NoOfReplicas=2
Datadir=/var/lib/mysql-cluster
DataMemory=7168M
IndexMemory=512M
LockPagesInMainMemory=1

MaxNoOfConcurrentTransactions=50000
MaxNoOfConcurrentOperations=600000
StopOnError=0

StringMemory=25
MaxNoOfTables=4096
MaxNoOfOrderedIndexes=2048
MaxNoOfUniqueHashIndexes=512
MaxNoOfAttributes=24576
DiskCheckpointSpeedInRestart=100M
FragmentLogFileSize=256M
InitFragmentLogFiles=FULL
NoOfFragmentLogFiles=36
RedoBuffer=32M

TimeBetweenLocalCheckpoints=20
TimeBetweenGlobalCheckpoints=1000
TimeBetweenEpochs=100

MemReportFrequency=30
BackupReportFrequency=10
BackupDataDir=/var/lib/mysql-cluster/backup

LogLevelStartup=15
LogLevelShutdown=15
LogLevelCheckpoint=8
LogLevelNodeRestart=15

BackupMaxWriteSize=1M
BackupDataBufferSize=16M
BackupLogBufferSize=4M
BackupMemory=20M

TimeBetweenWatchdogCheckInitial=30000
TransactionDeadlockDetectionTimeout=8000

SharedGlobalMemory=384M
DiskPageBufferMemory=1024M

MaxNoOfExecutionThreads=8
BatchSizePerLocalScan=512

[NDBD]
Id=3
Hostname=data1

[NDBD]
Id=4
Hostname=data2

[NDBD]
Id=5
Hostname=data3

[NDBD]
Id=6
Hostname=data4

[MYSQLD DEFAULT]
BatchSize=512

[MYSQLD]
Id=9
Hostname=data1
...
[MYSQLD]
Id=14
Hostname=data2
...
[MYSQLD]
Id=19
Hostname=data3
...
[MYSQLD]
Id=24
Hostname=data4
...
[MYSQLD]
Id=29
Hostname=binlog1
...
[MYSQLD]
Id=34
Hostname=binlog2
...
[MYSQLD]
Id=39
Hostname=api1
...
[MYSQLD]
Id=44
Hostname=api2
...
[MYSQLD]
Id=49
Hostname=api3
...
[MYSQLD]
Id=54
Hostname=api4
[11 Aug 2009 9:44] Nathan Thera
output of ndb_restore --print-meta data from one node backup. DB and table names obscured.

Attachment: restoreschema.zip (application/x-zip-compressed, text), 3.61 KiB.

[20 Aug 2009 6:31] Jonas Oreland
see bug#44195
[18 Jan 2010 10:29] Jonas Oreland
Hi,

Can you test with a newer version (e.g. 7.0.10), since several bugs in the
area have been fixed (and we never managed to reproduce your case)?

/Jonas
[19 Feb 2010 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[1 Jun 2010 11:25] Nathan Thera
Similar error happening in mysql-5.1.41 ndb-7.0.13.
Same setup. Higher datamemory + more tables + upgraded mysql version, running ndbmtd this time around on the setup in the first post.

On backup start, one node goes down and the cluster cannot make a complete backup.

Nathan

ndb_1_cluster.log:
2010-06-01 05:39:31 [MgmtSrvr] ALERT    -- Node 4: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

ndb_4_error.log:
Time: Tuesday 1 June 2010 - 05:39:29
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: dbtup/DbtupPageMap.cpp
Error object: DBTUP (Line: 103) 0x00000008
Program: ndbmtd
Pid: 2841 thr: 2
Version: mysql-5.1.41 ndb-7.0.13
Trace: /var/lib/mysql-cluster/ndb_4_trace.log.13 /var/lib/mysql-cluster/ndb_4_trace.log.13_t1 /var/lib/mysql-cluster/ndb_4_trace.log.
[1 Jun 2010 11:26] Nathan Thera
trace file from ndbmtd failure

Attachment: ndb_4_trace.logs.gz (application/x-gzip, text), 367.50 KiB.

[19 Aug 2010 9:51] Jonas Oreland
This could be caused by http://bugs.mysql.com/bug.php?id=54986
Can you retest with e.g. 7.0.17?
[19 Sep 2010 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".