Bug #46507 node failure error 2341 PGMAN during backup from mgm
Submitted: 1 Aug 14:58
Modified: 11 Aug 11:42
Reporter: Nathan Thera
Status: Open
Category: Server: Cluster
Severity: S3 (Non-critical)
Version: mysql-5.1.34 ndb-7.0.6
OS: Any
Assigned to: Pekka Nousiainen
Target Version:
Tags: error 2341 PGMAN backup pgman.cpp line 1507
Triage: Triaged: D2 (Serious) / R6 (Needs Assessment) / E6 (Needs Assessment)

[1 Aug 14:58] Nathan Thera
Description:
Hi. I am running a MySQL Cluster on ndb 7.0.6 with 4 data nodes and 2 replicas. I recently
noticed that I am unable to back up my data because one data node always shuts down during
the process. The log file excerpts are below.

Nathan

from mgm server:
2009-08-01 06:47:49 [MgmSrvr] ALERT    -- Node 6: Forced node shutdown completed. Caused
by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming
error or missing error message, please report a bug). Temporary error, restart node'.
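
For anyone triaging similar reports, the alert above can be picked out of the cluster log mechanically. This is only a minimal sketch: the function name find_forced_shutdowns and the regular expression are my own assumptions, based solely on the message text quoted in this report, not on any documented log format.

```python
import re

# Pattern derived from the ALERT text quoted in this report (an assumption,
# not a documented format).
SHUTDOWN_RE = re.compile(
    r"Node (\d+): Forced node shutdown completed\.\s*Caused by error (\d+)"
)

def find_forced_shutdowns(log_text):
    """Return (node_id, error_code) pairs for each forced-shutdown alert."""
    # Collapse line breaks so an alert wrapped across lines still matches.
    flat = " ".join(log_text.split())
    return [(int(node), int(err)) for node, err in SHUTDOWN_RE.findall(flat)]

sample = """2009-08-01 06:47:49 [MgmSrvr] ALERT    -- Node 6: Forced node shutdown completed. Caused
by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming
error or missing error message, please report a bug). Temporary error, restart node'."""

print(find_forced_shutdowns(sample))
```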

ndb_6_out.log:
2009-08-01 06:47:48 [ndbd] INFO     -- pgman.cpp
2009-08-01 06:47:48 [ndbd] INFO     -- PGMAN (Line: 1507) 0x00000008
2009-08-01 06:47:48 [ndbd] INFO     -- Error handler restarting system
2009-08-01 06:47:48 [ndbd] INFO     -- Error handler shutdown completed - exiting
2009-08-01 06:47:50 [ndbd] ALERT    -- Node 6: Forced node shutdown completed. Caused by
error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error
or missing error message, please report a bug). Temporary error, restart node'.
2009-08-01 06:47:50 [ndbd] INFO     -- Ndb has terminated (pid 13011) restarting
2009-08-01 06:48:33 [ndbd] INFO     -- Configuration fetched from '172.29.71.97:1186',
generation: 1
2009-08-01 06:48:33 [ndbd] INFO     -- Angel pid: 22541 ndb pid: 24323
NDBMT: non-mt

From ndb_6_error.log:
Time: Saturday 1 August 2009 - 06:47:48
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or
missing error message, please report a bug)
Error: 2341
Error data: pgman.cpp
Error object: PGMAN (Line: 1507) 0x00000008
Program: ndbd
Pid: 13011
Trace: /var/lib/mysql-cluster/ndb_6_trace.log.19
Version: mysql-5.1.34 ndb-7.0.6
***EOM***
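
The record above is a simple "Key: value" block terminated by ***EOM***, so it can be turned into a dictionary for comparison across nodes. A rough sketch follows; the field list is taken only from the record shown here, and parse_error_record is a hypothetical helper name, so other NDB versions may need adjustments.

```python
# Field names as they appear in the record above (assumed to be complete).
FIELDS = {"Time", "Status", "Message", "Error", "Error data",
          "Error object", "Program", "Pid", "Trace", "Version"}

def parse_error_record(text):
    """Parse one error-log record into a dict, stopping at ***EOM***."""
    record, last_key = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "***EOM***":
            break
        key, sep, value = line.partition(": ")
        if sep and key in FIELDS:
            record[key] = value
            last_key = key
        elif line and last_key is not None:
            # A wrapped line continues the previous field's value.
            record[last_key] += " " + line
    return record

sample = """Time: Saturday 1 August 2009 - 06:47:48
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or
missing error message, please report a bug)
Error: 2341
Error data: pgman.cpp
Error object: PGMAN (Line: 1507) 0x00000008
Program: ndbd
Pid: 13011
Trace: /var/lib/mysql-cluster/ndb_6_trace.log.19
Version: mysql-5.1.34 ndb-7.0.6
***EOM***"""

record = parse_error_record(sample)
print(record["Error"], record["Error data"], record["Error object"])
```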

How to repeat:
1) On the management node, run START BACKUP in ndb_mgm.
2) All data nodes begin the backup. One node fails (node 6 in the logs above).
3) The remaining nodes continue and complete their part of the backup. The backup as a
whole is incomplete because the crashed node leaves an incomplete file.
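
To make step 1 scriptable, the backup can be started non-interactively with the -e (execute) option of ndb_mgm. The sketch below is an assumption about how one might wrap that call: backup_command and run_backup are hypothetical helper names, and the connect string is the management address that appears in the ndbd log above.

```python
import subprocess

def backup_command(connectstring=None):
    """Build the argv for a one-shot START BACKUP issued through ndb_mgm."""
    argv = ["ndb_mgm"]
    if connectstring:
        # -c passes the management server connect string (host:port).
        argv += ["-c", connectstring]
    argv += ["-e", "START BACKUP"]
    return argv

def run_backup(connectstring=None):
    """Run the backup and return (exit code, combined output)."""
    result = subprocess.run(backup_command(connectstring),
                            capture_output=True, text=True)
    return result.returncode, result.stdout + result.stderr

# Only the command is printed here; run_backup() needs a live cluster.
print(backup_command("172.29.71.97:1186"))
```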

Suggested fix:
No suggested fix. Possible workaround: retry the backup after the nodes have resynced.
Alternatively, shut down the offending node and then attempt the backup, to at least
obtain all data? (untested)
[1 Aug 14:59] Nathan Thera
trace file from the failed backup ndbd node

Attachment: ndb_6_trace.log.zip (application/x-zip-compressed, text), 33.23 KiB.

[10 Aug 15:30] Martin Skold
Please submit the configuration and schema definitions.
[11 Aug 11:42] Nathan Thera
The actual names in the configuration have been changed to reflect each host's assigned
function; all other settings, including node numbers, are unchanged.

The config.ini was originally generated by the configurator
(http://www.severalnines.com/config/) for 7.0.5.

The MySQLD sections are truncated to show only the first node of each multi-connection
setup (ndb_cluster_connection_pool).

Nathan 

config.ini:
[TCP DEFAULT]
SendBufferMemory=2M
ReceiveBufferMemory=2M

[NDB_MGMD DEFAULT]
PortNumber=1186
Datadir=/var/lib/mysql-cluster

[NDB_MGMD]
Id=1
Hostname=mgm1
ArbitrationRank=1

[NDB_MGMD]
Id=2
Hostname=mgm2
ArbitrationRank=1

[NDBD DEFAULT]
NoOfReplicas=2
Datadir=/var/lib/mysql-cluster
DataMemory=7168M
IndexMemory=512M
LockPagesInMainMemory=1

MaxNoOfConcurrentTransactions=50000
MaxNoOfConcurrentOperations=600000
StopOnError=0

StringMemory=25
MaxNoOfTables=4096
MaxNoOfOrderedIndexes=2048
MaxNoOfUniqueHashIndexes=512
MaxNoOfAttributes=24576
DiskCheckpointSpeedInRestart=100M
FragmentLogFileSize=256M
InitFragmentLogFiles=FULL
NoOfFragmentLogFiles=36
RedoBuffer=32M

TimeBetweenLocalCheckpoints=20
TimeBetweenGlobalCheckpoints=1000
TimeBetweenEpochs=100

MemReportFrequency=30
BackupReportFrequency=10
BackupDataDir=/var/lib/mysql-cluster/backup

LogLevelStartup=15
LogLevelShutdown=15
LogLevelCheckpoint=8
LogLevelNodeRestart=15

BackupMaxWriteSize=1M
BackupDataBufferSize=16M
BackupLogBufferSize=4M
BackupMemory=20M

TimeBetweenWatchdogCheckInitial=30000
TransactionDeadlockDetectionTimeout=8000

SharedGlobalMemory=384M
DiskPageBufferMemory=1024M

MaxNoOfExecutionThreads=8
BatchSizePerLocalScan=512

[NDBD]
Id=3
Hostname=data1

[NDBD]
Id=4
Hostname=data2

[NDBD]
Id=5
Hostname=data3

[NDBD]
Id=6
Hostname=data4

[MYSQLD DEFAULT]
BatchSize=512

[MYSQLD]
Id=9
Hostname=data1
...
[MYSQLD]
Id=14
Hostname=data2
...
[MYSQLD]
Id=19
Hostname=data3
...
[MYSQLD]
Id=24
Hostname=data4
...
[MYSQLD]
Id=29
Hostname=binlog1
...
[MYSQLD]
Id=34
Hostname=binlog2
...
[MYSQLD]
Id=39
Hostname=api1
...
[MYSQLD]
Id=44
Hostname=api2
...
[MYSQLD]
Id=49
Hostname=api3
...
[MYSQLD]
Id=54
Hostname=api4
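
As a sanity check on the configuration above, the number of node groups (data nodes / NoOfReplicas) and the redo log size per data node (NoOfFragmentLogFiles x 4 x FragmentLogFileSize) can be derived from the file. This is a sketch under assumptions: parse_config and size_mib are hypothetical helpers, the sample is an abbreviated excerpt of the config.ini above, and elided "..." lines are simply skipped rather than guessed.

```python
def parse_config(text):
    """Parse config.ini text into (defaults, nodes); sections may repeat."""
    defaults, nodes, current = {}, [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            name = line[1:-1].upper()
            if name.endswith(" DEFAULT"):
                current = defaults.setdefault(name[:-len(" DEFAULT")], {})
            else:
                current = {"_type": name}
                nodes.append(current)
        elif "=" in line and current is not None:
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip()
        # Anything else (blank lines, "..." elisions) is ignored.
    return defaults, nodes

def size_mib(value):
    """Convert a size such as '256M' or '1G' to MiB (only M and G handled)."""
    return int(value[:-1]) * {"M": 1, "G": 1024}[value[-1].upper()]

sample = """[NDBD DEFAULT]
NoOfReplicas=2
FragmentLogFileSize=256M
NoOfFragmentLogFiles=36

[NDBD]
Id=3
Hostname=data1

[NDBD]
Id=4
Hostname=data2

[NDBD]
Id=5
Hostname=data3

[NDBD]
Id=6
Hostname=data4

[MYSQLD DEFAULT]
BatchSize=512

[MYSQLD]
Id=9
Hostname=data1
...
"""

defaults, nodes = parse_config(sample)
ndbd = defaults["NDBD"]
data_node_ids = [int(n["Id"]) for n in nodes if n["_type"] == "NDBD"]
node_groups = len(data_node_ids) // int(ndbd["NoOfReplicas"])
redo_mib = int(ndbd["NoOfFragmentLogFiles"]) * 4 * size_mib(ndbd["FragmentLogFileSize"])
print(data_node_ids, node_groups, redo_mib)
```

For this configuration the arithmetic gives 2 node groups and 36 x 4 x 256M = 36864 MiB of redo log per data node.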
[11 Aug 11:44] Nathan Thera
Output of ndb_restore --print-meta run against one node's backup. DB and table names are
obscured.

Attachment: restoreschema.zip (application/x-zip-compressed, text), 3.61 KiB.

[20 Aug 8:31] Jonas Oreland
see bug#44195