Bug #20612 failed ndbrequire in PGMAN
Submitted: 21 Jun 2006 17:52 Modified: 11 Jun 2007 12:59
Reporter: Nikolay Grishakin
Status: Closed
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S1 (Critical)
Version:5.1 OS:Linux (Linux)
Assigned to: Pekka Nousiainen CPU Architecture:Any

[21 Jun 2006 17:52] Nikolay Grishakin
Description:
[ndbdev@ndb13 mysql-test]$ gdb ../storage/ndb/src/kernel/ndbd ./var/ndbcluster-9310/core.25393
GNU gdb Red Hat Linux (6.3.0.0-1.84rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...Using host libthread_db library "/lib64/libthread_db.so.1".

Core was generated by `/home/ndbdev/ngrishakin/mysql-5.1/storage/ndb/src/kernel/ndbd --no-defaults --c'.
Program terminated with signal 6, Aborted.
Reading symbols from /lib64/libpthread.so.0...done.
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libcrypt.so.1...done.
Loaded symbols for /lib64/libcrypt.so.1
Reading symbols from /lib64/libnsl.so.1...done.
Loaded symbols for /lib64/libnsl.so.1
Reading symbols from /lib64/libm.so.6...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libc.so.6...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libnss_files.so.2...done.
Loaded symbols for /lib64/libnss_files.so.2
Reading symbols from /lib64/libgcc_s.so.1...done.
Loaded symbols for /lib64/libgcc_s.so.1
#0  0x00000033b832f280 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00000033b832f280 in raise () from /lib64/libc.so.6
#1  0x00000033b8330750 in abort () from /lib64/libc.so.6
#2  0x00000000004ab6d5 in childAbort (code=Could not find the frame base for "childAbort(int, unsigned int)".
) at main.cpp:105
#3  0x00000000006c5aed in NdbShutdown (type=NST_ErrorHandler, restartType=NRT_Default)
    at Emulator.cpp:255
#4  0x00000000006cfeb8 in ErrorReporter::handleError (messageID=6052,
    problemData=0x7fffffb00fd0 "Remote note id 2.", objRef=0x75d269 "TransporterCallback.cpp",
    nst=NST_ErrorHandler) at ErrorReporter.cpp:206
#5  0x00000000006c4a22 in reportError (callbackObj=0x0, nodeId=2,
    errorCode=TE_SIGNAL_LOST_SEND_BUFFER_FULL, info=0x0) at TransporterCallback.cpp:350
#6  0x00000000006d23ea in TransporterRegistry::prepareSend (this=0x9f21a0,
    signalHeader=0x7fffffb011b0, prio=1 '\001', signalData=0x9f51d8, nodeId=2, thePool=@0x9f2080,
    ptr=0x9f51a8) at TransporterRegistry.cpp:706
#7  0x00000000006bbd59 in SimulatedBlock::sendSignal (this=0xb20c00, ref=16187394, gsn=195,
    signal=0x9f5188, length=4, jobBuffer=JBB) at SimulatedBlock.cpp:261
#8  0x00000000005d2ce8 in Dbtc::releaseAndAbort (this=0xb20c00, signal=0x9f5188)
    at dbtc/DbtcMain.cpp:6044
#9  0x00000000005dda60 in Dbtc::abort015Lab (this=0xb20c00, signal=0x9f5188)
    at dbtc/DbtcMain.cpp:5963
#10 0x00000000005f5ee7 in Dbtc::execCONTINUEB (this=0xb20c00, signal=0x9f5188)
    at dbtc/DbtcMain.cpp:248
#11 0x00000000004d2edb in SimulatedBlock::executeFunction (this=0xb20c00, gsn=164,
    signal=0x9f5188) at ./SimulatedBlock.hpp:575
#12 0x00000000006c2718 in FastScheduler::doJob (this=0x9f22e0) at FastScheduler.cpp:137
#13 0x00000000006c36fb in ThreadConfig::ipControlLoop (this=0xa08a80) at ThreadConfig.cpp:175
#14 0x00000000004ac6c6 in main (argc=4, argv=0x7fffffb01728) at main.cpp:470
(gdb)

How to repeat:
Here are instructions for running the test:
Before running the test, clean up the ndb13:/home/ndbdev/ngrishakin/crdd_sys-workdir/ directory.

Location:
ndb13:/home/ndbdev/ngrishakin/mysql-test-extra-5.1/mysql-test/suite/crdd_sys/...

To run it: [ndbdev@ndb13 crdd_sys]$ ./run_crddsys --config=crdd_sys.cnf

Results are in ndb13:/home/ndbdev/ngrishakin/crdd_sys-workdir/...
[21 Jun 2006 17:58] Nikolay Grishakin
Information from log files:

[ndbdev@ndb13 ndbcluster-9310]$ cat ndb_1_error.log
Current byte-offset of file-pointer is: 568

Time: Wednesday 21 June 2006 - 19:23:51
Status: Permanent error, external action needed
Message: Signal lost, out of send buffer memory, please increase SendBufferMemory (Resource configuration error)
Error: 6052
Error data: Remote note id 2.
Error object: TransporterCallback.cpp
Program: /home/ndbdev/ngrishakin/mysql-5.1/storage/ndb/src/kernel/ndbd
Pid: 25393
Trace: ./ndb_1_trace.log.1
Version: Version 5.1.12 (beta)
***EOM***
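The log above and frames #5 to #7 of the backtrace show the path to the abort: Dbtc::releaseAndAbort called SimulatedBlock::sendSignal, TransporterRegistry::prepareSend found no room in the send buffer for node 2, and reportError raised TE_SIGNAL_LOST_SEND_BUFFER_FULL (error 6052). The following is a minimal C++ sketch of that failure mode, assuming a per-node buffer with a fixed byte capacity that stands in for SendBufferMemory. SendBuffer, SendStatus and signalsBeforeOverflow are hypothetical names, not the NDB transporter API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical status codes. The real ndbd reports error 6052
// ("Signal lost, out of send buffer memory") via TE_SIGNAL_LOST_SEND_BUFFER_FULL.
enum class SendStatus { Ok, SignalLostSendBufferFull };

// Hypothetical fixed-capacity send buffer for a single remote node.
// capacity_ plays the role of SendBufferMemory.
class SendBuffer {
 public:
  explicit SendBuffer(std::size_t capacity_bytes) : capacity_(capacity_bytes) {}

  // Append a signal if it fits; otherwise report that it would be lost.
  SendStatus prepareSend(const std::uint32_t* words, std::size_t n_words) {
    assert(words != nullptr && n_words > 0);
    const std::size_t bytes = n_words * sizeof(std::uint32_t);
    if (used_ + bytes > capacity_) {
      return SendStatus::SignalLostSendBufferFull;
    }
    data_.insert(data_.end(), words, words + n_words);
    used_ += bytes;
    return SendStatus::Ok;
  }

  // Simulate the socket draining everything buffered so far.
  void flush() {
    data_.clear();
    used_ = 0;
  }

  std::size_t used() const { return used_; }

 private:
  std::size_t capacity_;
  std::size_t used_ = 0;
  std::vector<std::uint32_t> data_;
};

// Count how many 4-word signals (the length seen in frame #7) fit
// before the buffer reports overflow, with no flushing in between.
int signalsBeforeOverflow(std::size_t capacity_bytes) {
  SendBuffer buf(capacity_bytes);
  const std::uint32_t signal[4] = {0, 0, 0, 0};
  int sent = 0;
  while (buf.prepareSend(signal, 4) == SendStatus::Ok) {
    ++sent;
  }
  return sent;
}
```

With no flushing, a 64-byte buffer holds four 16-byte signals before the fifth is rejected. The sketch only returns a status; the log shows that the real ndbd treats the lost signal as a permanent error and shuts the node down (NdbShutdown in frame #3).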
[21 Jun 2006 19:42] Nikolay Grishakin
log files

Attachment: 20216_logs.zip (application/x-zip-compressed, text), 41.42 KiB.

[22 Jun 2006 5:41] Tomas Ulin
please set SendBufferMemory in your config.ini
[22 Jun 2006 22:03] Nikolay Grishakin
I added SendBufferMemory to the config.ini file, but it was not accepted. This is the error returned:
" Error line 7: [DB] Unknown parameter: SendBufferMemory
Error line 7: Could not parse name-value pair in config file.
Unable to read config file
Unable to start /home/ndbdev/ngrishakin/mysql-5.1/storage/ndb/src/mgmsrv/ndb_mgmd --no-defaults --co re  from /home/ndbdev/ngrishakin/mysql-5.1/mysql-test
Aborting: Failed to install ndb cluster "

BTW, still getting the core!
[23 Jun 2006 0:11] Nikolay Grishakin
I added the SendBufferMemory parameter under the [TCP DEFAULT] section of both mysql-5.1/mysql-test/ndb/ndb_config_2_node.ini and ndb_config_1_node.ini and set it to 10485760, but I am still getting the core.
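The rejection in the previous comment and the accepted setting here differ only in the section: SendBufferMemory is a transporter parameter, so ndb_mgmd refuses it in the data node section (reported as [DB], the internal name for what config.ini calls [ndbd]) but accepts it under [TCP DEFAULT]. The C++ sketch below models that per-section lookup. The table is hypothetical and lists only parameters that appear in this report, not ndb_mgmd's actual parameter table; the sketch also checks that 10485760 bytes is exactly 10 MiB.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>

// Hypothetical, heavily reduced table of which parameters each section
// accepts. "DB" stands for [ndbd]/[ndbd default]; "TCP" for [TCP DEFAULT].
const std::map<std::string, std::set<std::string>>& allowedParameters() {
  static const std::map<std::string, std::set<std::string>> table = {
      {"DB", {"NoOfReplicas", "DataMemory", "IndexMemory", "DiskPageBufferMemory"}},
      {"TCP", {"SendBufferMemory"}},
  };
  return table;
}

// True if `param` may appear in `section`; false mimics
// "[DB] Unknown parameter: SendBufferMemory".
bool isKnownParameter(const std::string& section, const std::string& param) {
  const auto& table = allowedParameters();
  auto it = table.find(section);
  return it != table.end() && it->second.count(param) > 0;
}

// The value used under [TCP DEFAULT]: 10 * 1024 * 1024 bytes.
constexpr std::uint64_t kSendBufferMemory = 10ULL * 1024 * 1024;
static_assert(kSendBufferMemory == 10485760ULL, "10485760 bytes is 10 MiB");
```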
[23 Jun 2006 17:37] Nikolay Grishakin
Copied core file and all logs to ndbmaster:/bugs/bug20612
[27 Jun 2006 5:39] Tomas Ulin
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: pgman.cpp
Error object: PGMAN (Line: 1973) 0x0000000a

NDBFS   001031 001033 000844 001050 
QMGR    000116 000140 002088 002111 002125 002125 002135 
DBTC    006067 006101 006125 
DBTC    006067 006101 006125 
DBLQH   002431 
DBTC    004050 
DBTUP   002032 
PGMAN   000209 000214 001973 

--------------- Signal ----------------
r.bn: 261 "PGMAN", r.proc: 2, r.sigId: 4390357 gsn: 164 "CONTINUEB" prio: 0
s.bn: 261 "PGMAN", s.proc: 2, s.sigId: 4390354 length: 1 trace: 0 #sec: 0 fragInf: 0
 H'00000000
--------------- Signal ----------------
r.bn: 245 "DBTC", r.proc: 2, r.sigId: 4390353 gsn: 409 "TIME_SIGNAL" prio: 1
s.bn: 252 "QMGR", s.proc: 2, s.sigId: 4390351 length: 1 trace: 0 #sec: 0 fragInf: 0
 H'00000004
--------------- Signal ----------------
r.bn: 245 "DBTC", r.proc: 2, r.sigId: 4390352 gsn: 409 "TIME_SIGNAL" prio: 1
s.bn: 252 "QMGR", s.proc: 2, s.sigId: 4390351 length: 1 trace: 0 #sec: 0 fragInf: 0
 H'00000004
--------------- Signal ----------------
r.bn: 252 "QMGR", r.proc: 2, r.sigId: 4390351 gsn: 164 "CONTINUEB" prio: 0
s.bn: 252 "QMGR", s.proc: 2, s.sigId: 4390349 length: 1 trace: 0 #sec: 0 fragInf: 0
 H'00000004
--------------- Signal ----------------
r.bn: 253 "NDBFS", r.proc: 2, r.sigId: 4390350 gsn: 164 "CONTINUEB" prio: 0
s.bn: 253 "NDBFS", s.proc: 2, s.sigId: 4390348 length: 1 trace: 0 #sec: 0 fragInf: 0
 Scanning the memory channel every 10ms
--------------- Signal ----------------
r.bn: 245 "DBTC", r.proc: 2, r.sigId: 4390347 gsn: 409 "TIME_SIGNAL" prio: 1
s.bn: 252 "QMGR", s.proc: 2, s.sigId: 4390345 length: 1 trace: 0 #sec: 0 fragInf: 0
 H'00000004
--------------- Signal ----------------
r.bn: 245 "DBTC", r.proc: 2, r.sigId: 4390346 gsn: 409 "TIME_SIGNAL" prio: 1
s.bn: 252 "QMGR", s.proc: 2, s.sigId: 4390345 length: 1 trace: 0 #sec: 0 fragInf: 0
 H'00000004
--------------- Signal ----------------
r.bn: 252 "QMGR", r.proc: 2, r.sigId: 4390345 gsn: 164 "CONTINUEB" prio: 0
s.bn: 252 "QMGR", s.proc: 2, s.sigId: 4390343 length: 1 trace: 0 #sec: 0 fragInf: 0
 H'00000004
--------------- Signal ----------------
r.bn: 253 "NDBFS", r.proc: 2, r.sigId: 4390344 gsn: 164 "CONTINUEB" prio: 0
s.bn: 253 "NDBFS", s.proc: 2, s.sigId: 4390342 length: 1 trace: 0 #sec: 0 fragInf: 0
 Scanning the memory channel every 10ms
--------------- Signal ----------------
r.bn: 245 "DBTC", r.proc: 2, r.sigId: 4390341 gsn: 406 "TCGETOPSIZEREQ" prio: 1
s.bn: 246 "DBDIH", s.proc: 1, s.sigId: 4550150 length: 2 trace: 0 #sec: 0 fragInf: 0
 H'00000002 H'00f60001
[27 Jun 2006 5:45] Tomas Ulin
> more config.ini 
[ndbd default]
NoOfReplicas= 2
MaxNoOfConcurrentTransactions= 64
MaxNoOfConcurrentOperations= 300000
DataMemory= 20M
IndexMemory= 300M
Diskless= 0
TimeBetweenWatchDogCheck= 30000
DataDir= .
MaxNoOfOrderedIndexes= 32
MaxNoOfAttributes= 2048
TimeBetweenGlobalCheckpoints= 500
NoOfFragmentLogFiles= 3
DiskPageBufferMemory= 4M
# the following parameters just function as a small regression
# test that the parameter exists
InitialNoOfOpenFiles= 27
[TCP DEFAULT]
SendBufferMemory: 10485760
[ndbd]
HostName= localhost

[ndbd]
HostName= localhost

[ndb_mgmd]
HostName= localhost
DataDir= .    #
PortNumber= 9310

[mysqld]

[mysqld]

[mysqld]

[mysqld]

[mysqld]

[mysqld]

[mysqld]

[mysqld]
[10 Sep 2006 19:59] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/11665

ChangeSet@1.2306, 2006-09-10 21:58:51+02:00, pekka@orca.ndb.mysql.com +3 -0
  ndb - bug#20612 ins-del fix in tup
[7 Nov 2006 14:39] Serge Kozlov
Couldn't repeat it either.
[1 Jun 2007 8:01] Pekka Nousiainen
The failing check is probably this one:

ndbrequire(queue_count == pl_queue.count() || dump_page_lists());
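The ndbrequire above compares a separately maintained counter (queue_count) with the length of the page list (pl_queue.count()). Because of short-circuit evaluation, dump_page_lists() runs only when the two disagree, to print diagnostics before the check fails. A minimal C++ sketch of that invariant follows, assuming dump_page_lists() returns false so the require still fails after dumping. PageQueue, require_sketch and remove_front_buggy are hypothetical. In ndbd a failed ndbrequire stops the node with error 2341, as in the PGMAN (Line: 1973) report; the sketch simply calls std::abort().

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <list>

// Hypothetical stand-in for ndbrequire(): ndbd stops with error 2341,
// this sketch prints the location and aborts.
inline void require_sketch(bool cond, const char* file, int line) {
  if (!cond) {
    std::fprintf(stderr, "failed require at %s:%d\n", file, line);
    std::abort();
  }
}

struct Page { int id; };

// Hypothetical queue keeping a counter alongside the list, mirroring
// queue_count vs pl_queue.count().
class PageQueue {
 public:
  void add(Page p) {
    pages_.push_back(p);
    ++queue_count_;
  }

  void remove_front() {
    if (!pages_.empty()) {
      pages_.pop_front();
      --queue_count_;
    }
  }

  // Injected bug: unlinks a page but forgets the counter, the kind of
  // bookkeeping slip the invariant is meant to catch.
  void remove_front_buggy() {
    if (!pages_.empty()) pages_.pop_front();
  }

  // Print both counts, then return false so `ok || dump_page_lists()`
  // still evaluates to false.
  bool dump_page_lists() const {
    std::fprintf(stderr, "queue_count=%u list_count=%zu\n", queue_count_, pages_.size());
    return false;
  }

  bool invariant_holds() const { return queue_count_ == pages_.size(); }

  void check() const {
    require_sketch(invariant_holds() || dump_page_lists(), __FILE__, __LINE__);
  }

 private:
  std::list<Page> pages_;
  unsigned queue_count_ = 0;
};
```

Calling check() after remove_front_buggy() would print both counts and abort, which is the kind of counter/list mismatch the PGMAN check reports.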
[3 Jun 2007 17:10] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/28006

ChangeSet@1.2488, 2007-06-03 19:30:37+02:00, tomas@whalegate.ndb.mysql.com +1 -0
  Bug#20612.
[11 Jun 2007 11:39] Bugs System
Pushed into 5.1.20-beta
[11 Jun 2007 12:59] Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release.

If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at

    http://dev.mysql.com/doc/en/installing-source.html

Documented fix in 5.1.20 and telco changelogs.
[3 Jul 2007 6:43] Jon Stephens
Also documented for telco-6.2.3 release.