| Bug #19669 | Sometimes StartFailureTimeout isn't applied properly | ||
|---|---|---|---|
| Submitted: | 10 May 2006 10:45 | Modified: | 12 Sep 2006 2:18 |
| Reporter: | Serge Kozlov | Email Updates: | |
| Status: | Not a Bug | Impact on me: | |
| Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S2 (Serious) |
| Version: | 5.1.10 (beta) | OS: | Linux (FC4) |
| Assigned to: | david li | CPU Architecture: | Any |
[10 May 2006 10:45]
Serge Kozlov
[12 Sep 2006 2:13]
david li
I ran the following tests:
Simple Description:
1. Set StartFailureTimeout and MaxNoOfTables in config.ini
(as described in the bug report).
2. Set StartFailureTimeout but not MaxNoOfTables in config.ini
(to check whether the StartFailureTimeout parameter takes effect).
3. Set StartFailureTimeout but not MaxNoOfTables in config.ini, and
add a 2-second sleep in Dbtup::initRecords()
(to check whether the MaxNoOfTables parameter is processed before
StartFailureTimeout).
Detailed Description:
********************
* 1. Set StartFailureTimeout and MaxNoOfTables in config.ini
********************
1.1 config.ini
[ndbd default]
NoOfReplicas= 2
DataMemory= 10M
IndexMemory= 10M
StartFailureTimeout=1000 # 1sec
MaxNoOfTables=1000000000 # more than the total memory of test box
[ndb_mgmd]
Id=1
HostName= 127.0.0.1
DataDir=/usr/local/mysql/data
[ndbd]
Id= 2
HostName= 127.0.0.1
DataDir= /usr/local/mysql/data/node2
... (4 ndb nodes)
1.2 start cluster
$libexec/ndb_mgmd
$libexec/ndbd --initial
1.3 test result
ndbd starting
after n seconds ( n > StartFailureTimeout(1 second) )
start failed: cannot allocate memory.
ndb_2_out.log
2006-09-11 16:41:48 [ndbd] ALERT -- Node 2: Forced node shutdown
completed. Occured during startphase 0. Initiated by signal 6. Caused by
error 2327: 'Memory allocation failure, please decrease some
configuration parameters(Configuration error). Permanent error, external
action needed'.
1.4 the call stack
(gdb) bt
#0 0x00bf8c38 in raise () from /lib/tls/i686/libc.so.6
#1 0x00bfa0b8 in abort () from /lib/tls/i686/libc.so.6
#2 0x080e0c8a in childAbort (code=-1, currentStartPhase=0) at
main.cpp:105
#3 0x082ae545 in NdbShutdown (type=NST_ErrorHandlerStartup,
restartType=NRT_Default)
at Emulator.cpp:255
#4 0x082b799e in ErrorReporter::handleError (messageID=2327,
problemData=0xbffff630 "DBTUP could not allocate memory for
TableDescriptor",
objRef=0xbffff530 "Requested: 4x2820146816 = 2690652672 bytes",
nst=NST_ErrorHandlerStartup)
at ErrorReporter.cpp:206
#5 0x082a6182 in SimulatedBlock::allocRecord (this=0xb7be4008,
type=0x8331647 "TableDescriptor",
s=4, n=2820146816, clear=true) at SimulatedBlock.cpp:683
#6 0x0821cd90 in Dbtup::initRecords (this=0xb7be4008) at
dbtup/DbtupGen.cpp:366
#7 0x0821ca14 in Dbtup::execREAD_CONFIG_REQ (this=0xb7be4008,
signal=0x847a2a4)
at dbtup/DbtupGen.cpp:304
#8 0x08103b5d in SimulatedBlock::executeFunction (this=0xb7be4008,
gsn=334, signal=0x847a2a4)
at SimulatedBlock.hpp:575
#9 0x082aa72d in FastScheduler::doJob (this=0x8483bc0) at
FastScheduler.cpp:137
#10 0x082ac1a1 in ThreadConfig::ipControlLoop (this=0x848cdc0) at
ThreadConfig.cpp:175
#11 0x080e1b23 in main (argc=2, argv=0xbffffa84) at main.cpp:470
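A note on the figures in frame #4: "4x2820146816 = 2690652672 bytes" is not the exact product. The numbers are consistent with the product being reduced modulo 2^32, which is what an unsigned 32-bit multiplication would give on this i686 build. That reading is an assumption drawn from the numbers alone, not something checked against the ndbd source. A minimal Python sketch of the arithmetic (the helper name wrap_u32 is made up for illustration):

```python
# Figures copied from frames #4 and #5 of the backtrace above.
ELEMENT_SIZE = 4              # s=4
ELEMENT_COUNT = 2820146816    # n=2820146816
REPORTED_BYTES = 2690652672   # "Requested: 4x2820146816 = 2690652672 bytes"


def wrap_u32(value: int) -> int:
    """Reduce a value modulo 2**32, as unsigned 32-bit arithmetic would.

    Hypothetical helper; it only models the assumed wraparound.
    """
    return value & 0xFFFFFFFF


true_bytes = ELEMENT_SIZE * ELEMENT_COUNT   # exact product, no overflow
wrapped_bytes = wrap_u32(true_bytes)        # what 32-bit math would report

print("exact product :", true_bytes)
print("wrapped to u32:", wrapped_bytes)
print("matches report:", wrapped_bytes == REPORTED_BYTES)
```

If the assumption holds, the real request was about 11 GB, which could not be satisfied on a 32-bit process in any case.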
********************
* 2. Set StartFailureTimeout but not MaxNoOfTables in config.ini
********************
2.1 config.ini
[ndbd default]
NoOfReplicas= 2
DataMemory= 10M
IndexMemory= 10M
StartFailureTimeout=1000 # 1sec
[ndb_mgmd]
Id=1
HostName= 127.0.0.1
DataDir=/usr/local/mysql/data
[ndbd]
Id= 2
HostName= 127.0.0.1
DataDir= /usr/local/mysql/data/node2
... (4 ndb nodes)
2.2 start cluster
$libexec/ndb_mgmd
$libexec/ndbd --initial
2.3 test result
ndbd starting
after 1 second ( StartFailureTimeout(1 second) )
start failed: timeout.
ndb_2_out.log:
2006-09-11 16:08:43 [ndbd] ALERT -- Node 2: Forced node shutdown
completed. Occured during startphase 1. Initiated by signal 6. Caused by
error 2303: 'System error, node killed during node restart by other
node(Internal error, programming error or missing error message, please
report a bug)
2.4 the call stack
(gdb) bt
#0 0x00bf8c38 in raise () from /lib/tls/i686/libc.so.6
#1 0x00bfa0b8 in abort () from /lib/tls/i686/libc.so.6
#2 0x080e0c8a in childAbort (code=-1, currentStartPhase=1) at
main.cpp:105
#3 0x082ae545 in NdbShutdown (type=NST_ErrorHandlerStartup,
restartType=NRT_Default)
at Emulator.cpp:255
#4 0x082b799e in ErrorReporter::handleError (messageID=2303,
problemData=0x878bf70 "Shutting down node as total restart time
exceeds StartFailureTimeout as set in config file 1000",
objRef=0xbffff6e0 "QMGR (Line: 172) 0x0000000e",
nst=NST_ErrorHandlerStartup) at ErrorReporter.cpp:206
#5 0x082a6338 in SimulatedBlock::progError (this=0x8549178, line=172,
err_code=2303,
extra=0x878bf70 "Shutting down node as total restart time exceeds
StartFailureTimeout as set in config file 1000") at
SimulatedBlock.cpp:738
#6 0x08240eca in Qmgr::execCONTINUEB (this=0x8549178, signal=0x847a2a4)
at qmgr/QmgrMain.cpp:172
#7 0x08103b5d in SimulatedBlock::executeFunction (this=0x8549178,
gsn=164, signal=0x847a2a4)
at SimulatedBlock.hpp:575
#8 0x082aa72d in FastScheduler::doJob (this=0x8483bc0) at
FastScheduler.cpp:137
#9 0x082ac1a1 in ThreadConfig::ipControlLoop (this=0x848cdc0) at
ThreadConfig.cpp:175
#10 0x080e1b23 in main (argc=2, argv=0xbffffa74) at main.cpp:470
********************
* 3. Set StartFailureTimeout but not MaxNoOfTables in config.ini,
* and sleep 2 seconds in Dbtup::initRecords().
********************
3.1 config.ini
the same as test 2.
3.2 start cluster
$libexec/ndb_mgmd
$libexec/ndbd --initial
3.3 test result
ndbd starting
after 3 seconds ( sleep time (2 seconds) + StartFailureTimeout (1
second) )
start failed: timeout.
ndb_2_out.log:
the same as test 2.
3.4 the call stack
the same as test 2.
***************
* Conclusion
***************
The StartFailureTimeout parameter is checked when the QMGR block
handles the CONTINUEB signal, whereas MaxNoOfTables is used when the
DBTUP block handles the READ_CONFIG_REQ signal.
The READ_CONFIG_REQ signal is processed before the CONTINUEB signal.
So if MaxNoOfTables is set to a very large value in config.ini, the
start fails earlier with 'cannot allocate memory' (error 2327) and
never reaches the 'start timeout' failure (error 2303).
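The ordering can be illustrated with a toy model of the three tests. This is a sketch under stated assumptions, not the ndbd implementation: simulate_start, StartupResult, and the parameters alloc_ok and init_delay_ms are hypothetical names, time is simulated rather than measured, and the model assumes (as in tests 2 and 3) that the node never finishes starting, so the timeout always fires if it is reached.

```python
from dataclasses import dataclass


@dataclass
class StartupResult:
    error: int        # 2327 = memory allocation failure, 2303 = timeout
    elapsed_ms: int   # simulated time from process start to the failure


def simulate_start(timeout_ms: int, alloc_ok: bool = True,
                   init_delay_ms: int = 0) -> StartupResult:
    """Toy model: READ_CONFIG_REQ is handled first, CONTINUEB afterwards."""
    clock_ms = 0

    # Step 1: READ_CONFIG_REQ in DBTUP (Dbtup::initRecords allocates here).
    clock_ms += init_delay_ms
    if not alloc_ok:
        # The start aborts here; the timeout check is never reached.
        return StartupResult(error=2327, elapsed_ms=clock_ms)

    # Step 2: CONTINUEB in QMGR; the timeout is measured from this point,
    # which matches the 2 s + 1 s = 3 s observed in test 3.
    clock_ms += timeout_ms
    return StartupResult(error=2303, elapsed_ms=clock_ms)


# Test 1: huge MaxNoOfTables, allocation fails (5000 ms is an arbitrary
# stand-in for the unspecified "n seconds" of test 1).
test1 = simulate_start(1000, alloc_ok=False, init_delay_ms=5000)
# Test 2: default MaxNoOfTables, timeout after 1 second.
test2 = simulate_start(1000)
# Test 3: as test 2, plus a 2-second sleep in initRecords.
test3 = simulate_start(1000, init_delay_ms=2000)

for name, result in (("test 1", test1), ("test 2", test2), ("test 3", test3)):
    print(name, "error", result.error, "after", result.elapsed_ms, "ms")
```

In this model, test 1 can take longer than StartFailureTimeout and still end with error 2327, because the timeout check has not started yet when the allocation fails.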
[12 Sep 2006 2:15]
david li
Not a bug.
