Bug #19669 | Sometimes StartFailureTimeout isn't applied properly | ||
---|---|---|---|
Submitted: | 10 May 2006 10:45 | Modified: | 12 Sep 2006 2:18 |
Reporter: | Serge Kozlov | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S2 (Serious) |
Version: | 5.1.10 (beta) | OS: | Linux (FC4) |
Assigned to: | david li | CPU Architecture: | Any |
[10 May 2006 10:45]
Serge Kozlov
[12 Sep 2006 2:13]
david li
I did the following tests: Simple Discription: 1. set StartFailureTimeout & MaxNoOfTables in config.ini (according to the bug report) 2. set StartFailureTimeout, not set MaxNoOfTables in config.ini (to check whether StartFailureTimeout parameter is effective) 3. set StartFailureTimeout, not set MaxNoOfTables in config.ini, let it sleep 2 seconds in Dbtup::initRecords(). (to check whether MaxNoOfTables parameter is processed before StartFailureTimeout) Detailed Description: ******************** * 1. set StartFailureTimeout & MaxNoOfTables in config.ini ******************** 1.1 config.ini [ndbd default] NoOfReplicas= 2 DataMemory= 10M IndexMemory= 10M StartFailureTimeout=1000 # 1sec MaxNoOfTables=1000000000 # more than the total memory of test box [ndb_mgmd] Id=1 HostName= 127.0.0.1 DataDir=/usr/local/mysql/data [ndbd] Id= 2 HostName= 127.0.0.1 DataDir= /usr/local/mysql/data/node2 ... (4 ndb nodes) 1.2 start cluster $libexec/ndb_mgmd $libexec/ndbd --initial 1.3 test result ndbd starting after n seconds ( n > StartFailureTimeout(1 second) ) start failed: cannot allocate memory. ndb_2_out.log 2006-09-11 16:41:48 [ndbd] ALERT -- Node 2: Forced node shutdown completed. Occured during startphase 0. Initiated by signal 6. Caused by error 2327: 'Memory allocation failure, please decrease some configuration parameters(Configuration error). Permanent error, external action needed'. 1.4 the call stack (gdb) bt #0 0x00bf8c38 in raise () from /lib/tls/i686/libc.so.6 #1 0x00bfa0b8 in abort () from /lib/tls/i686/libc.so.6 #2 0x080e0c8a in childAbort (code=-1, currentStartPhase=0) at main.cpp:105 #3 0x082ae545 in NdbShutdown (type=NST_ErrorHandlerStartup, restartType=NRT_Default) at Emulator.cpp:255 #4 0x082b799e in ErrorReporter::handleError (messageID=2327, problemData=0xbffff630 "DBTUP could not allocate memory for TableDescriptor", objRef=0xbffff530 "Requested: 4x2820146816 = 2690652672 bytes", nst=NST_ErrorHandlerStartup) at ErrorReporter.cpp:206 #5 0x082a6182 in SimulatedBlock::allocRecord (this=0xb7be4008, type=0x8331647 "TableDescriptor", s=4, n=2820146816, clear=true) at SimulatedBlock.cpp:683 #6 0x0821cd90 in Dbtup::initRecords (this=0xb7be4008) at dbtup/DbtupGen.cpp:366 #7 0x0821ca14 in Dbtup::execREAD_CONFIG_REQ (this=0xb7be4008, signal=0x847a2a4) at dbtup/DbtupGen.cpp:304 #8 0x08103b5d in SimulatedBlock::executeFunction (this=0xb7be4008, gsn=334, signal=0x847a2a4) at SimulatedBlock.hpp:575 #9 0x082aa72d in FastScheduler::doJob (this=0x8483bc0) at FastScheduler.cpp:137 #10 0x082ac1a1 in ThreadConfig::ipControlLoop (this=0x848cdc0) at ThreadConfig.cpp:175 #11 0x080e1b23 in main (argc=2, argv=0xbffffa84) at main.cpp:470 ******************** * 2. set StartFailureTimeout, not set MaxNoOfTables in config.ini ******************** 2.1 config.ini [ndbd default] NoOfReplicas= 2 DataMemory= 10M IndexMemory= 10M StartFailureTimeout=1000 # 1sec [ndb_mgmd] Id=1 HostName= 127.0.0.1 DataDir=/usr/local/mysql/data [ndbd] Id= 2 HostName= 127.0.0.1 DataDir= /usr/local/mysql/data/node2 ... (4 ndb nodes) 2.2 start cluster $libexec/ndb_mgmd $libexec/ndbd --initial 2.3 test result ndbd starting after 1 second ( StartFailureTimeout(1 second) ) start failed: timeout. ndb_2_out.log: 2006-09-11 16:08:43 [ndbd] ALERT -- Node 2: Forced node shutdown completed. Occured during startphase 1. Initiated by signal 6. Caused by error 2303: 'System error, node killed during node restart by other node(Internal error, programming error or missing error message, please report a bug) 2.4 the call stack (gdb) bt #0 0x00bf8c38 in raise () from /lib/tls/i686/libc.so.6 #1 0x00bfa0b8 in abort () from /lib/tls/i686/libc.so.6 #2 0x080e0c8a in childAbort (code=-1, currentStartPhase=1) at main.cpp:105 #3 0x082ae545 in NdbShutdown (type=NST_ErrorHandlerStartup, restartType=NRT_Default) at Emulator.cpp:255 #4 0x082b799e in ErrorReporter::handleError (messageID=2303, problemData=0x878bf70 "Shutting down node as total restart time exceeds StartFailureTimeout as set in config file 1000", objRef=0xbffff6e0 "QMGR (Line: 172) 0x0000000e", nst=NST_ErrorHandlerStartup) at ErrorReporter.cpp:206 #5 0x082a6338 in SimulatedBlock::progError (this=0x8549178, line=172, err_code=2303, extra=0x878bf70 "Shutting down node as total restart time exceeds StartFailureTimeout as set in config file 1000") at SimulatedBlock.cpp:738 #6 0x08240eca in Qmgr::execCONTINUEB (this=0x8549178, signal=0x847a2a4) at qmgr/QmgrMain.cpp:172 #7 0x08103b5d in SimulatedBlock::executeFunction (this=0x8549178, gsn=164, signal=0x847a2a4) at SimulatedBlock.hpp:575 #8 0x082aa72d in FastScheduler::doJob (this=0x8483bc0) at FastScheduler.cpp:137 #9 0x082ac1a1 in ThreadConfig::ipControlLoop (this=0x848cdc0) at ThreadConfig.cpp:175 #10 0x080e1b23 in main (argc=2, argv=0xbffffa74) at main.cpp:470 ******************** * 3. set StartFailureTimeout, not set MaxNoOfTables in config.ini, * let it sleep 2 seconds in Dbtup::initRecords(). ******************** 3.1 config.ini the same as test 2. 3.2 start cluster $libexec/ndb_mgmd $libexec/ndbd --initial 3.3 test result ndbd starting after 3 second ( sleep time(2 seconds) + StartFailureTimeout(1 second) ) start failed: timeout. ndb_2_out.log: the same as test 2. 3.4 the call stack the same as test 2. *************** * Conclusion *************** The StartFailureTimeout parameter is processed in CONTINUEB signal in QMGR block, the MaxNoOfTables in READ_CONFIG_REQ signal in DBTUP block. The READ_CONFIG_REQ signal is processed before CONTINUEB signal. If set MaxNoOfTables to a very large value in config.ini, because MaxNoOfTables is processed earlier, it will fail in error 'cannot allocate memory' not in 'start timeout'.
[12 Sep 2006 2:15]
david li
not a bug.