| Bug #22299 | mgmd crash due to unchecked TransporterFacade::ThreadData expand() | ||
|---|---|---|---|
| Submitted: | 13 Sep 2006 7:41 | Modified: | 3 Jan 2007 3:32 |
| Reporter: | Stewart Smith | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S1 (Critical) |
| Version: | 5.0, 5.1 | OS: | |
| Assigned to: | Stewart Smith | CPU Architecture: | Any |
[13 Sep 2006 9:23]
Jonas Oreland
Decreasing priority per discussion with Stewart. Basic reason: this has never been observed to happen, even if it's theoretically possible.
[3 Nov 2006 12:56]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:

  http://lists.mysql.com/commits/14803

ChangeSet@1.2277, 2006-11-03 23:56:25+11:00, stewart@willster.(none) +1 -0
  BUG#22299 mgmd crash due to unchecked TransporterFacade::ThreadData expand()

  abort if we ever fail to expand a Vector
[8 Nov 2006 6:10]
Stewart Smith
pushed to 5.1-ndb
[4 Dec 2006 8:31]
Martin Skold
Pushed to 5.1.14
[29 Dec 2006 0:36]
Stewart Smith
pushed to 5.0-ndb
[29 Dec 2006 8:18]
Stewart Smith
pushed to 5.0.34
[3 Jan 2007 3:32]
Jon Stephens
I don't see anything here that affects end users directly; closing w/o further action at this time.

Description:
> Hello,
> management-server crashed again.

>> I've now modified the ndb_mgmd init script to enable core dumps
>> for the ndb_mgmd process by adding an "ulimit -c unlimited" to
>> the startup file and restarted the management server, so on the
>> next crash we should be able to get some more information on
>> what happened from the core file in /var/lib/mysql-cluster

ok, we've got a core file now, the backtrace looks like this:

(gdb) Program terminated with signal 6, Aborted.
(gdb) bt
#0  0xb7e6683b in ??? from /lib/tls/libc.so.6
#1  0xb7e67fa2 in ??? from /lib/tls/libc.so.6
#2  0x080a022f in Vector<unsigned int>::operator[] ()
#3  0x08095553 in TransporterFacade::ThreadData::open ()
#4  0x08094872 in TransporterFacade::open ()
#5  0x080a7740 in SignalSender::SignalSender ()
#6  0x08080f4d in MgmtSrvr::sendVersionReq ()
#7  0x08080e4d in MgmtSrvr::versionNode ()
#8  0x08081e4d in MgmtSrvr::status ()
#9  0x08088172 in MgmApiSession::getStatus ()
#10 0x0808ac05 in Parser<MgmApiSession>::run ()
#11 0x08086465 in MgmApiSession::runSession ()
#12 0x080c8f0e in sessionThread_C ()
#13 0x080c189e in ndb_thread_wrapper ()
#14 0xb7fdbb63 in __nptl_setxid () from /lib/tls/libpthread.so.0
#15 0xb7f1618a in ruserpass () from /lib/tls/libc.so.6

the SIGABRT is probably thrown here:

mysql-4.1/ndb/include/util/Vector.hpp:70

    66: template<class T>
    67: T &
    68: Vector<T>::operator[](unsigned i){
    69:   if(i >= m_size)
  * 70:     abort();
    71:   return m_items[i];
    72: }

and caused by the first vector access in

mysql-4.1/ndb/src/ndbapi/TransporterFacade.cpp:1132

    1116: int
    1117: TransporterFacade::ThreadData::open(void* objRef,
    1118:                                     ExecuteFunction fun,
    1119:                                     NodeStatusFunction fun2)
    1120: {
    1121:   Uint32 nextFree = m_firstFree;
    1122:
    1123:   if(m_statusNext.size() >= MAX_NO_THREADS && nextFree == END_OF_LIST){
    1124:     return -1;
    1125:   }
    1126:
    1127:   if(nextFree == END_OF_LIST){
    1128:     expand(10);
    1129:     nextFree = m_firstFree;
    1130:   }
    1131:
  * 1132:   m_firstFree = m_statusNext[nextFree];
    1133:
    1134:   Object_Execute oe = { objRef , fun };
    1135:
    1136:   m_statusNext[nextFree] = INACTIVE;
    1137:   m_objectExecute[nextFree] = oe;
    1138:   m_statusFunction[nextFree] = fun2;
    1139:
    1140:   return indexToNumber(nextFree);
    1141: }

looks to me as if the expand() silently fails? If it does, m_firstFree is still END_OF_LIST after line 1129, so the index used on line 1132 is out of range and operator[] aborts. (i didn't investigate any further at this point ...)

How to repeat:
look for a blue moon...

Suggested fix:
don't crash.
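
To illustrate the failure mode described above, here is a minimal sketch of a free-list slot table whose expand() reports failure and whose open() checks that result before indexing. It is not the NDB code and not the committed patch (which, per the changeset comment, aborts when a Vector cannot be expanded). The class name SlotTable, the use of std::vector, the close() method, and the maxSlots parameter are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Hypothetical stand-in for the free-list bookkeeping in
// TransporterFacade::ThreadData; only the "next free" chain is modelled.
class SlotTable {
public:
    static constexpr std::uint32_t END_OF_LIST = 0xFFFFFFFFu;
    static constexpr std::uint32_t INACTIVE    = 0xFFFFFFFEu;

    explicit SlotTable(std::size_t maxSlots) : m_maxSlots(maxSlots) {}

    // Returns the index of a newly opened slot, or -1 if none is available.
    int open() {
        if (m_firstFree == END_OF_LIST) {
            // Check the result instead of assuming the table grew.
            if (!expand(10)) {
                return -1;
            }
        }
        const std::uint32_t nextFree = m_firstFree;
        m_firstFree = m_next[nextFree];   // safe: nextFree < m_next.size()
        m_next[nextFree] = INACTIVE;
        return static_cast<int>(nextFree);
    }

    // Puts a previously opened slot back at the head of the free list.
    void close(int slot) {
        if (slot < 0 || static_cast<std::size_t>(slot) >= m_next.size()) {
            return;  // ignore indexes that were never handed out
        }
        m_next[slot] = m_firstFree;
        m_firstFree = static_cast<std::uint32_t>(slot);
    }

private:
    // Grows the table by up to `count` slots and chains them into the
    // free list. Returns false if the table is full or allocation fails.
    bool expand(std::size_t count) {
        const std::size_t oldSize = m_next.size();
        if (oldSize >= m_maxSlots) {
            return false;
        }
        const std::size_t newSize = std::min(oldSize + count, m_maxSlots);
        try {
            m_next.resize(newSize);
        } catch (const std::bad_alloc&) {
            return false;
        }
        for (std::size_t i = oldSize; i < newSize; ++i) {
            m_next[i] = (i + 1 < newSize)
                            ? static_cast<std::uint32_t>(i + 1)
                            : m_firstFree;  // last new slot links to old head
        }
        m_firstFree = static_cast<std::uint32_t>(oldSize);
        return true;
    }

    std::size_t m_maxSlots;
    std::uint32_t m_firstFree = END_OF_LIST;
    std::vector<std::uint32_t> m_next;
};
```

With a table limited to 12 slots, the first 12 calls to open() return 0 through 11 and the next call returns -1 rather than indexing past the end. Whether returning an error or aborting is preferable depends on how callers handle -1; the sketch only shows that the result of expand() needs to be checked either way.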