Bug #49207 | Nodefailure can "hang" ndb_mgmd with node id allocation | ||
---|---|---|---|
Submitted: | 30 Nov 2009 13:59 | Modified: | 9 Dec 2009 12:56 |
Reporter: | Jonas Oreland | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S3 (Non-critical) |
Version: | mysql-5.1-telco-6.3 | OS: | Any |
Assigned to: | Jonas Oreland | CPU Architecture: | Any |
[30 Nov 2009 13:59]
Jonas Oreland
[30 Nov 2009 14:29]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:
http://lists.mysql.com/commits/92098
3175 Jonas Oreland 2009-11-30
ndb - bug#49207 - improve mutex handling wrt to alloc_node_id
[30 Nov 2009 14:52]
Jonas Oreland
pushed to 6.3.29 and 7.0.10
[30 Nov 2009 14:52]
Jonas Oreland
note to magnus: preferably this would have resulted in a new testcase in testMgmd, but I didn't...sorry
[1 Dec 2009 13:02]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:
http://lists.mysql.com/commits/92273
3167 Martin Skold 2009-12-01 [merge]
Merge
modified:
  storage/ndb/src/common/debugger/EventLogger.cpp
  storage/ndb/src/kernel/blocks/dblqh/DblqhMain.cpp
  storage/ndb/src/kernel/blocks/ndbfs/AsyncIoThread.hpp
  storage/ndb/src/kernel/blocks/ndbfs/MemoryChannel.hpp
  storage/ndb/src/kernel/blocks/pgman.cpp
  storage/ndb/src/kernel/blocks/pgman.hpp
  storage/ndb/src/mgmsrv/MgmtSrvr.cpp
  storage/ndb/src/ndbapi/NdbOperationDefine.cpp
  storage/ndb/src/ndbapi/NdbOperationSearch.cpp
  storage/ndb/test/ndbapi/testBlobs.cpp
  storage/ndb/test/run-test/daily-basic-tests.txt
  storage/ndb/test/run-test/daily-devel-tests.txt
[1 Dec 2009 13:33]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:
http://lists.mysql.com/commits/92279
3244 Martin Skold 2009-12-01 [merge]
Merge
modified:
  storage/ndb/src/common/debugger/EventLogger.cpp
  storage/ndb/src/kernel/blocks/dblqh/DblqhMain.cpp
  storage/ndb/src/kernel/blocks/ndbfs/AsyncIoThread.hpp
  storage/ndb/src/kernel/blocks/ndbfs/MemoryChannel.hpp
  storage/ndb/src/kernel/blocks/pgman.cpp
  storage/ndb/src/kernel/blocks/pgman.hpp
  storage/ndb/src/mgmsrv/MgmtSrvr.cpp
  storage/ndb/src/ndbapi/NdbOperationDefine.cpp
  storage/ndb/src/ndbapi/NdbOperationSearch.cpp
  storage/ndb/test/ndbapi/testBlobs.cpp
  storage/ndb/test/run-test/daily-basic-tests.txt
  storage/ndb/test/run-test/daily-devel-tests.txt
[1 Dec 2009 14:02]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:
http://lists.mysql.com/commits/92287
3170 Martin Skold 2009-12-01 [merge]
Merge
modified:
  storage/ndb/src/common/debugger/EventLogger.cpp
  storage/ndb/src/kernel/blocks/dblqh/DblqhMain.cpp
  storage/ndb/src/kernel/blocks/ndbfs/AsyncFile.hpp
  storage/ndb/src/kernel/blocks/ndbfs/MemoryChannel.hpp
  storage/ndb/src/kernel/blocks/pgman.cpp
  storage/ndb/src/kernel/blocks/pgman.hpp
  storage/ndb/src/mgmsrv/MgmtSrvr.cpp
  storage/ndb/src/ndbapi/NdbOperationDefine.cpp
  storage/ndb/src/ndbapi/NdbOperationSearch.cpp
  storage/ndb/test/ndbapi/testBlobs.cpp
  storage/ndb/test/run-test/daily-basic-tests.txt
  storage/ndb/test/run-test/daily-devel-tests.txt
[9 Dec 2009 12:56]
Jon Stephens
Documented bugfix in the NDB-6.3.29 and 7.0.10 changelogs as follows:
If the master data node receiving a request from a newly-started API or data node for a node ID died before the request had been handled, the management server waited (and kept a mutex) until all handling of this node failure was complete before responding to any other connections, instead of responding to other connections as soon as it was informed of the node failure (that is, it waited until it had received a NF_COMPLETEREP signal rather than a NODE_FAILREP signal). One visible effect of this misbehavior was that it caused management client commands such as SHOW and ALL STATUS to respond with unnecessary slowness in such circumstances.
Closed.
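The difference between the two behaviors in the changelog entry can be sketched as a small simulation. This is not the code from MgmtSrvr.cpp: `Signal`, `Policy`, `release_index`, and the `ALLOC_REPLY` placeholder are hypothetical names chosen for illustration, and only `NODE_FAILREP` and `NF_COMPLETEREP` come from the entry itself. The sketch assumes a waiter that holds a lock while a node ID request is outstanding and asks which incoming signal lets it stop waiting.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Signals seen by the waiter. ALLOC_REPLY is a hypothetical stand-in for a
// normal answer to the node ID request; the other two are named in the
// changelog entry.
enum class Signal { ALLOC_REPLY, NODE_FAILREP, NF_COMPLETEREP };

// When a waiter whose master node has died stops waiting (hypothetical names).
enum class Policy {
  UntilFailureHandled,  // behavior described as the bug: wait for NF_COMPLETEREP
  OnFailureReport       // behavior described as the fix: stop at NODE_FAILREP
};

// Returns the index in `trace` of the first signal that ends the wait
// (and so would free the lock), or -1 if no signal in the trace does.
int release_index(const std::vector<Signal>& trace, Policy policy) {
  for (std::size_t i = 0; i < trace.size(); ++i) {
    switch (trace[i]) {
      case Signal::ALLOC_REPLY:
        return static_cast<int>(i);  // request answered normally
      case Signal::NODE_FAILREP:
        if (policy == Policy::OnFailureReport) return static_cast<int>(i);
        break;  // under the old policy, keep waiting
      case Signal::NF_COMPLETEREP:
        return static_cast<int>(i);  // failure handling done: both policies stop
    }
  }
  return -1;
}

// Usage example: the master dies, so a failure report arrives first and the
// completion report arrives later.
void example() {
  const std::vector<Signal> trace = {Signal::NODE_FAILREP,
                                     Signal::NF_COMPLETEREP};
  assert(release_index(trace, Policy::OnFailureReport) == 0);
  assert(release_index(trace, Policy::UntilFailureHandled) == 1);
}
```

In this model, the gap between the two indices is the time during which other connections (for example a client issuing SHOW) would be kept waiting under the old policy.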