Bug #46914 | cluster crash, SimulatedBlock.cpp DBTUP (Line: 662), failed ndbrequire | ||
---|---|---|---|
Submitted: | 25 Aug 2009 10:11 | Modified: | 18 Feb 2010 8:45 |
Reporter: | Bogdan Kecman | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S2 (Serious) |
Version: | mysql-5.1-telco-7.0 | OS: | Any |
Assigned to: | Jonas Oreland | CPU Architecture: | Any |
Tags: | mysql-5.1.34 ndb-7.0.6 |
[25 Aug 2009 10:11]
Bogdan Kecman
[27 Aug 2009 12:27]
Bogdan Kecman
Looks like ndbmtd reaches the LongMessageBuffer limit faster then ndbd, so incresing LongMessageBuffer from default 4M to 8M or more should solve the problem.
[27 Sep 2009 23:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".
[28 Oct 2009 0:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".
[9 Nov 2009 19:09]
Andrew Hutchings
Test case: Need minimum 2xndbmtd, 1xmysqld (with 4 [mysqld] sections in config.ini) config.ini: LongMessageBuffer=512K MaxNoOfExecutionThreads=4 my.cnf: ndb-cluster-connection-pool=4 log-bin shell> mysqlslap -uroot --auto-generate-sql -endb -c4 -x4 --number-of-queries=10000 --commit=10
[10 Nov 2009 12:11]
Jonas Oreland
So the problem is *with* replication and ndbtmd In ndbd, the commit triggers fire and puts data directly into SUMA buffer But with ndbmtd, LQH(s) and SUMA runs in different threads, so this is not possible, therefor the LongMessageBuffer is used to pass the data. But that resource can be exhausted, causing the crash. In ndbd 1) it's has it's own memory manager (using DataMemory) 2) if that is exhauseted, it's handled "gracefully" (datanodes stay alive, but replication gets GAP event) --- So a solution, will have to 1) use a different memory pool to pass data between LQH/SUMA (based on DM) 2) handle out of memory gracefully
[10 Nov 2009 12:24]
Andrew Hutchings
Yes, without log-bin I hit bug#48441 instead (until that was fixed). It was very easy to hit this bug once log-bin was turned on.
[8 Feb 2010 8:20]
Andrew Hutchings
Note that the error message for this is now changing to: Status: Permanent error, external action needed Message: Signal lost, out of long signal memory, please increase LongMessageBuffer (Resource configuration error) Error: 2351 Error data: Out of LongMessageBuffer in sendSignal
[8 Feb 2010 21:00]
Jonas Oreland
tentative patch
Attachment: tot.patch (application/octet-stream, text), 14.54 KiB.
[18 Feb 2010 6:34]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/100698 3412 Jonas Oreland 2010-02-18 ndb - bug#46914 - pass replication data from TUP to SUMA in datamemory rather than in longmessagebuffer
[18 Feb 2010 6:38]
Jonas Oreland
pushed to 7.0.13 and 7.1.2
[18 Feb 2010 8:45]
Jon Stephens
Documented bugfix in the NDB-7.0.13 and 7.1.2 changelogs as follows: Replication of a MySQL Cluster using multi-threaded data nodes could fail with forced shutdowns of some data nodes due to the fact that ndbmtd exhausted the LongMessageBuffer much more quickly than ndbd. After this fix, passing of replication data between the DBTUP and SUMA NDB kernel blocks is done using DataMemory rather than LongMessageBuffer. Until you can upgrade, you may be able to work around this issue by increasing the LongMessageBuffer setting; doubling the default value should be sufficient in most cases. Also noted this issue and workaround in the LongMessageBuffer description in the Manual. Closed.
[24 Feb 2010 14:31]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/101329 3407 Martin Skold 2010-02-24 [merge] Merge added: storage/ndb/cmake/ storage/ndb/cmake/cmake_parse_arguments.cmake storage/ndb/cmake/libutils.cmake modified: .bzrignore mysql-test/suite/ndb/r/ndb_dd_alter.result mysql-test/suite/ndb/r/ndb_dd_restore_compat.result mysql-test/suite/ndb/t/show_attributes.inc storage/ndb/CMakeLists.txt storage/ndb/Makefile.am storage/ndb/config/type_ndbapitools.cmake storage/ndb/include/kernel/GlobalSignalNumbers.h storage/ndb/include/util/Parser.hpp storage/ndb/src/CMakeLists.txt storage/ndb/src/kernel/blocks/CMakeLists.txt storage/ndb/src/kernel/blocks/dbacc/Dbacc.hpp storage/ndb/src/kernel/blocks/dbacc/DbaccInit.cpp storage/ndb/src/kernel/blocks/dbacc/DbaccMain.cpp storage/ndb/src/kernel/blocks/dblqh/Dblqh.hpp storage/ndb/src/kernel/blocks/dblqh/DblqhInit.cpp storage/ndb/src/kernel/blocks/dblqh/DblqhMain.cpp storage/ndb/src/kernel/blocks/dbtup/Dbtup.hpp storage/ndb/src/kernel/blocks/dbtup/DbtupGen.cpp storage/ndb/src/kernel/blocks/dbtup/DbtupTrigger.cpp storage/ndb/src/kernel/blocks/suma/Suma.cpp storage/ndb/src/kernel/blocks/suma/Suma.hpp storage/ndb/src/kernel/blocks/suma/SumaInit.cpp storage/ndb/src/kernel/vm/ArrayPool.hpp storage/ndb/src/kernel/vm/ndbd_malloc_impl.cpp storage/ndb/src/mgmsrv/CMakeLists.txt storage/ndb/src/mgmsrv/MgmtSrvr.cpp storage/ndb/src/mgmsrv/MgmtSrvr.hpp storage/ndb/src/mgmsrv/Services.cpp storage/ndb/src/ndbapi/NdbDictionary.cpp storage/ndb/test/include/HugoTransactions.hpp storage/ndb/test/include/NdbRestarter.hpp storage/ndb/test/ndbapi/testMgm.cpp storage/ndb/test/ndbapi/testUpgrade.cpp storage/ndb/test/run-test/command.cpp storage/ndb/test/run-test/db.cpp storage/ndb/test/run-test/upgrade-tests.txt storage/ndb/test/src/CpcClient.cpp storage/ndb/test/src/HugoTransactions.cpp storage/ndb/test/src/NdbRestarter.cpp storage/ndb/tools/CMakeLists.txt storage/ndb/tools/desc.cpp