| Bug #46914 | cluster crash, SimulatedBlock.cpp DBTUP (Line: 662), failed ndbrequire | ||
|---|---|---|---|
| Submitted: | 25 Aug 12:11 | Modified: | 9 Nov 20:09 |
| Reporter: | Bogdan Kecman | ||
| Status: | Verified | ||
| Category: | Server: Cluster | Severity: | S2 (Serious) |
| Version: | mysql-5.1-telco-7.0 | OS: | Any |
| Assigned to: | Jonas Oreland | Target Version: | |
| Tags: | mysql-5.1.34 ndb-7.0.6 | ||
| Triage: | Triaged: D2 (Serious) / R3 (Medium) / E4 (High) | ||
[25 Aug 12:11]
Bogdan Kecman
[27 Aug 14:27]
Bogdan Kecman
Looks like ndbmtd reaches the LongMessageBuffer limit faster than ndbd, so increasing LongMessageBuffer from the default 4M to 8M or more should solve the problem.
[28 Sep 1:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".
[28 Oct 1:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".
[9 Nov 20:09]
Andrew Hutchings
Test case:

Need minimum 2 x ndbmtd, 1 x mysqld (with 4 [mysqld] sections in config.ini)

config.ini:
LongMessageBuffer=512K
MaxNoOfExecutionThreads=4

my.cnf:
ndb-cluster-connection-pool=4
log-bin

shell> mysqlslap -uroot --auto-generate-sql -endb -c4 -x4 --number-of-queries=10000 --commit=10
[10 Nov 13:11]
Jonas Oreland
So the problem is *with* replication and ndbmtd.

In ndbd, the commit triggers fire and put data directly into the SUMA buffer. But with ndbmtd, the LQH(s) and SUMA run in different threads, so this is not possible; therefore the LongMessageBuffer is used to pass the data. But that resource can be exhausted, causing the crash.

In ndbd, the SUMA buffer
1) has its own memory manager (using DataMemory)
2) if that is exhausted, it's handled "gracefully" (data nodes stay alive, but replication gets a GAP event)

---

So a solution will have to
1) use a different memory pool to pass data between LQH/SUMA (based on DM)
2) handle out of memory gracefully
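
To illustrate what "handle out of memory gracefully" could look like in principle, here is a minimal sketch in Python. It is not NDB code: the class `BoundedEventBuffer`, its methods, and the `GAP` marker are all hypothetical names made up for this example. It only shows the idea that a full buffer drops the data and queues a single gap marker for the consumer, instead of failing a hard assertion.

```python
from collections import deque

# Hypothetical marker meaning "some events were dropped here".
GAP = object()


class BoundedEventBuffer:
    """Toy fixed-capacity buffer between a producer and a consumer.

    Assumption for illustration only: the producer plays the role of LQH
    and the consumer the role of SUMA. Nothing here mirrors real NDB code.
    """

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.queue = deque()

    def push(self, payload):
        """Queue payload; on exhaustion, drop it and record one GAP marker."""
        if self.used_bytes + len(payload) > self.capacity_bytes:
            # Out of buffer space: do not crash, just note that data was lost.
            if not self.queue or self.queue[-1] is not GAP:
                self.queue.append(GAP)
            return False
        self.queue.append(payload)
        self.used_bytes += len(payload)
        return True

    def pop(self):
        """Return ("DATA", payload), ("GAP", None), or None if empty."""
        if not self.queue:
            return None
        item = self.queue.popleft()
        if item is GAP:
            return ("GAP", None)
        self.used_bytes -= len(item)
        return ("DATA", item)
```

With a capacity of 8 bytes, two 4-byte pushes succeed and further pushes are rejected; the consumer then sees the two data items followed by one GAP marker, and the buffer accepts new data again once it has drained.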
[10 Nov 13:24]
Andrew Hutchings
Yes, without log-bin I hit bug#48441 instead (until that was fixed). It was very easy to hit this bug once log-bin was turned on.
