Bug #42254 ndbmtd crashes with MaxNoOfThreads=8
Submitted: 21 Jan 2009 22:06
Modified: 25 Jan 2009 12:48
Reporter: Hartmut Holzgraefe
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: mysql-5.1.30-ndb-6.4.0
OS: Linux
Assigned to: Jonas Oreland
CPU Architecture: Any

[21 Jan 2009 22:06] Hartmut Holzgraefe
Description:
While trying to fill a test cluster with two data nodes and MaxNoOfThreads=8 (on a machine that actually has only 4, not 8, cores) the first data node crashes with a segmentation fault. It reports the fact to the management node but does not write any error and trace logs locally. The 2nd data node then fails with an abort signal when it becomes DICT master.

Running the same test with MaxNoOfThreads=4 (which equals the number of actual CPU cores), everything works fine. None of the data nodes crashes, and when forcing a segfault with "kill -11" the crashing node writes its error and trace logs just fine. The other data node also does not fail when taking over the master role.

How to repeat:
- create a two node cluster with MaxNoOfThreads=8 and NoOfReplicas=2
  (on a 4 core machine, no idea if that matters)

- run the attached test.sql script

- see things crash
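
For reference, a config.ini along the following lines would match the setup described above. This is only a sketch: the host names, node IDs and section layout are assumptions and were not part of the report, and the setting written as MaxNoOfThreads in this report is assumed to be the ndbmtd parameter MaxNoOfExecutionThreads. Both data nodes would need to be started with ndbmtd rather than ndbd.

```ini
# Hypothetical reproduction config; host names and node IDs are placeholders.
[ndbd default]
NoOfReplicas=2
# Written as MaxNoOfThreads in the report; assumed to be this parameter.
MaxNoOfExecutionThreads=8

[ndb_mgmd]
NodeId=1
HostName=localhost

# Two data nodes, each run as ndbmtd
[ndbd]
NodeId=2
HostName=localhost

[ndbd]
NodeId=3
HostName=localhost

[mysqld]
NodeId=4
```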
[21 Jan 2009 22:11] Hartmut Holzgraefe
SQL test script causing the crash

Attachment: test.sql (text/x-sql), 1.73 KiB.

[21 Jan 2009 22:16] Tomas Ulin
This could be a duplicate of http://bugs.mysql.com/bug.php?id=42052

If it looks like that, please retry with 6.4.1
[21 Jan 2009 22:25] Hartmut Holzgraefe
The 2nd node's crash on master role takeover may be a duplicate of bug #42052,
but the first node's crash is due to a segmentation fault, not an abort.
[21 Jan 2009 23:03] Hartmut Holzgraefe
The first node's segfault and the lack of error and trace logs persist on 6.4.1;
the abort of the 2nd node on takeover seems to be fixed, though.
[22 Jan 2009 9:13] Tomas Ulin
could verify on ndbsup using 6.4.1
[23 Jan 2009 8:43] Tomas Ulin
the bug is "out of SharedGlobalMemory"

Workaround: increase SharedGlobalMemory to over 64M.

The bugfix will change this so that job buffers have dedicated memory instead of using SharedGlobalMemory.
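
As a rough illustration of the workaround, the fragment below raises SharedGlobalMemory for all data nodes in config.ini. The value 128M is an assumed example; the comment above only says "over 64M", so any value above that threshold should serve the same purpose.

```ini
[ndbd default]
# Assumed example value; anything over 64M per the workaround above.
SharedGlobalMemory=128M
```

A change like this would normally need the management server to reload its configuration and the data nodes to be restarted before it takes effect.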
[23 Jan 2009 14:22] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/63902

3226 Jonas Oreland	2009-01-23
      ndb - bug#42254 - make sure buffers are allocated correctly in ndbmtd
[23 Jan 2009 14:22] Bugs System
Pushed into 5.1.31-ndb-6.4.1 (revid:jonas@mysql.com-20090123142131-0amhg2p9hgbanmo7) (version source revid:jonas@mysql.com-20090123142131-0amhg2p9hgbanmo7) (merge vers: 5.1.31-ndb-6.4.1) (pib:6)
[23 Jan 2009 14:25] Jonas Oreland
Pushed to 6.4.2
(note: configure.in is incorrect in current clone)

hartmut,
please retest if you have time
[25 Jan 2009 12:48] Jon Stephens
Documented bugfix in the NDB-6.4.2 changelog as follows:

        When using ndbmtd, setting MaxNoOfThreads to a value higher than
        the actual number of cores and with insufficient
        SharedGlobalMemory caused the data nodes to crash.

        The fix for this issue changes the behavior of ndbmtd such that
        its internal job buffers no longer rely on SharedGlobalMemory.

Also fixed typo in Synopsis.