MySQL Bugs: #42254: ndbmtd crashes with MaxNoOfThreads=8

Bug #42254	ndbmtd crashes with MaxNoOfThreads=8
Submitted:	21 Jan 2009 22:06	Modified:	25 Jan 2009 12:48
Reporter:	Hartmut Holzgraefe
Status:	Closed
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S3 (Non-critical)
Version:	mysql-5.1.30-ndb-6.4.0	OS:	Linux
Assigned to:	Jonas Oreland	CPU Architecture:	Any

Description:
While trying to fill a test cluster with two data nodes and MaxNoOfThreads=8 (on a machine that has only 4 cores, not 8), the first data node crashes with a segmentation fault. It reports the failure to the management node but does not write any error or trace logs locally. The 2nd data node then fails with an abort signal when it becomes DICT master.

Running the same test with MaxNoOfThreads=4 (which equals the actual number of CPU cores), everything works fine. Neither data node crashes, and when forcing a segfault with "kill -11" the crashing node writes its error and trace logs just fine, and the other data node does not fail when taking over the master role.

How to repeat:
- create a two node cluster with MaxNoOfThreads=8 and NoOfReplicas=2
  (on a 4 core machine, no idea if that matters)

- run the attached test.sql script

- see things crash
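
For reference, a minimal config.ini sketch of the setup described in the steps above. This is not the reporter's actual configuration, which is not attached; the host names are placeholders and every value other than the thread count and replica count is an assumption. The report calls the parameter MaxNoOfThreads; as far as I know the NDB configuration parameter for ndbmtd is named MaxNoOfExecutionThreads, so that name is used here.

```ini
# Hypothetical two-data-node test cluster, all nodes on one 4-core host.
[ndbd default]
NoOfReplicas=2
# Report refers to this as MaxNoOfThreads=8 (assumed to mean this parameter).
MaxNoOfExecutionThreads=8

[ndb_mgmd]
# Placeholder host name
HostName=localhost

# Two data nodes, both started with the ndbmtd binary
[ndbd]
HostName=localhost

[ndbd]
HostName=localhost

[mysqld]
```

SharedGlobalMemory is deliberately left unset here so that it stays at its default, since the crash depends on it being too small.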

SQL test script causing the crash

Attachment: test.sql (text/x-sql), 1.73 KiB.

This could be a duplicate of http://bugs.mysql.com/bug.php?id=42052

If it looks like that, please retry with 6.4.1

The 2nd node's crash on master role takeover may be a duplicate of bug #42052;
the first node's crash is due to a segmentation fault though, not an abort.

The first node's segfault and the lack of error and trace logs persist on 6.4.1;
the abort of the 2nd node on takeover seems to be fixed though.

Could verify on ndbsup using 6.4.1.

The bug is "out of SharedGlobalMemory".

Workaround: increase SharedGlobalMemory to over 64M.

The bugfix will change this so that job buffers have dedicated memory instead of using SharedGlobalMemory.
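
A sketch of how the workaround might look in config.ini on affected versions. The 128M figure is only an illustrative value above the 64M mentioned in this report, not a tested recommendation, and MaxNoOfExecutionThreads is assumed to be the parameter the report calls MaxNoOfThreads.

```ini
[ndbd default]
NoOfReplicas=2
# Assumed to be the parameter the report calls MaxNoOfThreads
MaxNoOfExecutionThreads=8
# Workaround: give the job buffers enough shared memory (must exceed 64M);
# 128M is an arbitrary example value.
SharedGlobalMemory=128M
```

The data nodes need to be restarted for the changed value to take effect.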

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/63902

3226 Jonas Oreland	2009-01-23
      ndb - bug#42254 - make sure buffers are allocated correctly in ndbmtd

Pushed into 5.1.31-ndb-6.4.1 (revid:jonas@mysql.com-20090123142131-0amhg2p9hgbanmo7) (version source revid:jonas@mysql.com-20090123142131-0amhg2p9hgbanmo7) (merge vers: 5.1.31-ndb-6.4.1) (pib:6)

Pushed to 6.4.2
(note: configure.in is incorrect in current clone)

hartmut,
please retest if you have time

Documented bugfix in the NDB-6.4.2 changelog as follows:

        When using ndbmtd, setting MaxNoOfThreads to a value higher than
        the actual number of cores and with insufficient
        SharedGlobalMemory caused the data nodes to crash.

        The fix for this issue changes the behavior of ndbmtd such that
        its internal job buffers no longer rely on SharedGlobalMemory.

Also fixed typo in Synopsis.